Debugging 201: Debugging memory faults in C++ (and C) programs

Memory faults can appear in many ways. The most direct is the message "Segmentation fault", but memory errors can also show up as odd error messages from library routines, programs that just stop unexpectedly, or even programs that appear to fall into infinite loops. What these have in common is that you have accessed memory in the wrong way, and the program starts behaving unpredictably. The best place to debug a memory fault is on the machine where the error occurs, since it is almost always important to reproduce the problem before trying to fix it.

A key thing to know is that memory faults can appear in different ways on different computers. Your program might run fine on one computer only to crash without output on another. That's because every system's memory layout is different, the machine instructions are different, and you might even have slightly different versions of libraries. A simple off-by-one error might cause no visible problem on one machine but overwrite a function's return address on another, so that the program ends up executing code it never should have reached. All bets are off with memory errors.
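To make the off-by-one case concrete, here is a small sketch. The function names `sumFirstN` and `overrunsVector` and the data are made up for illustration. The loop bound `i <= n` is the bug: it touches one element past the end. With `v[i]` that access would be undefined behavior, which is why the symptom can differ from machine to machine. With the bounds-checked `at()`, the standard library throws `std::out_of_range` at the faulty access instead.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical example: meant to sum the first n elements of v, but has a
// deliberate off-by-one bug (the loop should use i < n, not i <= n).
int sumFirstN(const std::vector<int>& v, std::size_t n) {
    int total = 0;
    for (std::size_t i = 0; i <= n; ++i) {   // BUG: runs one step too far
        total += v.at(i);                    // at() checks the index; v[i] would not
    }
    return total;
}

// Returns true if asking for all of v walks off the end of the vector.
bool overrunsVector(const std::vector<int>& v) {
    try {
        sumFirstN(v, v.size());
        return false;
    } catch (const std::out_of_range&) {
        return true;   // the bad index was caught at the faulty access
    }
}
```

Switching to `at()` while you hunt a bug is one option, not a cure: it only covers standard containers, not raw arrays or pointers.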

Sometimes the message will be "segmentation fault (core dumped)". "Segmentation" refers to memory being divided into regions, with only certain numbers being valid memory addresses. A segmentation fault is just a generic way of saying the program accessed the wrong portion of memory in the wrong way. If you see a "core dumped" message, it just means that a copy of the program's memory was written to disk. In theory you could load that memory into a debugger to figure out what went wrong, but this is very difficult to do and to use as a debugging tool. For programming assignments, it is easier to just re-run the code.

Your first task is to determine what line of code was last executed; that is, where the error occurs. Between pipelined architecture and delayed fault detection (both important to high performance), the system cannot tell you what line was last executed successfully. Do not rely on stepping through code to find the error. Stepping works great when you are very close to the region of code with an error, but stepping through hundreds of statements to find a bug takes forever and often results in accidentally skipping over the line that contains the error.

If you are running your code in an integrated development environment (IDE), you are in luck: set breakpoints at key places in your code and see how far things go before the error happens. For example, set breakpoints just before reading any data, after reading the data, after computing results, and just before printing them. Then run in debugging mode and examine key data at each point to make sure things are going as expected. If the code gets to a breakpoint on line A but not on line B, add more breakpoints between A and B and rerun. Once you have it narrowed down to a few key lines, you can try stepping through.

If you are in an environment where you cannot set breakpoints -- such as esubmit -- use output messages to locate the fault. Standard output is not useful for this because it is buffered: when the program crashes, there can be pending output that never gets written, so the last message you see may not be the last one the program printed. The solution is to send debugging output to standard error, which is flushed after every write. For example, add statements like

        cerr << "At point A" << endl;
        ...
        cerr << "At point B" << endl;

If you are using an IDE, you will want to store your input in a file and use Run Configurations features to read your input from that file. Retyping input over and over is too error-prone, and if you change your input you might exercise a different error instead. Focusing on one error at a time is critical.

Minimizing your input is another key debugging trick. If you have a lot of data, determining if parts of your program are working is more difficult. So reduce your data using the following algorithm:

  1. Find an input that makes your program fail. You are probably reading this because you are already there!
  2. Cut that input in half and re-run the program.
  3. If the program still fails, go back to step 2 and cut the new, smaller input in half.
  4. If the program now works, restore half of the data you removed and re-run the program. Repeat steps 3 and 4 until you cannot remove any more data without losing the failure.

With a few quick runs you can often get your input down to exactly the lines that exercise the error. Quite often that alone will give you important clues as to what went wrong. If it does not, you at least have a smaller input to use for debugging, often small enough that you can step through your program.
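You will normally do the halving by hand, but the procedure is just a binary search, and seeing it as code may help. The sketch below is only an illustration under some assumptions: `minimizeInput` and the `fails` callback are hypothetical names, `fails` stands for "run the program on these lines and report whether it crashed", the full input is known to fail, the empty input is assumed to work, and once a prefix of the input fails, every longer prefix fails too.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

using Lines = std::vector<std::string>;

// Returns the shortest prefix of `input` that still fails.
// Assumes fails(input) is true and that once a prefix fails, every longer
// prefix fails too. `fails` is a hypothetical callback that runs your
// program on the given lines and reports whether it crashed.
Lines minimizeInput(const Lines& input,
                    const std::function<bool(const Lines&)>& fails) {
    std::size_t passing = 0;             // longest prefix length assumed to work
    std::size_t failing = input.size();  // shortest prefix length known to fail
    while (failing - passing > 1) {
        // Cut the region still under suspicion in half.
        std::size_t mid = passing + (failing - passing) / 2;
        Lines candidate(input.begin(),
                        input.begin() + static_cast<std::ptrdiff_t>(mid));
        if (fails(candidate)) {
            failing = mid;   // still fails: keep cutting
        } else {
            passing = mid;   // now works: restore half of what was removed
        }
    }
    return Lines(input.begin(),
                 input.begin() + static_cast<std::ptrdiff_t>(failing));
}
```

The last line of the returned prefix is the one whose presence first triggers the failure, which is usually the line worth studying. Real bugs do not always satisfy the assumptions above (a crash may need two lines far apart, for example), so treat the result as a starting point.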

Key tip: first find what fails, then figure out why. Determining why from just seeing a fault requires experience. This is why we instructors sometimes can tell what is wrong just by hearing about the error! It's not magic, it's just that we have gone through the same thing you are going through and have learned from our own mistakes! As you develop experience debugging code, you will get much better at it. This is one of the more important things you will learn in this course!