Project Nayuki


Undefined behavior in C and C++ programs

Introduction

When a C or C++ program triggers undefined behavior, anything is allowed to happen in the program execution. And by anything, I really mean anything: The program can crash with an error message, it can silently corrupt data, it can morph into a colorful video game, or it can even give the right result.

If you’re lucky, the program triggering UB will show an appropriate error message and/or crash, making you immediately aware that something went wrong. If you’re unlucky, the program will silently corrupt data, and by the time you notice the problem (via effects such as crashes or incorrect output) the root cause has been buried in the past execution history. And if you’re very unlucky, the program will do exactly what you hoped it should do, until you change some unrelated code / compiler versions / compiler vendors / operating systems / hardware platforms – and then a new bug becomes visible, and you have no clue why seemingly correct code now fails to work properly.

The problem of dealing with undefined behavior in C/C++ programs comes from three sources: the definitions of UB in the respective C and C++ language standards, the difference in the treatment of UB by different compiler implementations, and the sloppiness of programmers unaware of the consequences of UB.

I’m ashamed to admit that I spent the first ~7 years (until ~2012) of my C/C++ programming career without knowing the concept of undefined behavior. The notion of UB was poorly taught, if even mentioned at all, in the books I read, the university courses I took, and the web pages I encountered. My ignorance of UB made it much harder for me to debug coworkers’ C++ code – I had a hard time orienting myself with where to look for faults, and spent too much time tracing function calls and data flow before I could locate the root cause. After I developed a sense for UB, it became much easier for me to hypothesize the cause of a bug and drill down to where the most likely problem spots are. This initial ignorance and painful learning process is why I truly wish I had known about UB when I began programming in C/C++.

There is no easy solution to the problem of undefined behavior in C and C++. The best approach I can see so far is multi-faceted, involving developer awareness/education, compiler warnings/diagnostics, runtime checking, and manually updating old code with new knowledge. It takes considerable skill and effort to fight UB, but it’s also never too late to start learning.

Examples of undefined behavior

Reading an uninitialized local variable is usually UB:

int x;
printf("%d", x);  // UB happens here

Arithmetic overflow in a signed integer type is always UB:

int x = INT_MAX;
x++;  // Boom! (usually silent)

Dereferencing a null pointer is UB:

int *p = NULL;
*p = 123;  // Kaboom

Reading/writing an index past the end of an array is UB:

char a[10];
printf("%d", a[10]);  // Bang
a[-1] = 7;  // Oww!

Shifting by the integer’s bit width or more, or by a negative amount, is UB:

uint32_t x = 0;
x = x << 33;  // Blamo
x = x >> (-1);  // Kapow

Computing an out-of-bounds pointer is UB:

short a[10];
short *p = &a[15];  // Bleep

Comparing pointers from unrelated objects is UB:

long *p = malloc(sizeof(long));
long *q = malloc(sizeof(long));
if (p > q) ...  // Ouch

Examples of consequences

Some forms of UB can be detected at compile time (e.g. reading an uninitialized variable), and the compiler can show a warning or even stop the compilation – but the compiler can also choose to silently march on. Most forms of UB result in unintuitive behaviors that seem unrelated to the root cause, as we will see below.

Read of uninitialized variable
bool x;
if (x == true ) puts("Hello");
if (x == false) puts("Goodbye");

Naively, we might think that the value of x is either true or false, so either “Hello” or “Goodbye” is printed. But UB destroys all intuitive expectations, and x doesn’t need to behave like a definite unknown value. The compiler can stop caring about the value of x and assume both if-conditions are fulfilled, then print “HelloGoodbye”. The program can also print “42! Preparing to format hard disk”. It can behave differently on each run, which makes sense if you assume that the contents of memory begin in a random state each time.

Signed integer overflow
bool testOverflow(int x) {
    return x > (x + 1);
}

Every developer “knows” that modern computers use two’s complement format for storing signed integers, and we all “know” that INT_MAX + 1 == INT_MIN due to arithmetic overflow. Both of these points are in fact guaranteed to be true in languages such as Java, but C and C++ guarantee neither of these points. Signed integers in C/C++ can be represented in sign-magnitude, ones’ complement, or two’s complement, to allow the generation of efficient machine code that matches the native signed arithmetic capabilities of older CPUs. And signed overflow means that the compiler or runtime is allowed to do anything. A C/C++ compiler is legally entitled to transform this function into effectively:

bool testOverflow(int x) {
    return false;
}

Why is this the case? Suppose the computation of x + 1 does not overflow, i.e. INT_MIN <= x + 1 && x + 1 <= INT_MAX. Then of course x > x + 1 is false. Now suppose x + 1 does overflow (which happens if and only if x == INT_MAX). Then undefined behavior is triggered, and the compiler can do literally anything. For convenience, the compiler chooses to treat this case the same as the well-defined case, so it can just return false. Note that if we use the non-standard -fwrapv flag to force signed integer overflow to wrap around consistently (or use a language like C# where signed integer overflow is mandated to wrap around), then testOverflow(INT_MAX) will indeed return true.

Null pointer check

Suppose we have a piece of code like this:

void f(int *p) {
    printf("%d", *p);
    if (p != NULL)
        printf("OK");
}

Then the compiler is permitted to substitute this equivalent code with the null check removed:

void f(int *p) {
    printf("%d", *p);
    printf("OK");
}

Why? Suppose p is not NULL. Then the first print will be fine, and the second print will get executed. Now suppose p is NULL. Then the first print triggers UB, so everything thereafter is meaningless. For convenience we can make this behave the same as the first case, where both prints get executed.

Division by zero

On popular CPUs, the machine instruction for division will raise an interrupt/exception/trap if the divisor is zero. Usually the operating system will catch the exception and immediately terminate the program, but other behaviors are possible as well. Due to the varying behavior of the execution environment, the C/C++ standards bodies decided to treat division by zero as undefined behavior – once it happens, the program could abruptly terminate, keep running with dubious values, or anything else. Look at this code for example:

unsigned x = (...);
unsigned y = (...);
unsigned z = x / y;
printf("Division OK");

You may be surprised to learn that even if y == 0, this program is allowed to print “Division OK” and then crash – despite the fact that the print statement is placed after the potentially offending division operation. The reasoning goes like this: If the division was actually okay (y != 0), then the act of computing the division had no externally visible effect, so it’s okay to move the print statement upward by one line. Otherwise if division by zero occurred, then it’s UB and anything is allowed in the entire program execution – which includes moving and executing later statements to be in front of the UB-triggered crash.

Out-of-bounds access

Consider this piece of faulty code:

double a[5] = {0.0};
for (int i = 0; i < 8; i++)
    printf("%f\n", a[i]);

The compiler can legitimately replace the code with this modified version:

double a[5] = {0.0};
for (int i = 0; true; i++)
    printf("%f\n", a[i]);

Why is this so? We know that in each iteration of the for-loop, the test i < 8 occurs before the print. If i < 5, then i < 8 is certainly true. When i == 5 (which will definitely happen because the loop doesn’t break out early), the print of a[i] will access an element that is out of bounds, so now we can do anything we want. In particular, we can choose to elide the i < 8 test for future loop iterations and just pretend that it always evaluates to true.

If you’re lucky, your program will crash not long after accessing an out-of-bounds array element. If you’re unlucky and a large chunk of memory after array a has been allocated for other objects, then the program will print tons of values before it crashes (if ever).

Rationale for UB definitions and effects

From the language designer’s point of view, undefined behavior is a way to account for significant differences between compilers and between platforms (OS+CPU). Let’s say CPU A evaluates ((uint32_t)1) << 32 as 1, and CPU B evaluates ((uint32_t)1) << 32 as 0. Literally the easiest way to handle this situation is to throw one’s hands up and say, “I can’t deal with these behaviors. Let the compilers and CPUs do whatever they want, and let the coders sort out the mess”. This approach avoids having to prescribe which behaviors are legal or not, how to detect which variant of behavior is being used, etc. Another reason might be that historically, two compiler vendors have interpreted a piece of code in crucially different ways. Let’s say that for the awkward statement a[i++] = i++;, compiler vendor D chose to evaluate it as the equivalent of j = i; i++; k = i; i++; a[j] = k;, whereas compiler vendor E chose to treat it as a[i] = i; i++; i++;. These interpretations have visibly divergent outcomes, and it’s easy just to prescribe a rule like “if you modify a particular variable more than once within a single statement, then it’s UB”, instead of asking compiler vendors to change their behaviors to conform to some new rule.

Where things get interesting is that once the programming language specifies certain situations as undefined behaviors, the compiler writers can get really creative with abusing the freedom being offered. The examples given above illustrate why taking advantage of UB is so lucrative – it can be used to simplify, shorten, and speed up the compiled code without violating any rules, technically speaking. Problems can arise when a programmer triggers UB accidentally (e.g. accessing an index out of an array’s bounds) and an extensive mess is created. And problems can arise when a well-meaning programmer writes some defensive logic that happens to trigger UB (such as the ill-conceived overflow test above) and the compiler performs a transform that elides the intended behavior.

Debugging stories from personal experience

“Works fine on Windows, crashes on Linux”

Our team experienced a crash in a particular function, and I was tasked with debugging it. The code looked like this:

void foo(char *s) {
    int n;
    sscanf(s, "%ld", &n);
    (... a hundred lines of subsequent logic ...)
}

The program behaved fine when compiled and run on Windows, and also on the Linux systems we had utilized before this incident. The Linux system where this program crashed presumably had a different compiler version, and I was in no position to dictate what compiler they should use.

Debugging this code was one of the worst experiences in my programming career. I tried my usual bag of tricks – putting print statements in key places to show where the control flow reaches, printing out values of important variables to get a picture of the program state, and permuting the print statements to try to narrow down the problem. These strategies failed in a spectacular way: Just by moving a print statement up or down one line, I could change the program’s runtime behavior between crashing and not crashing. The prints showing values and control flow did not yield a consistent picture of why the program was able to reach a certain point in the code. I didn’t understand undefined behavior at this time, and the program behaviors I observed left me confused and made me chase red herrings for hours.

The crux of the problem was that with the format specifier %ld, sscanf() tried to write to a long variable, but we were giving it a piece of storage that had the size of an int. On the Windows platform, int and long are conventionally defined as 32 bits, so the bug stayed latent. But on 64-bit Linux, int is 32 bits while long is 64 bits, so sscanf() tried to write a 64-bit value to a 32-bit variable. The variable was stored on the stack, so this corrupted whatever value in stack memory came immediately after the int variable. When the next value happened to be the function call’s return address, that is what triggered the visible crash. But depending on what other logic was in the function, the variable n could have been arranged in different places on the stack, which is probably what caused the inconsistent effects.

Actually, if it had been this simple, we would have caught it far earlier and more easily. sscanf() is a C standard library function, and modern compilers like GCC can check that the format string matches the types of the actual arguments passed in. The complicating factor was that it wasn’t sscanf() at fault; we were using PyArg_ParseTuple(), which essentially has the same design as sscanf() but without any extra safety nets. Although at the time I knew how sscanf() worked, I didn’t know the exact semantics of PyArg_ParseTuple() and its format string. Only by consulting the official Python documentation did I realize that our particular format string caused it to write a long value, which meant we were obliged to supply a long variable as the storage target.

“I added some prints and now the program crashes”

A coworker came to me with an innocent-sounding problem: He added a very small set of fopen(), fprintf(), and fclose() statements to a program, and then it started crashing. Before his additions, the program exhibited no signs of problems; furthermore, I glanced at the diff and was convinced that the small bit of code he added used the functions correctly and followed the rules of C. Luckily, by the time this happened, I already understood the concept of undefined behavior. Within minutes, and before even starting to debug the code, I confidently told him that his work was fine, and that the problem could lie literally anywhere else in this ~10000-line program.

The only appropriate tool I knew at the time was Valgrind (nowadays Clang’s -fsanitize=address is much better in every way). I fired up the tool and ran the program through Valgrind, trying to detect the reading of uninitialized memory and dereferencing of stray pointers. (It didn’t matter whether I ran the original version of the program or the version with his fprintf() additions, because he didn’t add any new problems.) Unfortunately a third-party library we were using was generating hundreds of false warnings of uninitialized reads. But by a miracle, I spotted a critical line in the message log that said the program was writing two elements past the end of an array. The message did include the name of the source code file and the line number; I jumped to the location in my text editor and immediately confirmed the cause of the problem (because the array size and loop bounds were hardcoded).

Behavior in other languages

C++

The discussion on this page is primarily about undefined behavior that occurs when coding in the C programming language. However, C++ contains all the core functionality of C and chose to inherit all of its undefined behavior as-is; furthermore, it added loads of new types of UB as well. It is important to note, though, that C and C++ are separate languages, each with its own specification, and that occasionally the same piece of code has subtly different meanings in C versus C++. But for the purpose of discussing UB, C++ is effectively a superset of C.

Java

All the common undefined behaviors in C and C++ are well-defined in Java and are guaranteed by the language and virtual machine specifications to behave consistently. Most forms of C-style UB become runtime exceptions in Java, which are thrown at exactly the point where the problem happens – not earlier or later – and also come with a helpful stack trace.

  • Reading an uninitialized local variable is a compile-time error, which prevents the class file from being generated.
  • When an array is created or when an object is instantiated from a class, all the elements/fields are set to 0 or null. (This contrasts with C/C++ where malloc() returns a chunk of uninitialized memory.)
  • Integer overflow is guaranteed to have two’s complement wraparound behavior. Shifting by large or negative values is well-defined, with the shift being modulo the bit width.
  • Reading/writing past the end of an array will throw ArrayIndexOutOfBoundsException.
  • Accessing fields or calling methods on a null reference will throw NullPointerException, without fail, every time.
  • Integer division by zero will throw ArithmeticException; floating-point division by zero will yield float or double’s infinity.
  • Ugly multiple-assignment expressions like a[++x] ^= x *= a[x++]; are prescribed to have exactly one valid interpretation.

Python

Python takes a similar approach to Java when encountering C-style undefined behavior; it usually raises an exception at exactly the point where the problem occurred. However, it also makes some forms of UB impossible by design.

  • Reading an uninitialized local variable will raise UnboundLocalError.
  • Accessing an out-of-bounds list index will raise IndexError.
  • Integer overflow cannot happen because only bigint is supported.
  • Calling methods on None will raise AttributeError.
  • Integer and floating-point division by zero will raise ZeroDivisionError.
  • Confusing multiple-assignment expressions like a += b -= a are forbidden at the grammar level (though a = b = a is allowed).

JavaScript

JavaScript isn’t particularly good about telling the developer that an error occurred; instead it tries to mask errors in semi-reasonable (semi-unreasonable) ways. At the very least, the language standard guarantees that these abnormal conditions will be treated consistently.

  • Reading an uninitialized variable yields the value undefined (which can be thought of as the sibling of null).
  • Reading past the end of an array also yields undefined.
  • Writing past the end of an array will extend the array and set the value at the new index.
  • There is no such thing as integer overflow because all math is conducted using double-precision floating-point numbers. (Note that floating-point overflow saturates to infinity in virtually all languages.)
  • Division by zero returns Infinity.

General discussion

It is jarring when a developer comes from a stricter language to work on C/C++ code. This was my case, because I first learned programming in Java and adopted C and C++ years later. Almost all languages are stricter than C/C++, and you learn to expect the compiler or runtime to indicate errors to you, or at least to behave consistently (e.g. integer overflow). This is not the case with C/C++ – the compiler and runtime are not obliged to tell you that anything went wrong (e.g. you can merrily print uninitialized variables), and when UB is invoked your program does not have to behave the same way on each run or compile. Becoming aware of the forms that undefined behavior can take, and keeping these pessimistic scenarios in mind when reading/writing code, both require a real leap in personal development.

What you can do about UB

The first step to effectively dealing with undefined behavior in C/C++ is to be aware that UB exists, and know the set of common pitfalls. When faced with a debugging problem, being aware that UB is a potential root cause can save hours of work in investigating its potentially misleading and inconsistent effects. When reading or writing code, knowledge of UB helps you imagine what can go wrong, and will lead you to write code that deliberately or explicitly avoids UB (such as adding well-designed checks before a potential integer overflow). Another important part of developer education is to be able to appeal to the relevant sections of the C and C++ standards as the source of truth regarding what constructs are/aren’t considered UB.

Tools like compilers are very important in the neverending war against UB. To start, never blame the compiler for breaking your code – if your program worked previously, then you changed compiler versions/vendors and now your program broke, don’t blame it on the compiler. Compilers are designed by smart people, and they (both the compilers and their developers) are very careful to preserve your program’s behavior as long as it follows the rules of C or C++. If you notice that something broke, there is a 99% chance that the fault is in your own code – be humble about it (though compiler bugs do occasionally appear). With that out of the way, modern compilers have come a long way from a decade or two ago, and have excellent diagnostics at compile time and run time to catch most types of undefined behavior. For example, C/C++ compilers can now detect and warn you about cases where you read an uninitialized local variable (by comparison, this check has been mandatory in the Java compiler since day one). The PVS-Studio static analyzer will tell you if you perform a null check after already dereferencing the pointer. Clang’s -fsanitize=undefined flag will precisely catch a slew of integer-related UB at runtime, such as printing a warning every time an overflow occurs, with no false positives or negatives. And Clang’s -fsanitize=address will precisely detect all sorts of memory/pointer problems, such as out-of-bounds reads/writes, use-after-free, double-free, and memory leaks (the last not being UB). (Valgrind offered most of the same functionality as -fsanitize=address more than a decade earlier, but is much slower and less precise.)

Although this article focuses on undefined behavior, there are behaviors in C and C++ that are implementation-defined, such as the width of integer types (e.g. int being 16-bit, 32-bit, 36-bit, 64-bit, etc.). These variations in behavior are less common and much less severe than undefined behavior, but still need their own attention. At least it’s better to mis-assume that something is undefined behavior and anything could happen, than to mis-assume that something is implementation-defined behavior with consistent and repeatable outcomes.

Becoming a polyglot programmer might be a good strategy for dealing with undefined behavior in C/C++. This is especially helpful for designing non-trivial algorithms and data structures (rather than writing boring straight-line business logic). In the early days, before I developed a sense of UB in C/C++ or had access to powerful tools like -fsanitize=undefined,address, I realized I could first design an algorithm (e.g. heapsort) in Java, make sure it had no compile errors, run it through extensive test cases, and ensure it had no null pointer or index-out-of-bounds errors. Once I was satisfied with the behavior of the Java program, I would translate the code line by line into C or C++, fix up some trivial syntax errors, and be quite confident that the code worked correctly.

More info