The term "undefined behavior" in C and C++ refers to situations about which the standard says, literally, nothing at all. Historically, cases where early C compilers (and the architectures they targeted) behaved inconsistently were declared undefined behavior, and the standards committee, in its infinite wisdom, decided not to decide anything about them (that is, not to favor any one of the competing implementations). Undefined behavior was also the name given to situations for which the otherwise exhaustive standard simply did not prescribe any particular behavior. And the term has a third meaning, which is becoming more and more relevant in our time: undefined behavior is an opportunity for optimization. And C and C++ developers love optimization; they insist that compilers make every effort to speed up their code.
This article was first published on the Cryptography Services website. The translation is published with the permission of the author, Thomas Pornin.

Here is a classic example:
```c
void foo(double *src, int *dst)
{
    int i;
    for (i = 0; i < 4; i ++) {
        dst[i] = (int)src[i];
    }
}
```
Compile this code with GCC on a 64-bit x86 platform under Linux (I am working on the latest Ubuntu 18.04; the GCC version is 7.3.0). We enable full optimization and then look at the assembly listing, using the flags "-W -Wall -O9 -S" (the "-O9" argument sets the maximum GCC optimization level, which in practice is equivalent to "-O3", although some forks of GCC define higher levels). We get the following result:
```
	.file	"zap.c"
	.text
	.p2align 4,,15
	.globl	foo
	.type	foo, @function
foo:
.LFB0:
	.cfi_startproc
	movupd	(%rdi), %xmm0
	movupd	16(%rdi), %xmm1
	cvttpd2dq	%xmm0, %xmm0
	cvttpd2dq	%xmm1, %xmm1
	punpcklqdq	%xmm1, %xmm0
	movups	%xmm0, (%rsi)
	ret
	.cfi_endproc
.LFE0:
	.size	foo, .-foo
	.ident	"GCC: (Ubuntu 7.3.0-27ubuntu1~18.04) 7.3.0"
	.section	.note.GNU-stack,"",@progbits
```
Each of the first two movupd instructions moves two double values into a 128-bit SSE2 register (a double is 64 bits, so an SSE2 register can hold two of them). In other words, all four source values are read first, and only then are they converted to int (the cvttpd2dq instructions). The punpcklqdq instruction packs the four resulting 32-bit integers into a single SSE2 register (%xmm0), whose contents are then written to RAM (movups). And now the main point: our C program formally requires that memory accesses happen in the following order:
- Read the first double value from src[0].
- Write the first int value to dst[0].
- Read the second double value from src[1].
- Write the second int value to dst[1].
- Read the third double value from src[2].
- Write the third int value to dst[2].
- Read the fourth double value from src[3].
- Write the fourth int value to dst[3].
However, these requirements only make sense in the context of the abstract machine defined by the C standard; the order of operations on a real machine may differ. The compiler is free to reorder or transform operations, provided the result does not contradict the semantics of the abstract machine (the so-called as-if rule). In our example, the order of operations is indeed different:
- Read the first double value from src[0].
- Read the second double value from src[1].
- Read the third double value from src[2].
- Read the fourth double value from src[3].
- Write the first int value to dst[0].
- Write the second int value to dst[1].
- Write the third int value to dst[2].
- Write the fourth int value to dst[3].
This is C: ultimately, the entire contents of memory are just bytes (that is, slots holding unsigned char values, in practice groups of eight bits), and arbitrary pointer operations are allowed. In particular, the src and dst pointers passed to the function could refer to overlapping regions of memory (a situation known as "aliasing"). The order of reads and writes can therefore matter, if bytes are written and then read back. For the actual behavior of the program to conform to the abstract one defined by the C standard, the compiler would have to interleave the reads and writes, performing a full round trip to memory on each iteration. The resulting code would be larger and much slower. For C developers, this would be a tragedy.
Here, fortunately, undefined behavior comes to the rescue. The C standard says that values may not be accessed through pointers whose type is incompatible with the actual type of those values. Simply put, if a value is written through dst[0], where dst is a pointer to int, then the corresponding bytes may not be read back through src[1], where src is a pointer to double, since that would access a value that now has type int through a pointer of an incompatible type. Doing so is undefined behavior. This is stated in section 6.5, paragraph 7 of ISO 9899:1999 ("C99") (the wording is unchanged in the newer edition, 9899:2018, or "C17"). This requirement is known as the strict aliasing rule. As a consequence, the C compiler is allowed to assume that memory accesses which would trigger undefined behavior by violating the strict aliasing rule simply do not happen. The compiler can therefore reorder the reads and writes freely, since they cannot legally touch overlapping regions of memory. This is what enables the optimization.
In short, the meaning of undefined behavior is this: the compiler may assume that undefined behavior never occurs, and generate code based on that assumption. In the case of the strict aliasing rule, precisely when aliasing is present, undefined behavior enables important optimizations that would otherwise be hard to achieve. Generally speaking, every instruction processed by the compiler's code generator carries dependencies that constrain the instruction scheduler: an instruction cannot be moved before the instructions it depends on, or after the instructions that depend on it. In our example, undefined behavior removes the dependencies between the writes to dst[] and the "subsequent" reads from src[]: such a dependency could exist only in cases where the memory accesses invoke undefined behavior. Similarly, undefined behavior allows the compiler to simply delete code that cannot be reached without first invoking undefined behavior.
All this is well and good, but such behavior is sometimes perceived as a treacherous betrayal by the compiler. You will often hear the phrase: "the compiler uses undefined behavior as an excuse to break my code." Suppose someone writes a program that adds integers and worries about overflow (remember the Bitcoin case). They might reason as follows: the processor represents integers in two's complement, so if an overflow occurs, the result will simply be truncated to the size of the type, i.e. 32 bits. The result of an overflow is therefore predictable and can be checked with a test.

Our hypothetical developer will write this:
```c
#include <stdio.h>
#include <stdlib.h>

int add(int x, int y, int *z)
{
    int r = x + y;
    if (x > 0 && y > 0 && r < x) {
        return 0;
    }
    if (x < 0 && y < 0 && r > x) {
        return 0;
    }
    *z = r;
    return 1;
}

int main(int argc, char *argv[])
{
    int x, y, z;

    if (argc != 3) {
        return EXIT_FAILURE;
    }
    x = atoi(argv[1]);
    y = atoi(argv[2]);
    if (add(x, y, &z)) {
        printf("%d\n", z);
    } else {
        printf("overflow!\n");
    }
    return 0;
}
```
Now let's try to compile this code using GCC:
```
$ gcc -W -Wall -O9 testadd.c
$ ./a.out 17 42
59
$ ./a.out 2000000000 1500000000
overflow!
```
Well, it seems to work. Now let's try another compiler, for example, Clang (I have version 6.0.0):
```
$ clang -W -Wall -O3 testadd.c
$ ./a.out 17 42
59
$ ./a.out 2000000000 1500000000
-794967296
```
Wait, what?
It turns out that when an operation on signed integer types produces a result that cannot be represented in the target type, we are in undefined-behavior territory, and the compiler may assume that this never happens. In particular, when optimizing the expression x > 0 && y > 0 && r < x, the compiler concludes that since x and y are strictly positive, the third test can never be true (the sum of two positive values cannot be less than either of them), so the whole check can be skipped. In other words, since overflow is undefined behavior, from the compiler's point of view it "cannot happen," and all instructions that depend on that state can be removed. Our mechanism for detecting undefined behavior has simply vanished.
The standard never prescribed that "wrapping semantics" (which is what the processor actually implements) be used in computations on signed types; this happened rather by tradition, dating back to times when compilers were not smart enough to optimize based on value ranges. You can make Clang and GCC apply wrapping semantics to signed types with the special -fwrapv flag (in Microsoft Visual C, you can use -d2UndefIntOverflow-, as described here). However, this approach is fragile: the flag may get lost when the code is moved to another project or another architecture.
Not everyone knows that signed overflow is undefined behavior. It is stated in section 6.5, paragraph 5 of the C99 and C17 standards:

If an exceptional condition occurs during the evaluation of an expression (that is, if the result is not mathematically defined or not in the range of representable values for its type), the behavior is undefined.

For unsigned types, on the other hand, modular semantics is guaranteed. Section 6.2.5, paragraph 9 states:

A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type.

Another example of undefined behavior in operations on signed types is division. As everyone knows, the result of dividing by zero is not mathematically defined, so according to the standard this operation invokes undefined behavior. If the divisor in an x86 idiv instruction is zero, a processor exception occurs. Like interrupt requests, processor exceptions are handled by the operating system. On Unix-like systems such as Linux, the processor exception triggered by idiv is translated into a SIGFPE signal sent to the process, which is then terminated by the default handler (don't be surprised that "FPE" stands for "floating-point exception" even though idiv works on integers). But there is another situation that leads to undefined behavior. Consider the following code:
```c
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int x, y;

    if (argc != 3) {
        return EXIT_FAILURE;
    }
    x = atoi(argv[1]);
    y = atoi(argv[2]);
    printf("%d\n", x / y);
    return 0;
}
```

Let's run it:

```
$ gcc -W -Wall -O testdiv.c
$ ./a.out 42 17
2
$ ./a.out -2147483648 -1
zsh: floating point exception (core dumped)  ./a.out -2147483648 -1
```
And indeed: on this machine (the same x86 under Linux), the int type covers the range from -2,147,483,648 to +2,147,483,647. Dividing -2,147,483,648 by -1 should yield +2,147,483,648, but that number does not fit in an int. So the behavior is undefined, and anything can happen. In this case, the process is forcibly terminated. On a different system, especially one with a small processor lacking a division instruction, the result may differ: on such architectures division is done in software, using a routine normally provided by the compiler, and that routine can do whatever it likes when behavior is undefined, because that is exactly what undefined behavior is about.
Note that SIGFPE can be obtained under the same conditions with the modulo operator (%): underneath it is the same idiv instruction, which computes both the quotient and the remainder, so the same processor exception is triggered. Interestingly, the C99 standard implies that the expression INT_MIN % -1 cannot lead to undefined behavior, since the result is mathematically defined (zero) and clearly falls within the range of the target type. In C17, the text of section 6.5.5, paragraph 6 was changed to cover this case as well, bringing the standard closer to the actual state of affairs on popular hardware platforms.
There are many less obvious situations that also lead to undefined behavior. Take a look at this code:
```c
#include <stdio.h>
#include <stdlib.h>

unsigned short mul(unsigned short x, unsigned short y)
{
    return x * y;
}

int main(int argc, char *argv[])
{
    int x, y;

    if (argc != 3) {
        return EXIT_FAILURE;
    }
    x = atoi(argv[1]);
    y = atoi(argv[2]);
    printf("%d\n", mul(x, y));
    return 0;
}
```
What do you think this program should print, according to the C standard, when passed the factors 45,000 and 50,000?
- 18,048
- 2,250,000,000
- God save the Queen!
The correct answer is... yes, all of the above! You may have reasoned as follows: since unsigned short is an unsigned type, it must obey wrapping semantics modulo 65,536, because on an x86 processor this type is normally 16 bits wide (the standard allows it to be larger, but in practice it is still 16 bits). Since the product is mathematically 2,250,000,000, it gets reduced modulo 65,536, giving the answer 18,048. Reasoning this way, however, overlooks integer promotion. According to the C standard (section 6.3.1.1, paragraph 2), if the operands have a type whose rank is strictly lower than that of int, and values of that type can be represented by int without loss (which is exactly our case: on my x86 Linux, int is 32 bits and can obviously hold values from 0 to 65,535), then both operands are promoted to int and the operation is performed on the promoted values. The product is thus computed as a value of type int, and only converted back to unsigned short upon return from the function (i.e. that is the point where reduction modulo 65,536 takes place). The problem is that the mathematical result before that conversion is 2,250,000,000, which exceeds the range of int, a signed type. The result is undefined behavior. After that, anything can happen, including sudden fits of British patriotism.
In practice, with ordinary compilers, you will get the value 18,048, because no optimization yet exists that could exploit the undefined behavior in this particular program (one can imagine more contrived scenarios where it really would cause trouble).
Finally, one more example, this time in C++:
```cpp
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include <array>

int main(int argc, char *argv[])
{
    std::array<char, 16> tmp;
    int i;

    if (argc < 2) {
        return EXIT_FAILURE;
    }
    memset(tmp.data(), 0, 16);
    if (strlen(argv[1]) < 16) {
        strcpy(tmp.data(), argv[1]);
    }
    for (i = 0; i < 17; i ++) {
        printf(" %02x", tmp[i]);
    }
    printf("\n");
}
```
This is not the typical "horrible strcpy()!" case. After all, strcpy() is executed here only if the source string, including its terminating zero, is short enough. Moreover, the array elements are explicitly initialized to zero, so every byte in the array has a defined value regardless of whether the string passed in is long or short. However, the loop at the end is wrong: it reads one byte more than it should.
Run the code:
```
$ g++ -W -Wall -O9 testvec.c
$ ./a.out foo
 66 6f 6f 00 00 00 00 00 00 00 00 00 00 00 00 00 10 58 ffffffca
 ffffffac ffffffc0 55 00 00 00 ffffff80 71 34 ffffff99 07 ffffffba
 ffffffea ffffffd0 ffffffe5 44 ffffff83 fffffffd 7f 00 00 00 00 00
 00 00 00 00 00 10 58 ffffffca ffffffac ffffffc0 55 00 00 ffffff97
 7b 12 1b ffffffa1 7f 00 00 02 00 00 00 00 00 00 00 ffffffd8
 ffffffe5 44 ffffff83 fffffffd 7f 00 00 00 ffffff80 00 00 02 00 00
 00 60 56 (...) 62 64 3d 30 30
zsh: segmentation fault (core dumped)  ./a.out foo
```

Wha++?
One might naively object: sure, it reads one byte past the end of the array, but that is not so terrible, since the byte is still on the stack and is mapped in memory, so the only problem is a spurious seventeenth element with an unknown value. The loop should still print exactly 17 integers (in hexadecimal) and finish without complaint.
But the compiler has its own opinion on the matter. It knows perfectly well that the seventeenth read invokes undefined behavior. By its logic, every subsequent instruction is then up for grabs: nothing at all is required to exist after undefined behavior (formally, even the preceding instructions are at risk, since undefined behavior also propagates backwards). In our case, the compiler simply drops the loop's bound check, and the loop spins forever, or rather until it starts reading outside the memory allocated to the stack, at which point a SIGSEGV signal is raised.
It's funny, but if you run GCC with less aggressive optimization settings, it gives a warning:
```
$ g++ -W -Wall -O1 testvec.c
testvec.c: In function 'int main(int, char**)':
testvec.c:20:15: warning: iteration 16 invokes undefined behavior [-Waggressive-loop-optimizations]
   printf(" %02x", tmp[i]);
   ~~~~~~^~~~~~~~~~~~~~~~~
testvec.c:19:19: note: within this loop
  for (i = 0; i < 17; i ++) {
  ~~^~~~
```
At the -O9 level, this warning somehow disappears. Perhaps at high optimization levels the compiler applies more aggressive loop unrolling. This may or may not be a GCC bug (in the sense of a missing warning); in any case, GCC's actions do not contradict the standard, which does not require a "diagnostic" in this situation.
Conclusion: if you write code in C or C++, be extremely careful and avoid situations that lead to undefined behavior, even when it seems that "nothing terrible" would happen.

Unsigned integer types are a good aid in arithmetic, since they are guaranteed modular semantics (although you can still run into problems related to integer promotion). Another option, unpopular for some reason, is not to write in C or C++ at all. For a number of reasons this solution is not always applicable, but when you can choose the language, i.e. when starting a new project on a platform that supports Go, Rust, Java or other languages, it may pay not to use C as your "default language". The choice of tools, including the programming language, is always a compromise. The pitfalls of C, especially the undefined behavior of operations on signed types, lead to extra costs in subsequent code maintenance that are often underestimated.