It is often said that some piece of code is invalid because it violates strict aliasing. What does that mean?
- Here you can add the [c] tag and write an answer about the rules in C. - Abyx
- In addition to the answer, @Abyx: memcpy should also be used with care. I have encountered situations where memcpy on the AARCH64 architecture behaved as if it rounded the pointer down by discarding the three low bits of the address. The result was corrupted data at the boundaries wherever the copy touched addresses not aligned to a 64-bit (8-byte) boundary. This was C code; I cannot say exactly what the source was called, but in it I read that, for the sake of optimization, the standard dropped alignment and overlap checks from memcpy. It was also said that until a certain point - Alexander
1 answer
Aliasing is a situation where two different names (for example, pointers) denote the same object.
    int x;
    int* p = &x;
    int& r = x; // aliasing: x, r and *p denote the same object
This matters to the optimizer: if there are two pointers of the same type, then after a write through one pointer the value read through the other may have changed:
    int f(int* a, int* b)
    {
        *a = 0;
        *b = 1;
        return *a; // *a may hold either 0 or 1,
                   // so the optimizer cannot replace this with return 0
    }
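To make the two possible outcomes concrete, here is a minimal sketch that calls the function above once with two distinct objects and once with the same object passed twice. The function f is taken from the answer; the two helper functions and their values are hypothetical and exist only for illustration.

```cpp
// f from the answer: both pointers have the same type, so they may alias
// and the compiler has to reload *a after the store through b.
int f(int* a, int* b) {
    *a = 0;
    *b = 1;
    return *a;
}

// Hypothetical helper: a and b point to different objects.
int call_with_distinct_objects() {
    int x = 5, y = 7;
    return f(&x, &y);   // the store through b does not touch x
}

// Hypothetical helper: a and b point to the same object.
int call_with_same_object() {
    int x = 5;
    return f(&x, &x);   // the store through b overwrites the value written through a
}
```

Both calls are well defined, which is exactly why the compiler may not fold the return value into a constant here.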
Strict aliasing is the informal name for the rule that forbids aliasing between objects of different types.
In the C++ standard, the rule reads as follows:
3.10 Lvalues and rvalues [basic.lval], paragraph 10:
If a program attempts to access the value of an object through a glvalue of a type that is not in the list below, the behavior is undefined:
- the dynamic type of the object, including with const/volatile or signed/unsigned added;
- a type that is similar to the dynamic type of the object (for example, const int* is similar to int*, see 4.4 [conv.qual]);
- an aggregate type (array, class or union) that includes a data member with one of the types listed above;
- a base type of the dynamic type of the object (including with const/volatile added);
- char or unsigned char.
(Footnote: The purpose of this list is to indicate when aliasing is allowed.)
An object is a region of storage. It has a lifetime, a type and, possibly, a name.
The dynamic type is the type of the most derived object that the expression refers to. For example, if D is derived from B and there is a variable B* b = new D;, then the dynamic type of *b is D.
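The difference between the static and the dynamic type can be observed with typeid. The sketch below reuses the names B and D from the text; the virtual destructor is an assumption added here, because typeid only reports the dynamic type for polymorphic classes, and the helper function is hypothetical.

```cpp
#include <typeinfo>

// Assumption: B is polymorphic (it has a virtual destructor),
// so typeid(*b) is evaluated at run time.
struct B { virtual ~B() = default; };
struct D : B {};

// Hypothetical helper: checks that *b refers to a D object.
bool dynamic_type_is_D() {
    B* b = new D;                          // static type of *b is B
    const std::type_info& info = typeid(*b);
    bool result = (info == typeid(D));     // dynamic type of *b is D
    delete b;
    return result;
}
```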
It follows that although a pointer to one type can be converted to a pointer to another type, nothing may be read through the resulting pointer:
    char* pc = new char[100];
    int* pi = reinterpret_cast<int*>(pc); // OK, just a cast
    int i = *pi; // FORBIDDEN: the dynamic type is char, but an int is read
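One well-defined way to get an int out of such a buffer is to copy its bytes into a real int object with std::memcpy instead of dereferencing the cast pointer. This is a sketch of that alternative, not part of the snippet above; the function name and the value 42 are hypothetical.

```cpp
#include <cstring>

// Hypothetical helper: stores an int in a char buffer and reads it back
// without ever accessing the buffer through an int lvalue.
int read_int_from_chars() {
    char* pc = new char[100];

    int value = 42;
    std::memcpy(pc, &value, sizeof value);  // put the bytes of an int into the buffer

    int i;
    std::memcpy(&i, pc, sizeof i);          // copy the bytes out into a real int object
    delete[] pc;
    return i;
}
```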
The optimizer can use this as follows:
    int f(int* a, short* b)
    {
        *a = 0;
        *b = 1;
        return *a; // *a can only be 0:
                   // *b has a different type, so a write to *b cannot change *a,
                   // and the optimizer may turn this into return 0
    }
The item about arrays and classes means that an object can be accessed through the enclosing object that contains it, for example:
    struct S { int a; };
    S s;
    s.a = 1;
    S s_ = s; // S::a is accessed through the whole object of type S (fairly obvious)
At the same time, code that uses a different type is invalid not because of strict aliasing but because it dereferences a pointer obtained from reinterpret_cast. The standard guarantees only the conversion back to the original type (however, the term used there is "unspecified", so a compiler may apply its own rules).
    struct S2 { int a; };
    S2* s2 = reinterpret_cast<S2*>(&s);
    int a = s2->a; // dereferences the result of reinterpret_cast;
                   // the type of s2->a is int, so strict aliasing itself is not violated
A union has the notion of an active data member, so reading a different member violates strict aliasing:
    union U { int i; short s; char c; };
    U u;
    u.i = 0; // i is the active member
    short s = u.s; // FORBIDDEN: an object of type int is accessed through type short
    char c = u.c; // OK, char is a special case
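For contrast, a union used to hold different types at different times, reading only the member that was written last, stays within the rules. A minimal sketch reusing the union U from the example; the helper function and its values are hypothetical.

```cpp
union U { int i; short s; char c; };

// Hypothetical helper: every read goes to the member that was written last.
int use_members_in_turn() {
    U u;
    u.i = 1000;        // i becomes the active member
    int total = u.i;   // OK: reads the active member
    u.s = 24;          // s becomes the active member; i must not be read any more
    total += u.s;      // OK: reads the active member
    return total;
}
```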
The last item in the list, about char or unsigned char, is a loophole for functions such as memcpy, memset and the like:
    void my_zero_memory(void* p, size_t n)
    {
        char* bytes = static_cast<char*>(p);
        for (; n != 0; --n, ++bytes)
            *bytes = 0; // OK, an object of any type may be accessed through char
    }

    int x[100];
    my_zero_memory(x, sizeof(x));
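The same loophole covers reading. As a sketch, the hypothetical helper below inspects the bytes of an arbitrary object through unsigned char; the names and test data are illustrative and do not come from the answer.

```cpp
#include <cstddef>

// Hypothetical helper: returns true if every byte of the object is zero.
bool is_all_zero(const void* p, std::size_t n) {
    const unsigned char* bytes = static_cast<const unsigned char*>(p);
    for (; n != 0; --n, ++bytes)
        if (*bytes != 0)      // OK: any object may be read through unsigned char
            return false;
    return true;
}

// Hypothetical usage with two int arrays.
bool check_examples() {
    int zeros[4] = {0, 0, 0, 0};
    int mixed[4] = {0, 0, 1, 0};
    return is_all_zero(zeros, sizeof zeros) && !is_all_zero(mixed, sizeof mixed);
}
```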
However, any attempts to use other types lead to undefined behavior, for example:
    // WRONG
    void my_fast_zero_memory(void* p, size_t n)
    {
        uint64_t* quads = static_cast<uint64_t*>(p);
        for (; n > 7; n -= 8, ++quads)
            *quads = 0; // WRONG, works only for arrays of (u)int64_t
        my_zero_memory(quads, n);
    }
Unfortunately, the Internet is full of such "fast" code, which can break at any moment if the compiler applies some optimization after inlining such a function. (The correct memset is the standard memset; otherwise it should be written, for example, in assembly, where there are no strict aliasing rules.)
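If 8-byte chunks are still wanted, one valid variant is to copy a zero-valued uint64_t into the buffer with std::memcpy, so that no uint64_t lvalue ever refers to the target memory. This is only a sketch with hypothetical names, not the answer's code. Optimizing compilers commonly turn such fixed-size copies into single stores, but that is an expectation rather than a guarantee, and plain std::memset remains the simplest correct choice.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical replacement for my_fast_zero_memory.
void my_chunked_zero_memory(void* p, std::size_t n) {
    unsigned char* bytes = static_cast<unsigned char*>(p);
    const std::uint64_t zero = 0;
    for (; n >= sizeof zero; n -= sizeof zero, bytes += sizeof zero)
        std::memcpy(bytes, &zero, sizeof zero);  // byte copy of an 8-byte zero chunk
    std::memset(bytes, 0, n);                    // remaining tail, fewer than 8 bytes
}

// Hypothetical usage: zero an int array whose size is not a multiple of 8 bytes
// on platforms where int is 4 bytes.
bool zeroes_int_array() {
    int x[5] = {1, 2, 3, 4, 5};
    my_chunked_zero_memory(x, sizeof x);
    for (int v : x)
        if (v != 0) return false;
    return true;
}
```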
Because so much incorrectly written code of this kind exists, the GCC compiler has the option -fno-strict-aliasing, which disables the optimizations that rely on the aliasing rules.
- But the GCC documentation (-fstrict-aliasing applies equally to C++) says that working with a union the way you describe is not forbidden (it is valid), whereas accessing the fields through pointers is indeed UB. (And yes, this is a useful topic (+1); by the way, you could also write about -fstrict-overflow UB, since there was a recent discussion about integer overflow.) - avp
- @avp, this is called type punning and it is valid only in C (see C99 6.5.2.3, 6.2.6.1). C++ has no such thing. GCC may extend this C rule to C++, but that is a compiler extension. - Abyx
- I see. So this is a GCC feature? And how is one supposed to work with unions in other compilers? Invent something that may slow execution down but sets the compiler straight? Or just not use such platforms? - avp
- @avp, instead of reading another union member you should use memcpy. A union is meant for storing different types at different times, not for type conversions. - Abyx
- Maybe. But it seems to me that most practitioners think otherwise (or rather, do not think about it at all). So I hope this orgy of optimization will stop (and perhaps the standard will have to be revised to match established practice). - avp