Reading and writing union members

Question

I cannot find a definitive answer to the following question.

As far as I remember, unions have always been used not so much to store different data alternately in one place as to get flexible access to parts of the same data. In other words, to write data of one type and read it back as data of another type.

For example:

    typedef union u_color_pack {
        uint8_t  b[4];
        uint32_t raw;
    } color_pack;

    // …

    uint8_t color_check(const uint32_t _raw) {
        color_pack color;
        color.raw = _raw;
        if (color.b[0] == 0) {
            return 0;
        }
        return 1;
    }

color_pack makes it possible to work with a color either as raw data of type uint32_t or by accessing the individual bytes (color channels) directly. Yes, I understand that this code depends on byte order, but such code is usually written with a strictly defined byte order in mind.

The problem is that some people say you cannot read a member of type A from a union if the member written before that was of type B.

Some say this is undefined behavior. Others say it is implementation-defined behavior.

What do the C and C++ standards say about this issue? Is what they say different?

C permits it and C++ prohibits it (although the GCC manual, for example, says that it is allowed as an exception even in C++).
The only special case in which this is allowed in C++ is when both the active member and the member being read are standard-layout types that begin with the same set of fields. In that case you may access any of those shared fields.
For references to the standard, it is better to wait for someone like AnT.
Byte order on a system is most often handled with a #define, with the choice written in #ifdef / #else / #endif. This makes sense only if you are writing an application for several processor types or something cross-platform that targets unusual platforms.
If you are writing for Windows or ordinary x86 / x64 Linux and do not plan to move to another CPU platform, then this need not be taken into account.
@HolyBlackCat, which parts of the standards should I look in?

Accepted Answer · 2019-03-05T12:51:16

Addressing the same data both as a whole structure and as an array of bytes is a very common task.

Following VTT's comments, to avoid UB it should be done like this:

    uint8_t color_check(const uint32_t _raw) {
        // Inspect the object representation through a byte pointer.
        const uint8_t* b = reinterpret_cast<const uint8_t*>(&_raw);
        if (b[0] == 0) {
            return 0;
        }
        return 1;
    }

It should not be done through:

a union, because of UB: the optimizer assumes that within one "iteration" only one branch (member) of the union is in use, so with a newer-generation optimizer the program may work incorrectly;
a cast that does not specify the kind of cast (a C-style cast), such as ((char*)&_raw)[0]: you can run into UB.

About the byte order: it can differ only on non-standard platforms, that is, if you expect the code to run on platforms other than Intel x86 / x64-compatible ones (or AMD). To avoid a run-time check, define a macro with #define and let the preprocessor choose the branch, for example:

    #ifdef little_endian
    // direct byte order
    #else
    // reverse byte order
    #endif
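If only individual color channels are needed, one option that avoids both the byte-order macro and the union question is to extract bytes with shifts and masks. The following is a minimal sketch, not the method from this answer. The helper name color_byte is hypothetical, and index 0 is assumed to mean the least significant byte, which is what b[0] refers to on a little-endian machine.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns byte `index` of `raw`, counted from the
// least significant byte (index 0), independent of the platform byte order.
constexpr std::uint8_t color_byte(std::uint32_t raw, unsigned index) {
    return static_cast<std::uint8_t>((raw >> (8u * index)) & 0xFFu);
}

// Variant of color_check from the question: 0 if the lowest byte is zero,
// otherwise 1. No union and no pointer cast is involved.
constexpr std::uint8_t color_check(std::uint32_t raw) {
    return color_byte(raw, 0) == 0 ? 0 : 1;
}
```

Because the shift operates on the value rather than on its memory representation, the result is the same on little-endian and big-endian targets. On a big-endian machine it is therefore not equivalent to reading b[0] from the union.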

Now regarding the standards.

Regarding UB, see https://habr.com/ru/post/216189/, section 1.3.12:

Undefined behavior is behavior that may arise from using an erroneous program construct or erroneous data, and for which the International Standard imposes no requirements. Undefined behavior may also occur in situations that the Standard does not describe explicitly.

As for using a union, the standard does not specify how to use a union correctly here. For the C++ standard, VTT has already answered, section 12.3:

In a C++ union, at most one field can be active at any given time. With the exception of accessing the common initial part of standard-layout objects, accessing an inactive field is undefined behavior. In the general case, to access an inactive field you must first manually call the destructor of the active field and then use placement new on the field that you want to make active.

"Through a cast that does not specify the kind of cast, ((char*)&_raw)[0]: you can run into UB."
There is no UB here; the C-style cast works as a reinterpret_cast in this case.
"Through memcpy, because the optimizer also optimizes memcpy: you can run into UB."
- The optimizer is not allowed to create UB where there was none; that would violate the as-if rule.
And besides, the cast method works only for char / unsigned char (and for uint8_t too).
As for the first point, I took part in a question on SO where it was said that this cannot be done ... I will look for it now ....
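The comments above mention memcpy and note that the pointer-cast method is limited to char-like types. A minimal sketch of the memcpy approach follows. The function names first_byte and float_bits are hypothetical, and float_bits assumes that float and uint32_t have the same size, which the static_assert checks.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Copies the object representation of `raw` into a byte array and returns
// the byte at the lowest address (which byte that is depends on endianness).
std::uint8_t first_byte(std::uint32_t raw) {
    std::uint8_t bytes[sizeof raw];
    std::memcpy(bytes, &raw, sizeof raw);
    return bytes[0];
}

// Reinterprets the bits of a float as an integer. A pointer cast to
// std::uint32_t* would not be allowed here, but copying the bytes is.
std::uint32_t float_bits(float f) {
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "this sketch assumes a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

Unlike the cast, this works for any pair of trivially copyable types of equal size. Compilers commonly replace such a small fixed-size memcpy with a plain register move.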

VTT · Answer 2 · 2019-03-05T11:23:18

In C++, a union can have at most one active field at any given time. With the exception of accessing the common initial part of standard-layout objects, accessing an inactive field is undefined behavior. In the general case, to access an inactive field you must first manually call the destructor of the active field and then use placement new on the field that you want to make active.
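The destructor-then-placement-new sequence described above can be sketched as follows. The union Value, the alias Text, and the function switch_and_read are hypothetical names chosen for illustration. A std::string member is used so that the destructor call actually matters.

```cpp
#include <cassert>
#include <cstdint>
#include <new>
#include <string>

using Text = std::string;  // alias so the destructor can be named as ~Text()

union Value {
    std::uint32_t number;
    Text text;
    Value() : number(0) {}  // `number` is the active member after construction
    ~Value() {}             // the owner must destroy the active member itself
};

std::string switch_and_read() {
    Value v;
    v.number = 42;

    // `number` is trivially destructible, so no destructor call is needed
    // before switching; placement new makes `text` the active member.
    new (&v.text) Text("red");
    std::string copy = v.text;

    // Switching back: destroy the active std::string first, then start
    // the lifetime of `number` again.
    v.text.~Text();
    new (&v.number) std::uint32_t(7);

    return copy + std::to_string(v.number);
}
```

Note that this only changes which member is active. The new member starts with its own freshly constructed value, so the sequence does not reinterpret the bytes of the old one.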

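12.3 Unions [class.union]
In a union, a non-static data member is active if its name refers to an object whose lifetime has begun and has not ended (6.8). At most one of the non-static data members of an object of union type can be active at any time, that is, the value of at most one of the non-static data members can be stored in a union at any time. [Note: One special guarantee is made in order to simplify the use of unions: If a standard-layout union contains several standard-layout structs that share a common initial sequence (12.2), and if a non-static data member of an object of this standard-layout union type is active and is one of the standard-layout structs, it is permitted to inspect the common initial sequence of any of the standard-layout struct members; see 12.2. -- end note]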
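The special guarantee in the note can be illustrated with a small tagged union. This is a sketch under assumptions: the types Circle, Rect, and Shape and the functions make_rect and kind_of are hypothetical and do not come from the question.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Two standard-layout structs whose common initial sequence includes `kind`.
struct Circle { std::uint8_t kind; float radius; };
struct Rect   { std::uint8_t kind; float width; float height; };

static_assert(std::is_standard_layout<Circle>::value &&
              std::is_standard_layout<Rect>::value,
              "the guarantee applies only to standard-layout structs");

union Shape { Circle circle; Rect rect; };

// Makes `rect` the active member, tagged with kind == 1.
Shape make_rect(float w, float h) {
    Shape s{};               // `circle` is active and zero-initialized
    s.rect = Rect{1, w, h};  // assignment makes `rect` the active member
    return s;
}

// Reads `kind` through `circle` even when `rect` is active. This is
// permitted because `kind` belongs to the common initial sequence.
std::uint8_t kind_of(const Shape& s) {
    return s.circle.kind;
}
```

The guarantee does not cover the union in the question: uint8_t b[4] and uint32_t raw are not structs that share a common initial sequence.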

Is the code in the question valid? If not, how should it be written correctly?
The code in the question is not valid: to access b, that field must first be initialized, but then initializing the raw field loses its meaning and there is no "flexible access" here.
Although in theory the destructor need not be called in this case, since the type is trivially destructible.
Then the result is not the same ... Can UB be turned off with a compiler option when that is necessary?
Through a reference: not allowed, UB. Through a union: not allowed, UB.
