In terms of speed on the x86 architecture, which way of implementing a set of flags is best: a structure with bit fields, bits of a plain variable, or a separate 1-byte variable for each flag? How significant is the difference?

  • Write a test. It all depends on the task. In some of my tasks, arrays with one byte per flag won, and in others manual bit access did. - KoVadim
  • The result will still depend heavily on the compiler and the chosen optimization level. - alexlz
  • If the interest is not purely speculative, what is the practical task in which you can measure the effect of the flag implementation on execution time? Something embedded? - avp
  • There do seem to be x86 processors for embedded use, but they are used extremely rarely. The target machine is weak, and my interest is somewhere between speculative and practical. - vard
  • Perhaps the question should have been put a little differently: for which of the flag implementations above (or any other) will setting, clearing, and testing a flag be fastest on x86-32, from the point of view of the architecture? - vard
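For reference, the three options being compared can be sketched in C roughly as follows. This is only an illustration: the flag names (`ready`, `dirty`, `error`) and the helper functions are made up for the example, and nothing here says which variant is faster, since that depends on the compiler and the workload.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Option 1: struct with bit fields; the compiler generates the masking. */
struct flags_bf {
    unsigned ready : 1;
    unsigned dirty : 1;
    unsigned error : 1;
};

/* Option 2: bits of one plain variable, masked by hand. */
enum { FLAG_READY = 1 << 0, FLAG_DIRTY = 1 << 1, FLAG_ERROR = 1 << 2 };

static inline void flag_set(uint32_t *w, uint32_t mask)   { *w |= mask; }
static inline void flag_clear(uint32_t *w, uint32_t mask) { *w &= ~mask; }
static inline bool flag_test(uint32_t w, uint32_t mask)   { return (w & mask) != 0; }

/* Option 3: a separate byte for each flag; no masking at all. */
struct flags_bytes {
    uint8_t ready;
    uint8_t dirty;
    uint8_t error;
};

/* Usage: apply the same set/clear sequence to all three and compare. */
static void flags_demo(void)
{
    struct flags_bf bf = {0};
    uint32_t word = 0;
    struct flags_bytes by = {0};

    bf.dirty = 1; flag_set(&word, FLAG_DIRTY);   by.dirty = 1;  /* set   */
    bf.error = 1; flag_set(&word, FLAG_ERROR);   by.error = 1;  /* set   */
    bf.error = 0; flag_clear(&word, FLAG_ERROR); by.error = 0;  /* clear */

    assert(bf.dirty && flag_test(word, FLAG_DIRTY) && by.dirty);
    assert(!bf.error && !flag_test(word, FLAG_ERROR) && !by.error);
    assert(!bf.ready && !flag_test(word, FLAG_READY) && !by.ready);
}
```

Timing loops over each variant with the compiler and optimization level actually in use is the test the comments above recommend.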

1 answer

The comment limit has been reached, so I have to reply to @karmadro4 in an answer.

You are right, instruction timing tables do exist. But it is very hard to use them to calculate the execution time of a program fragment. The data delays caused by filling the caches and TLBs are not obvious (they depend on other threads running in the system and on interrupts).

Also, without knowing the specific values of the data being processed, it is impossible to predict instruction and operand prefetching or the out-of-order execution of instructions.

Data is exchanged between RAM and the cache in cache lines (typically 32 bytes on older x86 processors, 64 on newer ones), and between the cache and the ALU in words (32 or 64 bits), so “uchars” change nothing compared to “ulongs”.
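To make the cache-line argument concrete, here is a small sketch that counts how many cache lines an object spans. The helper `lines_spanned` and the constant `CACHE_LINE` are made up for the illustration, and the 32-byte line size is an assumption that should be changed to match the actual CPU.

```c
#include <stddef.h>

/* Assumed cache line size in bytes; adjust for the target CPU. */
#define CACHE_LINE 32u

/* Number of cache lines spanned by an object of `size` bytes that
 * starts `offset` bytes past a line-aligned base address. */
static size_t lines_spanned(size_t offset, size_t size)
{
    if (size == 0)
        return 0;
    size_t first = offset / CACHE_LINE;              /* line holding the first byte */
    size_t last  = (offset + size - 1) / CACHE_LINE; /* line holding the last byte  */
    return last - first + 1;
}
```

Under this assumption, eight flags stored as eight bytes (`lines_spanned(0, 8)`) and eight flags packed into one byte (`lines_spanned(0, 1)`) both occupy a single line, so the memory traffic is the same. Only an object that straddles a line boundary, such as 4 bytes at offset 30, costs a second line.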

I repeat once again, this topic is very interesting, but not obvious.

  • @avp, let me stray a little into the philosophy of science and suggest abstracting away from the influence of unknown random processes on the phenomenon under study... By the way, it is more efficient to access RAM at an address aligned on a machine-word boundary. I think that on a 32-bit processor that exchanges 32-byte lines with the second-level cache, it would pay to inflate 8 flags into an unsigned long[8]. - karmadro4
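The word-per-flag layout suggested in this comment might look like the sketch below. It assumes C11, a 32-byte cache line, and uses `uint32_t` in place of `unsigned long` so that each element is 4 bytes regardless of platform; the names `flag_words`, `LINE_SIZE` and `NUM_FLAGS` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define LINE_SIZE 32u  /* assumed cache line size, as in the comment */
#define NUM_FLAGS 8u

/* One 32-bit word per flag, aligned so the whole array sits in one line. */
static _Alignas(32) uint32_t flag_words[NUM_FLAGS];

/* 8 words of 4 bytes each should fill exactly one assumed line. */
static_assert(sizeof flag_words == LINE_SIZE,
              "flag array should fill exactly one cache line");
```

With this layout every flag is read or written with a plain aligned word access and no masking, at the cost of 32 bytes where a single byte would hold the same information. Whether that trade pays off is something only a measurement on the target machine can show.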