This is partly a question and partly just sharing my surprise. I was looking through my listings today and came across a funny piece of optimization by the VS 2008 compiler. I reduced it to the following example:

    #include <iostream>
    #include <tchar.h>

    unsigned int rol(unsigned int v,char n)
    {
        __asm
        {
            mov eax,v;
            mov cl,n;
            rol eax,cl;
        }
        // no return statement: the result is left in eax
    }

    int _tmain(int argc, _TCHAR* argv[])
    {
        volatile int a=4;
        std::cout<<rol(a,2);
        return 0;
    }

The result of compiling with optimization for speed (the part that puzzled me stays the same when optimizing for size) is as follows:

    00401CA0 push ecx
    00401CA1 mov dword ptr [esp],4
    00401CA8 mov eax,dword ptr [esp]
    00401CAB mov dword ptr [esp],eax
    00401CAE mov eax,dword ptr [esp]
    00401CB1 mov cl,2
    00401CB3 rol eax,cl
    00401CB5 mov ecx,dword ptr [__imp_std::cout (406204h)]
    00401CBB push eax
    00401CBC call dword ptr [__imp_std::basic_ostream<char,std::char_traits<char> >::operator<< (406208h)]
    00401CC2 xor eax,eax
    00401CC4 pop ecx

With the first two instructions everything is clear: we reserve 4 bytes on the stack for the single local variable and then assign it the value 4. The third seems clear too: the variable is volatile, so it has to be read. Now for the fourth instruction. The optimizer inlined the body of the rol function into wmain (which is what _tmain expands to in a Unicode build), but for some reason (???) decided to pass v not in a register but in memory on the stack. How does it do that? It seems to reason that since a is not used anywhere else in the code, those "freed" 4 bytes can be reused for this purpose. Nothing after that caught my attention.

The logic seems clear, apart from the decision to pass v through the stack rather than in a register. But the end result is, in my opinion, simply absurd:

    00401CA8 mov eax,dword ptr [esp]
    00401CAB mov dword ptr [esp],eax
    00401CAE mov eax,dword ptr [esp]

The first two lines can simply be thrown out without any loss, with a gain in both speed and size. But this is an optimizer: it is supposed to generate optimal code, and instead it generates code from which you can delete whole lines without losing anything. Isn't that absurd?

  • I tried it with g++ -O3 -S, but without __asm {...} (gcc uses a different assembler syntax). Nothing extra, everything is inlined. The 4 is stored on the stack, then loaded into %eax and shifted (since C++ has no rol, I wrote <<, which gives sall $2, %eax), then the result is put on the stack and the print routine is called. Well, "nothing extra" not counting all the C++ trappings. - avp
  • gcc without optimization options generates a lot of extra code, but with -O2 everything already comes out in proper form. - skegg
  • The author is writing about compiling with optimization for speed (and getting the same result when optimizing for size). Apparently the point is that this is a free MS product. In the Pro version, I think, the result would be different. - avp
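
For comparison with the gcc experiment above, which substituted << for the rotate, here is a minimal sketch of a true rotate written in plain C++ with no __asm block. The name rol_portable is hypothetical and does not come from the question. Many compilers recognize this shift-or pattern as a rotate, but whether VS 2008 does, and whether it avoids the extra stack traffic shown in the listing, is not established anywhere in this thread, so treat it as something to try rather than a confirmed fix.

```cpp
#include <climits>

// Hypothetical helper (not from the original post): rotate v left by n bits
// using only standard C++ operations, so the compiler sees ordinary code
// instead of an opaque __asm block.
unsigned int rol_portable(unsigned int v, unsigned int n)
{
    const unsigned int bits = sizeof(unsigned int) * CHAR_BIT;
    n %= bits;                 // keep the rotate count inside the word width
    if (n == 0)
        return v;              // avoid shifting by the full width below
    return (v << n) | (v >> (bits - n));
}
```

Called as rol_portable(a, 2) with a equal to 4, it returns 16, the same value the rol in the question produces.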
