Why function

int max1(int x,int y) { return x>y ? x : y; } 

runs slower than

  int max2(int x,int y) { return x<y ? y : x; } 

?

I found an article about it even (at the end of the speed comparison table)

Here is my comparison:

  #include <iostream> #include <cstdlib> #include <algorithm> #include <vector> #include <utility> #include <ctime> int max1(int x,int y) { return x>y ? x : y; } int max2(int x,int y) { return x<y ? y : x; } int main() { std::srand(unsigned(std::time(0))); const unsigned long long size = 1000000000; std::vector<std::pair<int,int>> v(size); std::for_each(v.begin(),v.end(),[](std::pair<int,int>& x){x=std::make_pair(std::rand(),std::rand());}); unsigned int start1 = clock(); for (auto i : v) max1(i.first,i.second); unsigned int end1 = clock(); std::cout << end1-start1 << std::endl; unsigned int start2 = clock(); for (auto i : v) max2(i.first,i.second); unsigned int end2 = clock(); std::cout << end2-start2 << std::endl; } 
  • How and on what data were compared, and how much difference? - Schullz
  • give the comparison algorithm, the functions are likely to be converted into the same assembly code. - pavel
  • @ nikolay - You asked a question that contains the alleged behavior, without giving absolutely no data to allow this behavior to be reproduced. - Igor
  • updated the question. --Nikolay
  • Well, everything is written in the belts, but in general it depends on the specific version of the architecture compiler, and so on. - pavel

2 answers 2

I just recently considered a similar question in this article (at the very end).

It turned out that on the x86 architecture everything depends on the order of comparison. If x is compared with y, then the compiler gives one sequence of commands, and if vice versa, then another sequence of the same commands. The reason for the difference in speed, apparently, in the features of the micro-architecture. One sequence of commands seems to be better placed on the conveyor or some other third-party factors influence.

This is a common misfortune, probably, of all Intel processors. Similarly, I noticed that the inc eax command will be slower than lea eax, [eax+1] . There are probably many more such jokes.

  • I apologize, I did not know that it was in my article that I considered this thought. It was necessary to immediately say ... - Zealint
  • one and the same article that the question that the answer) I understand that the answer was typed before the article in the question appeared, but apparently the author did not understand something from the article. - pavel
  • one
    You know, I, as the author of the article, also didn’t really understand why:) - Zealint
  • one
    in fact, the entire measurement mechanism is crooked, to drive i from 0 to 0 is UB, the compiler can compress all comparison before assignment immediately or in chunks. And check that changing the a and b places, the result does not change to the opposite. And yes, the compilation options would be seen, maybe by playing with -O3 -O2 -OX there will be different results. - pavel
  • one
    VC ++ 2015 generated exactly the same code, with exactly the same execution time ... - Harry

There is no difference in the performance of the above functions and cannot be (*). They can be converted to absolutely identical assembly code, which is done, for example, by gcc. clang and MSVC generate different conditional move instructions, but their difference is only in the comparison instruction (less and more or equal), which is implemented in the digital logic in the same way, therefore there will be no difference.


(*) The difference in performance can only be in the following cases:

  • Wrong measurements
  • Compiler error
  • one
    I will add a specific fact. When translating g ++ -O3 ... on Linux x86_64 code of both functions is the same and consists of 4 commands: `cmpl% esi,% edi; movl% esi,% eax; cmovge% edi,% eax; ret` (I added ; as a separator) - avp