For example, here is a loop that converts a YUY2 buffer to a BGR buffer (running from 100 thousand to 900+ thousand iterations):

 OffsetBGR = 0;
 while (OffsetYUY2 < VHDR->dwBufferLength) {
     // Read a 4-byte YUY2 block (two pixels)
     Y1 = VHDR->lpData[OffsetYUY2++] - 16;
     U  = VHDR->lpData[OffsetYUY2++] - 128;
     Y2 = VHDR->lpData[OffsetYUY2++] - 16;
     V  = VHDR->lpData[OffsetYUY2++] - 128;
     // Write a 6-byte BGR block (two pixels)
     FrameData[OffsetBGR++] = GET_B(Y1, U, V);
     FrameData[OffsetBGR++] = GET_G(Y1, U, V);
     FrameData[OffsetBGR++] = GET_R(Y1, U, V);
     FrameData[OffsetBGR++] = GET_B(Y2, U, V);
     FrameData[OffsetBGR++] = GET_G(Y2, U, V);
     FrameData[OffsetBGR++] = GET_R(Y2, U, V);
 }

... and the individual color components are obtained either through macros:

 #define CLAMP(t) ((t>255)?255:((t<0)?0:t))
 // YUV to RGB
 #define GET_R(Y,U,V) CLAMP(((298 * Y + 409 * V + 128) >> 8))
 #define GET_G(Y,U,V) CLAMP(((298 * Y - 100 * U - 208 * V + 128) >> 8))
 #define GET_B(Y,U,V) CLAMP(((298 * Y + 516 * U + 128) >> 8))

... or through inline functions:

 // int parameters: the intermediate YUV values do not fit into a signed char
 inline unsigned char clamp_byte(int value) { return (value > 255) ? 255 : ((value < 0) ? 0 : value); }
 inline unsigned char R(int Y, int V)        { return clamp_byte((298 * Y + 409 * V + 128) >> 8); }
 inline unsigned char G(int Y, int U, int V) { return clamp_byte((298 * Y - 100 * U - 208 * V + 128) >> 8); }
 inline unsigned char B(int Y, int U)        { return clamp_byte((298 * Y + 516 * U + 128) >> 8); }

Which is more efficient to use in long loops in terms of performance: macros or inline functions? And is it acceptable to call one inline function from another inline function if the functions are small?

  • Why ask when you can measure? (A minimal benchmark sketch is shown after these comments.) In general, this is exactly the kind of case where macros should not be used in C++, since the language offers an alternative mechanism; using only inline functions is perfectly normal. Note that inline does not necessarily mean that the function body gets inserted into the calling code. - VTT
  • Depending on the compiler and the compilation flags, an inline function may simply be expanded into ordinary inline code, so there will be no difference in performance. In your case it really is better to measure. - selya
  • Get the assembler code and compare. - avp
  • If you are going to benchmark the variants, also try SSE; here is a question on the English Stack Overflow on this topic. - user239133
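
Following up on the "measure it" advice in the comments, here is a minimal benchmark sketch; the frame size, the synthetic test data, the helper names (time_ms, with_macros, with_inline) and the std::chrono timing harness are assumptions for illustration, not part of the question. It runs the macro-based and the inline-function-based conversion over the same YUY2 buffer and prints the elapsed time of each:

 #include <chrono>
 #include <cstdio>
 #include <vector>

 // Macro-based conversion (arguments parenthesized).
 #define CLAMP(t) (((t) > 255) ? 255 : (((t) < 0) ? 0 : (t)))
 #define GET_R(Y,U,V) CLAMP(((298 * (Y) + 409 * (V) + 128) >> 8))
 #define GET_G(Y,U,V) CLAMP(((298 * (Y) - 100 * (U) - 208 * (V) + 128) >> 8))
 #define GET_B(Y,U,V) CLAMP(((298 * (Y) + 516 * (U) + 128) >> 8))

 // Inline-function-based conversion.
 inline unsigned char clamp_byte(int v) { return v > 255 ? 255 : (v < 0 ? 0 : v); }
 inline unsigned char R(int Y, int V)        { return clamp_byte((298 * Y + 409 * V + 128) >> 8); }
 inline unsigned char G(int Y, int U, int V) { return clamp_byte((298 * Y - 100 * U - 208 * V + 128) >> 8); }
 inline unsigned char B(int Y, int U)        { return clamp_byte((298 * Y + 516 * U + 128) >> 8); }

 // Times one conversion pass and returns the elapsed time in milliseconds.
 template <class Convert>
 static double time_ms(const std::vector<unsigned char>& yuy2,
                       std::vector<unsigned char>& bgr, Convert convert) {
     auto start = std::chrono::steady_clock::now();
     convert(yuy2, bgr);
     auto stop = std::chrono::steady_clock::now();
     return std::chrono::duration<double, std::milli>(stop - start).count();
 }

 int main() {
     const std::size_t pixels = 1920 * 1080;       // assumed frame size
     std::vector<unsigned char> yuy2(pixels * 2);  // 2 bytes per pixel
     std::vector<unsigned char> bgr(pixels * 3);   // 3 bytes per pixel
     for (std::size_t i = 0; i < yuy2.size(); ++i) yuy2[i] = (unsigned char)(i * 37);

     auto with_macros = [](const std::vector<unsigned char>& in, std::vector<unsigned char>& out) {
         std::size_t o = 0;
         for (std::size_t i = 0; i + 3 < in.size(); i += 4) {
             int Y1 = in[i] - 16, U = in[i + 1] - 128, Y2 = in[i + 2] - 16, V = in[i + 3] - 128;
             out[o++] = GET_B(Y1, U, V); out[o++] = GET_G(Y1, U, V); out[o++] = GET_R(Y1, U, V);
             out[o++] = GET_B(Y2, U, V); out[o++] = GET_G(Y2, U, V); out[o++] = GET_R(Y2, U, V);
         }
     };
     auto with_inline = [](const std::vector<unsigned char>& in, std::vector<unsigned char>& out) {
         std::size_t o = 0;
         for (std::size_t i = 0; i + 3 < in.size(); i += 4) {
             int Y1 = in[i] - 16, U = in[i + 1] - 128, Y2 = in[i + 2] - 16, V = in[i + 3] - 128;
             out[o++] = B(Y1, U); out[o++] = G(Y1, U, V); out[o++] = R(Y1, V);
             out[o++] = B(Y2, U); out[o++] = G(Y2, U, V); out[o++] = R(Y2, V);
         }
     };

     printf("macros: %.3f ms\n", time_ms(yuy2, bgr, with_macros));
     printf("inline: %.3f ms\n", time_ms(yuy2, bgr, with_inline));
     printf("sample output byte: %u\n", (unsigned)bgr[12345]); // keep the result observable
     return 0;
 }

With optimizations enabled (for example -O2), both variants are normally compiled to essentially the same machine code, so the measured difference is expected to be within noise.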

2 answers

There is no difference in terms of performance, since in C++ the code of an inline function is substituted at compile time at the point where it is called. The subtleties, of course, depend on the compiler.

The difference is that an inline function behaves like an ordinary function: its arguments are evaluated exactly once, which avoids typical problems with #define. For example:

 #define addTwoSameNumbers(a) a + a 

And then you call it like this:

 addTwoSameNumbers(++a) 

Then the code the compiler actually sees after macro expansion looks like this:

 ++a + ++a 

As a result, the variable a is incremented twice (and, since it is modified twice in the same expression without sequencing, the behavior is undefined). With an inline function there is no such problem, because the argument is evaluated only once.
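
For comparison, here is a minimal sketch (the main function and the printed values are illustrative, not part of the answer) of the same helper written as an inline function; the argument expression ++a is evaluated exactly once before the function body runs:

 #include <cstdio>

 // The same helper as an inline function instead of a macro.
 inline int addTwoSameNumbers(int a) { return a + a; }

 int main() {
     int a = 1;
     int r = addTwoSameNumbers(++a);   // ++a is evaluated once: a becomes 2, r == 4
     printf("a = %d, r = %d\n", a, r); // prints "a = 2, r = 4"
     // The macro version would expand to ++a + ++a, incrementing a twice
     // and producing undefined behavior.
     return 0;
 }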

English answer: https://stackoverflow.com/questions/3554527/whats-the-difference-in-practice-between-inline-and-define

    In terms of performance there is no difference: compilers have long been able to substitute the body of an inline function at the call site wherever it is beneficial.

    However, it is much easier to make mistakes with macros. For example, your macros have a typical error:

     #define GET_R(Y,U,V) CLAMP(((298 * Y + 409 * V + 128) >> 8)) 

    Imagine someone calling

     GET_R(old_Y + 1, old_U + 1, old_V + 1)

    This expands to

     CLAMP(((298 * old_Y + 1 + 409 * old_V + 1 + 128) >> 8)) 

    Debugging such an error is very difficult, and the compiler will not warn you about it. Therefore it makes sense not to use macros unless there is a real, pressing need.

    • @Iceman, usually, to avoid this kind of error in a macro, every occurrence of an argument in the macro body is wrapped in parentheses, something like #define GET_R(Y,U,V) CLAMP(((298 * (Y) + 409 * (V) + 128) >> 8)). Of course, in general this does not protect against side effects (see the sketch below). - avp
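
    A minimal sketch of that last point (the get_r helper, the test values and the pointer-based argument are illustrative, not from the original code): even with parenthesized arguments, CLAMP repeats its argument text, so an argument with side effects such as *p++ is evaluated several times, while an inline function evaluates it exactly once.

     #include <cstdio>

     // Parenthesized versions of the macros discussed above.
     #define CLAMP(t) ((t) > 255 ? 255 : ((t) < 0 ? 0 : (t)))
     #define GET_R(Y,U,V) CLAMP(((298 * (Y) + 409 * (V) + 128) >> 8))

     // The same computation as an inline function: arguments are evaluated once.
     inline unsigned char get_r(int y, int v) {
         int t = (298 * y + 409 * v + 128) >> 8;
         return t > 255 ? 255 : (t < 0 ? 0 : t);
     }

     int main() {
         int y_values[] = {100, 100, 100};
         const int *p = y_values;

         // CLAMP repeats its argument, so *p++ is evaluated three times here
         // and the pointer advances three elements for a single "pixel".
         unsigned char r1 = GET_R(*p++, 0, 50);
         printf("macro:  r = %d, pointer advanced by %ld\n", (int)r1, (long)(p - y_values));

         p = y_values;
         unsigned char r2 = get_r(*p++, 50); // *p++ is evaluated once
         printf("inline: r = %d, pointer advanced by %ld\n", (int)r2, (long)(p - y_values));
         return 0;
     }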