I have a dword value, an RGBA pixel. I need to add it to another pixel using saturating arithmetic, and to do this through the xmm registers by packing these dwords into 128 bits.

    unsigned int A0 = 0xFF99AA00;
    unsigned int B0 = 0xFF80AA00;
    unsigned int C0 = 0xFF60AA00;
    unsigned int D0 = 0xFF70AA00;

    unsigned int A1 = 0xFF90AA00;
    unsigned int B1 = 0xFF80AA00;
    unsigned int C1 = 0xFFB0AA00;
    unsigned int D1 = 0xFFC0AA00;

    _asm {
        /*........*/
        PADDUSB xmm1, xmm2
    }

How do I pack 4 + 4 dwords so that they are added as 16 + 16 bytes? In place of /*........*/ I need the appropriate SSE2 assembly instructions. The sum of each pair of bytes must not exceed 255.
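For reference, here is a minimal sketch of the same operation written with SSE2 intrinsics instead of inline assembly. It assumes the four dwords of each operand are stored contiguously in an array, which is not how the snippet above declares them; the helper name `add_pixels_saturated` is made up for illustration. The comments name the instruction each intrinsic normally maps to.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: adds four RGBA pixels to four other pixels,
 * byte by byte, clamping every byte sum at 255. */
static void add_pixels_saturated(const uint32_t a[4],
                                 const uint32_t b[4],
                                 uint32_t out[4])
{
    /* Unaligned 128-bit loads: four dwords become 16 bytes (MOVDQU). */
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);

    /* Unsigned saturating add of 16 byte lanes (PADDUSB). */
    __m128i sum = _mm_adds_epu8(va, vb);

    /* Unaligned 128-bit store back to memory (MOVDQU). */
    _mm_storeu_si128((__m128i *)out, sum);
}
```

With the values from the question, every channel except the zero one saturates, so each resulting pixel is 0xFFFFFF00. If the variables have to stay separate, `_mm_set_epi32(D0, C0, B0, A0)` builds the same register; the compiler then chooses the instruction sequence.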


And one more small question: how could I pack 16 chars, i.e. if there were a separate variable for each color channel?
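For the 16-separate-variables case, one possible sketch uses the `_mm_setr_epi8` intrinsic, which takes its arguments in memory order (the first argument becomes byte 0 of the register). The helper name `pack16` is hypothetical, and this intrinsic is not a single instruction: the compiler decides how to assemble the register.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: packs 16 separate channel values into one
 * 128-bit register. b0 ends up in the lowest byte, b15 in the highest. */
static __m128i pack16(uint8_t b0,  uint8_t b1,  uint8_t b2,  uint8_t b3,
                      uint8_t b4,  uint8_t b5,  uint8_t b6,  uint8_t b7,
                      uint8_t b8,  uint8_t b9,  uint8_t b10, uint8_t b11,
                      uint8_t b12, uint8_t b13, uint8_t b14, uint8_t b15)
{
    /* _mm_setr_epi8 takes (signed) char arguments, hence the casts;
     * the bit patterns are unchanged. */
    return _mm_setr_epi8((char)b0,  (char)b1,  (char)b2,  (char)b3,
                         (char)b4,  (char)b5,  (char)b6,  (char)b7,
                         (char)b8,  (char)b9,  (char)b10, (char)b11,
                         (char)b12, (char)b13, (char)b14, (char)b15);
}
```

Two registers built this way can be passed straight to `_mm_adds_epu8` (PADDUSB) and written out with `_mm_storeu_si128`.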

  • Look here: www.stackoverflow.com/a/496577/178678. It has exactly what you need, ready to use and without any assembly code. - Flowneee
  • Why didn't you accept any answer to your previous question? Do you think anyone will want to answer you if you don't accept answers? - ixSci
  • @ixSci all the answers are suitable. Do I need to pick the best one among them? - neko69
  • @neko69, you need to choose the one that worked for you, the one you used. Or the one you liked the most. You should always accept an answer, or update the question if none fits. Questions without accepted answers spoil the site statistics. - ixSci
  • In addition, I just checked: one intrinsic is one assembly instruction, so that example could not be made any shorter. The Intel site has a reference where you can look up which assembly instruction each intrinsic corresponds to. - ixSci
