Please help with an example of computing the scalar (dot) product of two vectors stored as double arrays, using SSE instructions. All the examples I have found are, for some reason, written for the type float.
For example, this function:

    #include <xmmintrin.h>

    float inner(int n, float* x, float* y) {
        // Assumes x and y are 16-byte aligned and n is a multiple of 4.
        __m128 *xx = (__m128*)x;
        __m128 *yy = (__m128*)y;
        __m128 s = _mm_setzero_ps();
        for (int i = 0; i < n/4; ++i) {
            __m128 p = _mm_mul_ps(xx[i], yy[i]);
            s = _mm_add_ps(s, p);
        }
        // Horizontal sum of the four lanes of s.
        __m128 p = _mm_movehl_ps(s, s);
        s = _mm_add_ps(s, p);
        p = _mm_shuffle_ps(s, s, 1);
        s = _mm_add_ss(s, p);
        float sum;
        _mm_store_ss(&sum, s);
        return sum;
    }

    2 answers

    There is nothing complicated here. You just need to swap the instructions for their double counterparts. A double is twice the size of a float, so an XMM register holds two of them rather than four, and the loop should run to n/2 instead of n/4. Then look up the intrinsics for double in a reference. For example, _mm_mul_pd is the double multiplication, and by analogy _mm_add_pd is most likely the addition. The remaining intrinsics can be found the same way with Google; it took me 5 minutes.

    UPD: instead of the shuffle sequence that follows the loop in the example, it is simpler to use _mm_store_pd to save the two doubles from the XMM register to memory and just return their sum.
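
The steps described in this answer can be sketched roughly as follows. This is only one possible version, not the code from the thread: the name inner_sse2, the use of the unaligned _mm_loadu_pd and _mm_storeu_pd (so that x and y need no particular alignment), and the scalar handling of a leftover element for odd n are all assumptions added for illustration.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics: __m128d and the _mm_*_pd family

// Dot product of two double arrays of length n (hypothetical helper name).
// Unaligned loads are used, so x and y need no special alignment.
double inner_sse2(int n, const double* x, const double* y) {
    __m128d s = _mm_setzero_pd();            // two running partial sums
    int i = 0;
    for (; i + 1 < n; i += 2) {              // two doubles per iteration
        __m128d a = _mm_loadu_pd(x + i);
        __m128d b = _mm_loadu_pd(y + i);
        s = _mm_add_pd(s, _mm_mul_pd(a, b));
    }
    double parts[2];
    _mm_storeu_pd(parts, s);                 // write both lanes to memory
    double sum = parts[0] + parts[1];
    if (i < n) {
        sum += x[i] * y[i];                  // leftover element when n is odd
    }
    return sum;
}
```

If the arrays are known to be 16-byte aligned, _mm_load_pd and _mm_store_pd could be used instead; the unaligned forms are chosen here only because they are safe for any pointer.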

    • I tried it, and this is what I got:
          __m128d *xx = (__m128d*)x;
          __m128d *yy = (__m128d*)y;
          __m128d s = _mm_setzero_pd();
          for (int i = 0; i < n / 2; ++i) {
              __m128d p = _mm_mul_pd(xx[i], yy[i]);
              s = _mm_add_pd(s, p);
          }
          double s1, s2;
          _mm_store_sd(&s1, s);
          _mm_store_sd(&s2, s);
          return s1 + s2;
      The answer is correct, but something is wrong with memory. - Eugene
    • Glancing at the code, I think you are using _mm_store_sd incorrectly. This function (read the description) stores two doubles at the given address at once. Try creating an array s[2], writing to it, and then adding s[0] + s[1]. Always read a function's description before using it. - Zealint
    • Strange, but if I do this: double ss[2]; _mm_store_sd(ss, s); return ss[0] + ss[1]; then half of the result is missing. - Eugene
    • @Eugene, as I said, learn to read the documentation for the functions. I, for instance, was wrong to say that this function writes two doubles to the address, and you did not bother to check me. That is done by a different function. Please try to find it yourself; it is very easy. A hint: its name differs by one letter. - Zealint
    • I did it like this: _mm_store_pd(ss, s); return ss[0] + ss[1]; Now the result is correct, but the memory error remains. - Eugene
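
The difference between the two store intrinsics discussed in these comments can be illustrated with a small sketch. The helper names low_lane and both_lanes are made up for this example. As for the remaining memory error, the thread does not identify its cause; one plausible guess is alignment, since _mm_store_pd and dereferencing a __m128d* both require a 16-byte aligned address, which an ordinary double array does not guarantee. The sketch therefore uses _mm_storeu_pd, which has no such requirement.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics

// _mm_store_sd writes a single double: only the low lane of v.
double low_lane(__m128d v) {
    double out;
    _mm_store_sd(&out, v);
    return out;
}

// _mm_storeu_pd writes both doubles and, unlike _mm_store_pd,
// does not require the destination to be 16-byte aligned.
double both_lanes(__m128d v) {
    double out[2];
    _mm_storeu_pd(out, v);
    return out[0] + out[1];
}
```

For a register built with _mm_set_pd(2.0, 1.0), where the low lane is 1.0 and the high lane is 2.0, the first helper returns only the low lane, which matches the "half of the result is missing" observation above.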

    Now write it in plain, unsophisticated C++ with loops and compare the performance. I doubt that on something as trivial as a scalar product you will outsmart the compiler. *


    * Unless your compiler is more than five years past its release, of course.
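
A baseline for such a comparison might look like the sketch below; the name inner_plain is an assumption. Whether the compiler actually vectorizes this loop depends on the compiler and its flags. In particular, summing in a different order changes floating-point rounding, so GCC and Clang typically keep the reduction scalar unless reassociation is permitted (for example with -ffast-math). It is worth checking the generated assembly rather than assuming.

```cpp
// Plain scalar dot product with no intrinsics (hypothetical helper name).
double inner_plain(int n, const double* x, const double* y) {
    double sum = 0.0;
    for (int i = 0; i < n; ++i) {
        sum += x[i] * y[i];   // strictly left-to-right accumulation
    }
    return sum;
}
```

Because a hand-written SSE version accumulates two partial sums, its result may differ from this loop in the last bits for general inputs.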

    • Just make sure the compiler flags explicitly allow generating SSE instructions. - ߊߚߤߘ
    • @Arhad, Visual Studio does that even with the default release settings (/O2). - gbg