A question about OpenACC. For some reason, this piece of code:
    #include <openacc.h>
    ...
    #define NSZ (1<<16)
    ...
    //#pragma acc kernels
    for (i=0; i<NSZ; i++) C[i]=A[i]+B[i];
runs 15..20% slower with the kernels directive enabled than without it.
It is compiled with the options gcc -fopenacc -msse2 ...
Roughly the same code written with OpenCL runs 1.5..2 times faster. Compiler version: 5.1.0. Video card: NVIDIA GT 950. Am I doing something wrong?
I would also appreciate links to Russian-language documentation on GPU programming with examples (OpenACC, OpenCL and others ...).