How do I distribute an array across work-groups in OpenCL?

Question

I'm taking my first steps with OpenCL (it shouldn't matter here, but I'm using an NVIDIA GPU), and I've run into a question.

I have a very simple program that works on a one-dimensional array. In the kernel, each value is computed from several cells of this array: for example, the array may have 1000 elements while each kernel invocation uses only 9 of them. I pass the kernel a global input buffer that contains the entire array. The task is to go through every cell and compute the values of a new array of the same size.

My question is this: when launching the computation, how can I give each work-group only the elements it needs (its part of the array), so that it does not have to read the __global buffer every time and can use __local memory instead? Or, alternatively, how can I give each work-item only the array elements it needs in __private memory, and thereby potentially speed the program up?

Here is the part of the program that launches the kernel:

```c
// Query the maximum work-group size for this kernel on this device
size_t workgroup_size;
err = clGetKernelWorkGroupInfo(kernel, device_id, CL_KERNEL_WORK_GROUP_SIZE,
                               sizeof(size_t), &workgroup_size, NULL);
if (err != CL_SUCCESS) {
    log_error("Unable to get kernel work-group size");
}

// Copy the array (massive) to the device; sizeof(massive) is the full
// array size only if massive is a real array, not a pointer
err = clEnqueueWriteBuffer(queue, input, CL_TRUE, 0, sizeof(massive), massive,
                           0, NULL, NULL);
if (err != CL_SUCCESS) {
    log_error("Unable to enqueue buffer");
}

// Run the kernel on every element; in OpenCL 1.x board_size must be
// a multiple of workgroup_size, otherwise this call fails
err = clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &board_size, &workgroup_size,
                             0, NULL, NULL);
if (err != CL_SUCCESS) {
    log_error("Unable to enqueue kernel");
}
```

The kernel function declaration looks like this.

```c
__kernel void life(constant int* input, global int* output, const unsigned int massive_size)
```

Please don't be too hard on me; I'm a beginner.


