I have one three-dimensional thread block (or several, but that does not matter, since they are independent). I need to synchronize the threads in this block as follows: first all threads with coordinate z = 0 must execute, then those with z = 1, and so on. In other words, the threads have to be synchronized layer by layer along the z axis. Several people have already suggested the following code as an example:

__global__ void Kernel(){
    for(int z = 0; z < zmax; z++){
        if(threadIdx.z == z){
            //code
            __syncthreads();
        }
    }
}

For clarity, if the loop is unrolled and the body is reduced to the essentials, it looks something like this:

  if(threadIdx.z == 0){
      __syncthreads();
  }
  if(threadIdx.z == 1){
      __syncthreads();
  }
  // and so on

In theory, this code guarantees that threads with the same z execute together, but it does not guarantee the order of the layers. After all, all the threads with z = 1 could reach their synchronization point first, and then the layer with z = 1 would be executed before the layer with z = 0.

Is my reasoning correct? If so, how can I get the behavior I want? Or is it impossible?

    1 answer

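    I figured out the problem: __syncthreads() must be called outside the if(). I had simply misread code written by people who prefer not to write extra braces. Inside a conditional, __syncthreads() is only valid when the condition evaluates identically for every thread in the block, which is not the case for threadIdx.z == z.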

    The correct version is:

      __global__ void Kernel(){
          for(int z = 0; z < zmax; z++){
              if(threadIdx.z == z){
                  //code
              }
              __syncthreads();
          }
      }

    Now everything is logical. This is equivalent to the following:

      if(threadIdx.z == 0){
          //code
      }
      __syncthreads();
      if(threadIdx.z == 1){
          //code
      }
      __syncthreads();
      // and so on

    Every thread reaches every if() and every __syncthreads(), but only the threads of the matching layer execute the body of each if(). This guarantees that the layers execute sequentially along z.
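To check the layer-by-layer pattern in practice, here is a minimal sketch. The block shape (NX, NY, NZ), the names layeredKernel and runLayeredDemo, and the shared buffer buf are all assumptions made for illustration and do not come from the question. Each layer reads a value written by a different thread of the previous layer, so the result is only correct if layer z - 1 has finished before layer z starts.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical block shape chosen for the demo: 4 * 4 * 8 = 128 threads.
constexpr int NX = 4, NY = 4, NZ = 8;

__global__ void layeredKernel(int* out)
{
    __shared__ int buf[NZ][NY][NX];
    const int x = threadIdx.x, y = threadIdx.y, z = threadIdx.z;

    // The loop bound is the same for every thread, so every thread
    // reaches the barrier exactly NZ times.
    for (int layer = 0; layer < NZ; ++layer) {
        if (z == layer) {
            // Layer 0 seeds the buffer; every other layer reads a value
            // written by a neighbouring thread of the previous layer.
            buf[z][y][x] = (z == 0) ? 1 : buf[z - 1][y][(x + 1) % NX] + 1;
        }
        // Outside the if(): all threads of the block take part.
        __syncthreads();
    }
    out[(z * NY + y) * NX + x] = buf[z][y][x];
}

// Launches one block and checks that every element of layer z equals z + 1.
bool runLayeredDemo()
{
    const int n = NX * NY * NZ;
    int* dOut = nullptr;
    if (cudaMalloc(&dOut, n * sizeof(int)) != cudaSuccess) return false;

    layeredKernel<<<1, dim3(NX, NY, NZ)>>>(dOut);

    std::vector<int> h(n, 0);
    // cudaMemcpy waits for the kernel and also reports earlier launch errors.
    cudaError_t err = cudaMemcpy(h.data(), dOut, n * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(dOut);
    if (err != cudaSuccess) return false;

    for (int z = 0; z < NZ; ++z)
        for (int y = 0; y < NY; ++y)
            for (int x = 0; x < NX; ++x)
                if (h[(z * NY + y) * NX + x] != z + 1) return false;
    return true;
}
```

Calling runLayeredDemo() from main() in a file compiled with nvcc should return true when the layers run in order. Note that only one layer is active at a time, so this pattern serializes the block along z and is worth using only when the layers really depend on one another.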