On the Internet, found the implementation of the cellular automaton game "Life", accelerated using intrinsic functions. I can not understand what this snippet does

void gameoflife8vec(uint8_t *counts, uint8_t *states, size_t width, size_t height) { assert(width % (sizeof(__m256i)) == 0); size_t awidth = width + 2; computecounts8vec(counts, states, width, height); __m256i shufmask = _mm256_set_epi8( 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0 ); for (size_t i = 0; i < height; i++) { for (size_t j = 0; j < width; j += sizeof(__m256i)) { __m256i c = _mm256_lddqu_si256( (const __m256i *)(counts + (i + 1) * awidth + j + 1)); c = _mm256_subs_epu8( c, _mm256_set1_epi8( 1)); // max was 8 = 0b1000, make it 7, 1 becomes 0, 0 remains 0 __m256i oldstate = _mm256_lddqu_si256( (const __m256i *)(states + (i + 1) * awidth + j + 1)); __m256i newstate = _mm256_shuffle_epi8( shufmask, _mm256_or_si256(c, _mm256_slli_epi16(oldstate, 3))); _mm256_storeu_si256((__m256i *)(states + (i + 1) * awidth + (j + 1)), newstate); } } } 

I don't understand exactly what exactly and how this line does

 __m256i newstate = _mm256_shuffle_epi8(shufmask, _mm256_or_si256(c, _mm256_slli_epi16(oldstate, 3))); 

In this implementation, states is an array of cell states ("World", that is, it is written alive / dead), counts is an array in which the number of living neighbors for a similar cell from states is written. The number of living neighbors for all cells in states is calculated by the computecounts8vec function. That is, it turns out that the above line should, by checking the rules of the game (if the cell is alive and it has <2 or> 3 neighbors, it dies or, if it is dead and the lives of neighbors 3, it comes to life) to get a new state for the vector of cells from states . I googled found out that the order of actions in the line is as follows:

1) perform a bitwise shift of 3 bits left of all states in the oldstate vector

2) to obtain the result of the operation bitwise OR for c , which is the number of living neighbors - 1, and the result of paragraph 1)

3) mix the result of paragraph 2) according to the shufmask mask (it is extremely unclear where this mask comes from)

Maybe I misunderstood the meaning of the functions, but I cannot understand how the magic happens in this line and, in general, what happens to the lengths of numbers packed into vectors . Explain, please, to me stupid, preferably as detailed as possible, what, where and why.

The memory for states and counts is allocated as follows:

 uint8_t *states = (uint8_t *)malloc((N + 2) * (N + 2) * sizeof(uint8_t)); uint8_t *counts = (uint8_t *)malloc((N + 2) * (N + 2) * sizeof(uint8_t)); 

Where N is the size of the matrix (World of the game) in one dimension.

PS source code https://github.com/lemire/SIMDgameoflife

  • And this and this looked? - MBo
  • @MBo unfortunately, and still I don’t understand how a new cell state is obtained. That is, the bit shift is clear to me (it is not clear why it is and why epi16 is there despite the fact that 8 bit uint8_t are used ), bitwise or, again, it’s clear how it works, but the goal is not clear, and the shuffle is quite a jungle
  • Probably you need to look at the code before determining the function ... - AR Hovsepyan
  • @ ar-hovsepyan added source code - lolmosk
  • Sorry, as it is unreasonable to understand someone else's bad (not OOP) code, and then tell you about it. I said that you yourself need to look and figure out if you are interested - AR Hovsepyan

0