On the Internet, found the implementation of the cellular automaton game "Life", accelerated using intrinsic functions. I can not understand what this snippet does
void gameoflife8vec(uint8_t *counts, uint8_t *states, size_t width, size_t height) { assert(width % (sizeof(__m256i)) == 0); size_t awidth = width + 2; computecounts8vec(counts, states, width, height); __m256i shufmask = _mm256_set_epi8( 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0 ); for (size_t i = 0; i < height; i++) { for (size_t j = 0; j < width; j += sizeof(__m256i)) { __m256i c = _mm256_lddqu_si256( (const __m256i *)(counts + (i + 1) * awidth + j + 1)); c = _mm256_subs_epu8( c, _mm256_set1_epi8( 1)); // max was 8 = 0b1000, make it 7, 1 becomes 0, 0 remains 0 __m256i oldstate = _mm256_lddqu_si256( (const __m256i *)(states + (i + 1) * awidth + j + 1)); __m256i newstate = _mm256_shuffle_epi8( shufmask, _mm256_or_si256(c, _mm256_slli_epi16(oldstate, 3))); _mm256_storeu_si256((__m256i *)(states + (i + 1) * awidth + (j + 1)), newstate); } } } I don't understand exactly what exactly and how this line does
__m256i newstate = _mm256_shuffle_epi8(shufmask, _mm256_or_si256(c, _mm256_slli_epi16(oldstate, 3))); In this implementation, states is an array of cell states ("World", that is, it is written alive / dead), counts is an array in which the number of living neighbors for a similar cell from states is written. The number of living neighbors for all cells in states is calculated by the computecounts8vec function. That is, it turns out that the above line should, by checking the rules of the game (if the cell is alive and it has <2 or> 3 neighbors, it dies or, if it is dead and the lives of neighbors 3, it comes to life) to get a new state for the vector of cells from states . I googled found out that the order of actions in the line is as follows:
1) perform a bitwise shift of 3 bits left of all states in the oldstate vector
2) to obtain the result of the operation bitwise OR for c , which is the number of living neighbors - 1, and the result of paragraph 1)
3) mix the result of paragraph 2) according to the shufmask mask (it is extremely unclear where this mask comes from)
Maybe I misunderstood the meaning of the functions, but I cannot understand how the magic happens in this line and, in general, what happens to the lengths of numbers packed into vectors . Explain, please, to me stupid, preferably as detailed as possible, what, where and why.
The memory for states and counts is allocated as follows:
uint8_t *states = (uint8_t *)malloc((N + 2) * (N + 2) * sizeof(uint8_t)); uint8_t *counts = (uint8_t *)malloc((N + 2) * (N + 2) * sizeof(uint8_t)); Where N is the size of the matrix (World of the game) in one dimension.
PS source code https://github.com/lemire/SIMDgameoflife