On the issue of shifts, signs and speed MK

“Find a reason for everything and you will understand a lot”

Perhaps my regular readers (well, it can not be that they were not) remember that I somehow in my post was perplexed about the fact that when describing the registers of external devices use the unsigned attribute. It was suggested in the comments that this was done in order to avoid unspecific behavior during shifts and I agreed. As I recently discovered, there is another reason for this use of an attribute, and it can be applied not only to registers, but also to ordinary variables.

So we begin.

To start a small introduction to iron

As a target platform, we will consider an 8-bit MK without a battery (this is such a pathetic attempt to hide the compromised name of the AVR), which has the following hardware-implemented commands:

lsl / lsr logical left / right shift, low / high bit cleared;
rol / ror cyclic shift left / right through the transfer (shift 9 bits);
asr arithmetic shift to the right, the most significant (sign) bit is preserved (note that it is generally impossible to perform this type of shift to the left in general).

All these commands are executed on the byte operand and are the basis for the implementation of all other possible shifts. For example, a word shift (2 bytes rh, rl) with a sign right by 1 bit is implemented by the following sequence:

asr rh; ror rl;

Consider a simple code example and the corresponding assembly code for the MC with the AVR command system, as always, received on godbolt.org. (it is assumed that optimization is enabled and the variable is located in the r24 register)

int8_t byte; byte = byte << 1;

 clr r25 sbrc r24,7 com r25 lsl r24 rol r25

and see that the operation takes five teams?

Note: If someone in the comments tells how to arrange this fragment (and subsequent) in 2 columns, I would be grateful.

From the assembler code, it is clear that the byte variable expands to an integer (16-bit) type in the first three commands, and in the next two, the two-byte number itself is shifted — somehow strange, to say the least.

Right shift is no better

 byte = byte >> 1; clr r25 sbrc r24,7 com r25 asr r25 ror r24

- the same five teams. Meanwhile, it is obvious that in fact, to perform the last operation, you need one single command.

 аsr r24

and for the first operation no more. I have repeatedly stated that at present the compiler creates assembly code no worse than a programmer (although it was about the ARM command system), especially if it was a little help, and suddenly such a bummer. But let's try to help the compiler to create the correct code, it may be the case of mixing the types in the shift operation and try

 byte = byte >> (int8_t) 1;

- did not help, from the word "absolutely", but option

  byte=(uint8_t) byte >> 1;

gives a slightly better result

 ldi r25,lo8(0) asr r25 ror r24

- three teams, since the expansion to the whole now takes one team - is already better, although not perfect, the same picture for

 byte=(uint8_t) byte << 1;

- three teams. Okay, so as not to write unnecessary casts, we make the variable itself unsigned

 uint8_t byteu;

and BINGO - assembly code is fully consistent with our expectations

 byteu = byteu << 1; lsr r24

It is strange how it would seem, what a difference, to specify the right type of variable immediately, or bring it directly into the operation - but it turns out there is a difference.

Further studies have shown that the assembler code takes into account the type of the variable that is assigned the result, because

 byteu = byte << 1;

works fine and produces minimal code, and the option

 byte = byteu << 1;

can not do without three teams.

Surely this behavior is described in the standard of the language, I ask those who know in the commentary, but once again I will proudly declare that the “Chukchi is not a reader” and will continue the narration.

So, this method did not help the right shift - as before, there were 3 teams (well, which is not 5, as for the sign option) and I could not improve the result by any means.
But in any case, we see that shift operations with an unsigned number are carried out faster than with his opponent. Therefore, if we are not going to treat the most significant bit of a number as a sign (and in the case of registers, this is the case, as a rule), then we should certainly add the unsigned attribute, which we will do in the future.

It turns out that in general everything is extremely interesting with shifts, we will begin to increase the number of positions when shifting to the left and look at the results: << 1 takes 1 tick, << 2 - 2, << 3 - 3, 4 - 2 unexpectedly, the compiler has applied a clever optimization

 swap r24 andi r24,lo8(-16)

where the s wap command swaps two nibbles per byte. Further, on the basis of the last optimization << 5 - 3, << 6 - 4, << 7 - 3 again unexpectedly, there is another optimization

 ror r24 clr r24 ror r24

the carry bit is used, << 8 - 0 clock cycles, since it just turns out to be 0, it makes no sense to look further.

By the way, here you have an interesting task - in what minimum time can you perform an operation?

 uint16_t byteu; byteu = byteu << 4;

which translates 0x1234 to 0x2340. The obvious solution is to run a couple of commands 4 times.

 lsl rl rol rh

leads to 4 * 2 = 8 clocks, I quickly came up with an option

 swap rl ; 1243 swap rh ; 2143 andi rh,0xf0 ; 2043 mov tmp,rl andi tmp,0x0f or rh,tmp ; 2343 andi rl,0xf0 ; 2340

which requires 7 clock cycles and an intermediate register. So, the compiler generates a code of 6 commands and no intermediate registers - cool, yes.

I hide this code under the spoiler - try to find a solution yourself.

Hint: in the MK command set, there is an EXCLUSIVE OR or SUM according to the MODULE TWO eor command

Here it is, this wonderful code

 swap rl ; 1243 swap rh ; 2143 andi rh,0xf0 ; 2043 eor rh,rl ; 6343 andi r2l,0xf0 ; 6340 eor rh,rl ; 2340

Just get aesthetic pleasure from this piece.

What is characteristic, for 16-bit numbers, the difference between the code for a signed and unsigned number is gone when shifting to the left, strangely like that.

Let's go back to our bytes and start moving to the right. As we remember, for the sign byte we have 5 cycles, for the unsigned byte - 3 and this time cannot be reduced. Or all the same it is possible - yes, it is possible, but in a very strange way (GCC with included optimizations - “this is a very, very strange place”), namely

 byteu = (byteu >> 1) & 0x7F;

which spawns exactly one command for both sign options. Fits and option

  byteu = (byteu & 0xFE) >> 1;

but only for the unsigned number, with the significant one, everything becomes even more dismal - 7 cycles, so we continue to explore only the first option.

I can not say that I understand what is happening, because it is obvious that logical multiplication (&) by such a constant after such a shift does not make any sense (and is not carried out), but the code of the shift itself is affected by the & operation. "You see a gopher - no - and I do not see, but he is."

Shifts by 2 and so on have shown that it is important to pay off the sign bit in the result, but the number is initially signless, in general, some kind of garbage turns out, “but it works the same” - the only thing that can be said about this.

Nevertheless, it is safe to say that the interpretation of the contents of the registers and memory as unsigned numbers allows for a number of operations (for example, shifting or expanding the value) with them to be performed faster and generates a more compact code, so it can be highly recommended when writing programs for MK, if otherwise (interpretation as a number with a sign) is not a prerequisite.

Source: https://habr.com/ru/post/436956/

On the issue of shifts, signs and speed MK

“Find a reason for everything and you will understand a lot”

More articles: