unsigned short A=B[0]+B[1]*255;
This essentially boils down to B[0] + (B[1] << 8), i.e. two main operations: an addition and a shift. (The parentheses matter: in C, + binds tighter than <<.) The multiplication also has a mistake: you need to multiply by 0x100, i.e. 256, not 255. If the variable A really has to exist, add a store to memory. In practice A will most likely be optimized away and the value will be taken straight from a processor register.
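The 255 versus 256 point is easy to check on a concrete pair of bytes. Below is a minimal sketch, assuming B holds unsigned bytes with the low byte first; the function names combine_le16 and combine_times_255 are made up for illustration and do not come from the question.

```c
#include <stdint.h>

/* Correct: low byte plus high byte shifted left by 8 (same as * 256).
   The parentheses are required because + binds tighter than <<. */
uint16_t combine_le16(const unsigned char *b)
{
    return (uint16_t)(b[0] + (b[1] << 8));
}

/* The version from the question: multiplying by 255 instead of 256
   gives a different (wrong) value whenever the high byte is non-zero. */
uint16_t combine_times_255(const unsigned char *b)
{
    return (uint16_t)(b[0] + b[1] * 255);
}
```

For the bytes 0x34, 0x12 the first function gives 0x1234, while the second gives 0x1222, because the high byte is scaled by one less than it should be.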
unsigned short A; CopyMemory (&A,B,2);
A function call. In total: setting up the stack (push/pop, setting registers, passing parameters) plus the code of the function itself. If the function is inlined, that is better, but the speed will still be worse than addition + shift, because it works with memory. It does not lend itself to optimization.
uint16_t A = *(uint16_t*)&B[0];
and its variants. Just two moves (memory -> register, register -> memory). At best this is optimized and the value of A is then again taken from a register, i.e. in effect a single read from memory and no writes to memory.
In general, you really need to look at the assembly listing. All compilers optimize nowadays, and you simply cannot map one language statement unambiguously onto one or several processor instructions. The result will be the most efficient variant by one of the criteria. There are really two of them: speed and size. And the optimization rules differ for each processor.
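One way to actually look at the listing is to put each variant into its own small function and ask the compiler for assembly output, for example with gcc -O2 -S (or /FA with MSVC). The sketch below is only an illustration: the function names are hypothetical, and memcpy stands in for the Windows-specific CopyMemory. The pointer-cast variant can be added as a third function in the same way.

```c
#include <stdint.h>
#include <string.h>

/* Variant 1: addition plus shift. The byte order is fixed by the code
   itself (low byte first), whatever the host CPU uses. */
uint16_t read_shift(const unsigned char *b)
{
    return (uint16_t)(b[0] + (b[1] << 8));
}

/* Variant 2: copy through memory. The byte order is whatever the host
   CPU uses, so this matches variant 1 only on little-endian machines. */
uint16_t read_memcpy(const unsigned char *b)
{
    uint16_t a;
    memcpy(&a, b, sizeof a);
    return a;
}
```

Comparing the generated code for the two functions at different optimization levels (and for speed versus size) shows what the compiler really emits on a given processor, rather than guessing from the source.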
Regarding double:
double A = *(double*)&B[0];
But with double I would be careful. The point is that integers are stored as plain integers, bit for bit, with the bytes one after another. So an expression like
unsigned short A=B[0]+B[1]*256; unsigned long C=B[0]+(B[1]<<8)+(B[2]<<16)+((unsigned long)B[3]<<24);
works. The internal representation of a double is much more complicated: mantissa, exponent, sign... Ugh. And simply pulling particular digits out of a double is harder.
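To make the difference concrete, here is a rough sketch of what "pulling a double apart" involves. It assumes the platform stores double in the IEEE 754 64-bit format (1 sign bit, 11 exponent bits, 52 mantissa bits), which is common but not guaranteed by the C standard. The struct and function names are invented for this example, and memcpy is used here in place of the pointer cast.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper type: the three fields of an IEEE 754 double. */
struct double_fields {
    unsigned sign;      /* 1 bit: 0 = positive, 1 = negative */
    unsigned exponent;  /* 11 bits, stored with a bias of 1023 */
    uint64_t mantissa;  /* 52 bits; the leading 1 is implicit */
};

/* Rebuild a double from 8 raw bytes laid out in host byte order. */
double bytes_to_double(const unsigned char *b)
{
    double d;
    memcpy(&d, b, sizeof d);
    return d;
}

/* Split a double into its fields (assumes IEEE 754 binary64). */
struct double_fields split_double(double d)
{
    uint64_t bits;
    struct double_fields f;
    memcpy(&bits, &d, sizeof bits);
    f.sign = (unsigned)(bits >> 63);
    f.exponent = (unsigned)((bits >> 52) & 0x7FF);
    f.mantissa = bits & (((uint64_t)1 << 52) - 1);
    return f;
}
```

Unlike the integer case, no single byte of the buffer corresponds to a "digit" of the value: the exponent alone straddles two bytes, which is why simple shift-and-add arithmetic on the bytes does not carry over.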