I have a problem reading a file. I open the file in "rb" mode with the standard fopen function, and read it with:

char ch = getc(input_file); 

In my environment I checked via sizeof(char) that this type occupies one byte (8 bits). And yet it works incorrectly...

It turns out (checking with printf) that from time to time (apparently at random) the next byte is read completely wrong. Below is a raw binary dump of the file (little-endian), and next to it the values the corresponding function "spits out" to me, with offsets:

 Offset  Source value (LE)   Displayed value
 0x100   73 00 00 00         73 00 00 00
 0x204   00 7D 00 00         00 7D 00 00
 0x208   CE 03 00 00         FFFFFFCE 03 00 00
 0x20C   EB 09 00 00         FFFFFFEB 09 00 00
 0x210   05 00 00 00         05 00 00 00
 0x214   C7 3A CE 3F         FFFFFFC7 3A FFFFFFCE 3F
 0x318   CE 80 00 00         FFFFFFCE FFFFFF80 00 00
 0x31C   12 01 00 00         12 01 00 00
 0x320   F2 01 00 00         FFFFFFF2 01 00 00
 0x324   05 00 00 00         05 00 00 00
 0x328   C7 3A CE 3F         FFFFFFC7 3A FFFFFFCE 3F

As far as I can tell, the pattern is this: if the byte read (char ch) is less than 0x80, it stays as it was; if it is greater than or equal to 0x80, then in some way (which way exactly is what interests me) it gets "padded" on the left with ones up to 32 bits (the x86 word size?)...

It is also interesting that I read the whole thing into a two-dimensional array char arr[][] and then pass it around, and so on... But why does it compile to this behavior if sizeof(char) == 1?!

In the end, what should I do in the program? Apply the same mask every time char ch >= 0x80? What would be more efficient and correct, or how can I avoid this whole issue entirely?

Thanks in advance for your answers!

Just in case:
Language: plain C
IDE: C-Free Professional 5.0 + mingw5

  • How are you printing it? Show code that reproduces the problem. FFFFFFC7 cannot be the value of a byte. Also keep in mind that printf, as a variadic function, promotes its char arguments to int. Have you tried printing it as unsigned char? I think the problem is not in the reading but in the output. - VladD
  • The output is simply: printf("%x", arr[...][...]); The important part is that I then tried to pass this array to a function and assemble the bytes into 32-bit numbers, and I failed... Even using shifts, I cannot get the correct values. - DumbStudent2016
  • @DumbStudent2016 sizeof(char) is always 1. What type are you talking about? Give a minimal reproducible example of the problem. - Vlad from Moscow

1 answer

Try outputting this:

 printf("%x\n", (unsigned char)c); 

Here is the code:

 char c = 0x81;
 printf("%x\n", c);
 printf("%x\n", (unsigned char)c);

displays

 ffffff81
 81



The point is that your char is signed. The printf function, like any variadic function, promotes its char arguments to int. Promoting a char value with the high bit set to int yields a negative result: the sign bit "propagates" to the left.

I think you should either use unsigned char, or cast on output.

  • I see, but unfortunately it is not that simple; I then tried to pass this array to a function and assemble the bytes, four at a time, into 32-bit numbers, and I failed... Even using shifts I cannot get the correct values: arr[...][...] + (arr[...][... + 1] << 8) + (arr[...][... + 1] << 16) + (arr[...][... + 1] << 24) also gives incorrect values - DumbStudent2016
  • @DumbStudent2016: A shift also promotes to int, if memory serves. Try this: arr[...][...] | ((arr[...][... + 1] << 8) & 0xff00) | ((arr[...][... + 1] << 16) & 0xff0000) | ((arr[...][... + 1] << 24) & 0xff000000). - VladD
  • Thanks! After changing the array type from char to unsigned char, the code works correctly! However, I still don't understand: 1. why, with formatted output via "%x", a char variable occupies 4 bytes; 2. why the padding is with ones. For example, if char ch == 0xC7, then FFFFFFC7 is displayed; but since 0xC7 = 1100.0111b, shouldn't it print (since 0x100 - 0xC7 = 0x39) FFFFFF39? - DumbStudent2016
  • @DumbStudent2016: (1) Because when you call a function like printf, declared with ..., every char or short argument is promoted to int, and it is the int that reaches the function. As to why, ask Kernighan and Ritchie; other languages do it differently. - VladD
  • @DumbStudent2016: (2) The C language has nothing to do with it here; this is how negative numbers are stored in two's complement. A negative number is stored with the same bit pattern as the positive number obtained from it by adding 2^32 (or, if you prefer, its residue modulo 2^32 is stored). - VladD