I ran some experiments, but I cannot explain the results.

Example 1:

    #include <stdio.h>

    int main()
    {
        // Latin
        printf("\n%zu", sizeof('a')); // 1
        // Cyrillic
        printf("\n%zu", sizeof('ф')); // 4
        return 0;
    }

Example 2:

    #include <stdio.h>

    int main()
    {
        char a = 'a'; // Latin
        char b = 'ф'; // Cyrillic
        printf("\n%zu", sizeof(a)); // 1
        printf("\n%zu", sizeof(b)); // 1
        return 0;
    }

I do not understand what is going on here; please explain. P.S. The compiler is g++.

  • In the first case you are measuring the size of a character literal, which may take more than one byte (Unicode and the like). In the second you are measuring a char variable, which is always 1 byte. - Alexander Muksimov
  • @MaximPro Show a minimal compilable example that demonstrates this result. - Vlad from Moscow
  • @VladfromMoscow repl.it/EuJe/0 please - MaximPro
  • @MaximPro Do not post links; provide the minimal compilable program that you ran yourself and that demonstrates the problem. - Vlad from Moscow
  • sizeof is not a function but an operator. And who taught you to start a string with '\n'? - 0andriy

3 answers

Character (and string) literals in program code are translated by the compiler into a sequence of bytes. The rule for this translation depends on the encoding of the source file (as other participants have already mentioned), but may also depend on a number of other factors (see the answer to another question).

Note that both versions of your code fail to compile with clang, which reports an error:

    error: character too large for enclosing character literal type
        char b = 'ф'; // Cyrillic

The gcc you are using gives a couple of warnings for the line with the Cyrillic letter:

    warning: multi-character character constant [-Wmultichar]
    warning: overflow in implicit constant conversion [-Woverflow]

The first one says that a multi-character literal is used (which is not supported by all compilers). The second says that this literal does not fit into char. That is, the compiler interpreted the literal 'ф' as something larger than char, and, as the quotation in @Harry's answer already says, its type is int:

... has a type int , and has an implementation-defined value.

Based on the foregoing, we can conclude:

  • 4 (example 1) is obtained because the multi-character literal is not truncated and its size equals the size of int, i.e. sizeof(int) == 4.
  • 1 (example 2) is obtained because the multi-character literal was truncated to type char when the variable b was initialized, and sizeof(char) == 1 by definition.

I will also reply here to your comment on another post:

I imagined a multibyte literal as one character from a complex encoding, say 'ф' (UTF-8), which is 2 bytes, so by your words we can write no more than 2 'ф' ... inconsistent with 4 characters

Indeed, writing more than two 'ф' will not work.

    #include <stdio.h>

    int main()
    {
        printf("\n%zu", sizeof('ффф'));
    }

    warning: character constant too long for its type

That is, the value is in fact truncated to sizeof(int) bytes.

But if a string literal is used, two bytes are enough to store ф:

    #include <stdio.h>

    int main()
    {
        const char c[] = "ф";
        printf("%zu\n", sizeof(c));
    }

3

This prints 3, because one more byte is allocated for the terminating zero.

  • The last message translates as "Warning: the character constant is too long for its type"? And what does truncation mean: will only 2 characters be preserved? If so, how do I print those 2 characters? What is the practical use? - MaximPro
  • It is also not clear by what principle the result is produced: printf("%u - %u\n%u - %u", sizeof('2'), '2', sizeof('22'), '22'); /// 1 - 50 4 - 12850 - MaximPro
  • @MaximPro Truncation is what happens when, for example, you try to assign the number 1000 to a variable of type char. How exactly the value gets truncated depends on the compiler, but most likely it simply discards the leading characters. You can print any area of memory byte by byte via reinterpret_cast. - αλεχολυτ
  • @MaximPro 12850 = 50 * 256 + 50 - αλεχολυτ
  • You did not fully answer my first question about your answer. How did you arrive at this formula? I understand that 1 byte holds 256 values, but I do not understand the operations performed in this formula! - MaximPro

sizeof is not a function but an operator that yields the size of a type at compile time, i.e. how much memory is required to store a variable of that type.

Judging by the fact that you got 1 for the Latin literal 'a', you are compiling not as C but as C++! In C, a character literal has type int, and you would get 4.

One also has to assume that your source file is saved as UTF-8, for example, so that the Cyrillic letter expands into something larger than char, and that something is treated as an int and gives 4.

The last lines measure not a literal but a variable of type char, and for that C++ is unambiguous: sizeof(char) == 1.

That's all...

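P.S. Because some insist ... :) From the standard, about character literals:

An ordinary character literal that contains a single c-char representable in the execution character set has type char, with value equal to the numerical value of the encoding of the c-char in the execution character set. An ordinary character literal that contains more than one c-char is a multicharacter literal. A multicharacter literal, or an ordinary character literal containing a single c-char not representable in the execution character set, is conditionally-supported, has type int, and has an implementation-defined value.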

  • Comments are not intended for extended discussion; conversation moved to chat. - PashaPash

The type of the single-byte character literal 'q' is char, and sizeof(char) is 1.

The type of the multibyte character literal 'qq' is int, and sizeof(int) is, for example, 4.

If the source code is stored in the UTF-8 encoding, then the literal 'ф' is two bytes and is equivalent to '\xd1\x84'. Accordingly, its type is int.

  • 'qq' is already 2 characters, so it is already some kind of string literal - MaximPro
  • By the way, I think you made a mistake with the hexadecimal code: not '\xd1\x84' but '\xd4\x84' - MaximPro
  • @MaximPro ф. A string literal is written in double quotes. - αλεχολυτ
  • @alexolut And how does a multibyte character literal differ from a string literal (apart from the quotation marks)? - MaximPro
  • @MaximPro A string can contain any number of characters; a multibyte character literal, no more than 4. - αλεχολυτ