I came across the terms "byte string" and "multibyte string".

How do they differ, and how is each of them defined? As far as I understand, both are built on the char type, so it is not clear to me what the difference is.

Indeed, there are both single-byte and multibyte strings, and both are arrays of char elements. But there is a difference.

In a single-byte string (byte string), each char is one character.

In a multibyte string, one character can occupy more than one byte, i.e. it can span several consecutive char elements.

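For example, a UTF-8 encoded string has to be stored as a multibyte string, since one character can take from 1 to 4 bytes. (The original UTF-8 design allowed sequences of up to 6 bytes, but UTF-8 has since been restricted to 4.)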


A little historical background.

Initially, there were only encodings in which every character was encoded in at most one byte. But text in many different languages had to be represented, and 256 characters turned out to be far too few, so encodings appeared in which a character takes more than one byte. Unicode is the best-known example, and it too comes in different forms. You can always reserve a fixed, generous amount of space for every character (an array of wchar_t, for example), or you can save memory by making the number of bytes per character variable, which is what UTF-8 does. Multibyte strings were created for such encodings with a variable number of bytes per character.

P.S. That said, nothing stops you from using multibyte strings for encodings with a constant number of bytes per character as well.