Find out the number of characters in the BYTE array

Question

You can find the number of characters in a string like this:

const TCHAR* someStr = _T("Hello World!"); size_t sizeStr = _tcslen(someStr);

How do I find the number of characters in a byte array, for example:

 BYTE * someByte = "\xFF\xAA\x55"; size_t sizeByte = strlen((TCHAR*)someByte );

This does not work correctly, especially with Unicode. What should I do?

@alexolut, thanks for helping me. Please post your comment as an answer and I will accept it to close the topic :-)

αλεχολυτ · Accepted Answer · 2016-07-02T07:26:10

For different types of strings you should use different functions.

The description can be found in the documentation on MSDN.
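As a rough illustration of "different functions for different string types", here is a minimal portable sketch. The overloaded name string_length is hypothetical (it is not from the answer); it simply forwards to std::strlen for char strings and std::wcslen for wchar_t strings. On Windows, the TCHAR-based _tcslen from <tchar.h> plays a similar role by selecting the matching function at compile time.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>  // std::strlen
#include <cwchar>   // std::wcslen

// Hypothetical helper: length of a narrow string in char units,
// counted up to (not including) the terminating '\0'.
std::size_t string_length(const char* s) {
    assert(s != nullptr);
    return std::strlen(s);
}

// Hypothetical helper: length of a wide string in wchar_t units,
// counted up to (not including) the terminating L'\0'.
std::size_t string_length(const wchar_t* s) {
    assert(s != nullptr);
    return std::wcslen(s);
}
```

Calling string_length("Hello World!") and string_length(L"Hello World!") picks the right function from the argument type, so no cast between string types is needed.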


When the byte array "\x55\x00\xFF" is passed to _mbslen, it returns 1 rather than 3. Why does it stop counting at a zero byte? Does the function treat that as the end of the string? - Duracell
@Duracell a zero byte in a multibyte string marks the end of the string, so 1 is returned: only the \x55 character is counted. - αλεχολυτ
So if my byte array contains zero bytes, the number of characters cannot be counted correctly? - Duracell
@Duracell all of these character-counting functions rely on a special terminating character. Once that character is found, the function stops counting. Maybe you just need to abandon null-terminated strings and use character arrays? - αλεχολυτ
No, I need to count the characters in a BYTE array - Duracell
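The thread above comes down to this: a byte array that may contain zero bytes has no terminator to search for, so its size has to be known some other way. A minimal sketch, assuming the data lives in a fixed-size array; the BYTE typedef is a stand-in for the Windows one, and byte_count and length_up_to_zero are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

typedef unsigned char BYTE;  // stand-in for the Windows BYTE typedef

// Number of elements in a fixed-size array; the size comes from the
// array type itself, so embedded zero bytes do not matter.
template <std::size_t N>
std::size_t byte_count(const BYTE (&)[N]) {
    return N;
}

// What a terminator-based scan reports for the same data: it stops at
// the first zero byte. The data must contain a zero byte somewhere.
std::size_t length_up_to_zero(const BYTE* data) {
    assert(data != nullptr);
    return std::strlen(reinterpret_cast<const char*>(data));
}
```

For data that arrives through a pointer (from a socket or a file, for example), the same idea applies: keep the byte count returned by the read call next to the pointer instead of trying to recompute it.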


Answer 2 · 2016-07-02T07:00:10

Unicode comes in several encodings:

multibyte (utf-8)

The standard ASCII-Z approach and the std::string class will work for it. Note, however, that they measure the length of the string in bytes, not in characters.

The C++ standard does not support working with utf-8 at the character level; external libraries are needed.
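To make the byte/character difference concrete, here is a small sketch that counts code points in UTF-8 text by hand. It relies on the fact that every continuation byte in UTF-8 has the bit pattern 10xxxxxx. The function name utf8_code_points is hypothetical, and the sketch assumes the input is already valid UTF-8; it does no validation and is not a substitute for a real Unicode library.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Counts code points in UTF-8 text by skipping continuation bytes,
// which always have the form 10xxxxxx.
// Assumption: the input is valid UTF-8 (no validation is performed).
std::size_t utf8_code_points(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {  // not a continuation byte
            ++count;
        }
    }
    assert(count <= s.size());  // never more code points than bytes
    return count;
}
```

For a three-letter Cyrillic word encoded as "\xD0\x9F\xD1\x80\xD0\xB8", std::string::size() reports 6 bytes while this function reports 3 code points.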

double byte (utf-16)

Here the wchar_t* type and the std::wstring class will help (on Windows, where wchar_t is 16 bits). The rest is analogous to ASCII-Z, except that the end of the string is marked by a 16-bit word equal to 0.

For these strings there are counterparts of the eight-bit functions from the standard C library, for example wcslen() for measuring length.

These counterparts operate at the character level.

four byte (utf-32)

Not supported by the C++ standard library.

In the case of utf-8, you will get the number of bytes, but not the number of characters, no?

Nicolas Chabanovsky ♦ · Answer 3 · 2016-07-02T07:52:41

 const wchar_t* wstr1 = L"Count."; wprintf(L"Length of '%ls' : %zu\n", wstr1, wcslen(wstr1));


akula


Please try to write more detailed answers. I am sure the author of the question would be grateful for your expert commentary on the code above. - Nicolas Chabanovsky ♦


hitcode · Answer 4 · 2016-07-02T07:29:34

You are asking this question because you do not understand some basic things.

First you need to understand what an ordinary string is.

What you see is a pointer (4 bytes on a 32-bit platform) that actually points to an array of data.

strlen() works like this: it walks through the array looking for \0 (the end of the string). In other words, strlen() simply counts the elements before the terminator.
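The description of strlen() above can be sketched as follows. This is a hypothetical re-implementation for illustration only (the name my_strlen is not a library function); real code should call std::strlen.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical re-implementation for illustration; prefer std::strlen.
// Walks the array until the terminating '\0' and returns how many
// elements came before it.
std::size_t my_strlen(const char* s) {
    assert(s != nullptr);
    const char* p = s;
    while (*p != '\0') {
        ++p;
    }
    return static_cast<std::size_t>(p - s);
}
```

Because the loop stops at the first zero byte, my_strlen("\x55\x00\xFF") yields 1, which matches the behavior discussed in the comments on the accepted answer.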

It gets more interesting from here.

If you are talking about Unicode, its structure is completely different.

Usually a helper class such as CString is used. Its structure is roughly as follows.

char (for ANSI character strings).
wchar_t (for Unicode character strings).
TCHAR (for both ANSI and Unicode character strings).

That is, do you see? These are completely different structures, and each has its own functions. For wchar_t (Unicode), use wcslen(), or whatever it is called.

By the way, the compiler settings also have an option that determines which type is the default, ANSI or Unicode. I advise you to choose Unicode.

About your example.

It was done wrong from the start. Most likely you either receive the string from a socket or read it from a file.

Read it into a buffer and immediately convert it into a proper structure, such as a CString.

And as for the advice to use CString from the late MFC, it is scary even to remember it.
Secondly, CString was given as an example (I even marked it as such). The point is not the specific example but the idea behind it.
@hitcode, I know the basics, I know that Unicode takes 2 bytes, I just had not finished reading MSDN. There is another nuance: when the byte array "\x55\x00\xFF" is passed to _mbslen, it returns 1, not 3. Now I am trying to understand why it counts so strangely.
@Duracell, because it shows you how much your one pointer takes = 1 character.
And you need to refer to the string itself. See the first picture.
