You can find the number of characters in a string like this:

TCHAR* someStr = "Hello World!"; size_t sizeStr = strlen(someStr); 

How do I find the number of characters in a byte array, for example:

 BYTE * someByte = "\xFF\xAA\x55"; size_t sizeByte = strlen((TCHAR*)someByte ); 

This does not work correctly, especially with Unicode. What should I do?

4 answers

For different types of strings you should use different functions.

The description can be found in the documentation on MSDN.

  • When the byte array "\x55\x00\xFF" is passed to _mbslen, it returns 1, not 3. Why does it stop counting at a zero byte? Does the function think that is the end of the string? - Duracell
  • @Duracell a zero byte in a multibyte string marks its end, so 1 is returned, i.e. only the \x55 character is counted. - αλεχολυτ
  • So if my byte array contains zero bytes, the number of characters cannot be counted correctly? - Duracell
  • @Duracell all of these character-counting functions rely on a special terminating symbol. Once that symbol is found, the function stops counting. Maybe you just need to abandon null-terminated strings and use character arrays? - αλεχολυτ
  • No, I need to count the characters in a BYTE array - Duracell
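The point made in these comments can be sketched in a few lines. This is a minimal illustration, assuming the byte array has a size known at compile time; the names byte_count and terminated_length are hypothetical and are not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// The literal "\x55\x00\xFF" occupies 4 bytes: three data bytes plus the
// terminating zero that the compiler appends.
static const unsigned char someByte[] = "\x55\x00\xFF";

// Hypothetical helper: number of data bytes in a fixed-size array,
// not counting the compiler-added terminator.
template <std::size_t N>
constexpr std::size_t byte_count(const unsigned char (&)[N]) {
    return N - 1;
}

// What a terminator-based function sees: it stops at the first zero byte.
inline std::size_t terminated_length(const unsigned char* p) {
    return std::strlen(reinterpret_cast<const char*>(p));
}

inline void byte_count_example() {
    assert(byte_count(someByte) == 3);         // real number of data bytes
    assert(terminated_length(someByte) == 1);  // counting stops at \x00
}
```

If the array only exists as a pointer, its length has to be stored and passed alongside it, because nothing in the data itself marks where it ends.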

Unicode can be different:

multibyte (utf-8)

The standard ASCII-Z approach and the std::string class will work for it. Note, however, that the length you get is the number of bytes, not the number of characters.

The C++ standard does not support working with UTF-8 at the character level; external libraries are needed.
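The difference between bytes and characters in UTF-8 can be shown with a small hand-written counter. This is only a sketch: the function name utf8_code_points is hypothetical, it assumes the input is already valid UTF-8, and it counts code points, which is not always the same as the characters a user sees (combining marks, for example).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Counts code points in a UTF-8 string by skipping continuation bytes,
// which always have the bit pattern 10xxxxxx.
// Assumes valid UTF-8; no validation is performed.
inline std::size_t utf8_code_points(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {
            ++count;  // lead byte or plain ASCII byte starts a new code point
        }
    }
    return count;
}

inline void utf8_example() {
    const std::string s = "h\xC3\xA9llo";  // "héllo": 'é' takes two bytes
    assert(s.size() == 6);                 // number of bytes
    assert(utf8_code_points(s) == 5);      // number of code points
}
```

For anything beyond counting (validation, normalization, grapheme clusters), a dedicated Unicode library is the safer choice.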

double byte (utf-16)

Here the wchar_t* type and the std::wstring class will help. The rest is analogous to ASCII-Z, except that a 16-bit word equal to 0 marks the end of the string.

For these strings there are analogues of the eight-bit functions from the standard C library; for measuring length, for example, there is wcslen().

These analogues work at the character level.
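A short sketch of how wcslen() counts: it returns the number of wchar_t elements, not bytes. The helper wide_byte_size is a hypothetical name used only for this illustration. The code does not assume a 16-bit wchar_t, since the size of that type depends on the platform.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>

// Hypothetical helper: storage size of a wide string in bytes,
// not counting the terminating zero element.
inline std::size_t wide_byte_size(const wchar_t* s) {
    return std::wcslen(s) * sizeof(wchar_t);
}

inline void wide_length_example() {
    const wchar_t* ws = L"Count.";
    assert(std::wcslen(ws) == 6);  // six wchar_t elements
    // 12 bytes where wchar_t is 16-bit, 24 bytes where it is 32-bit.
    assert(wide_byte_size(ws) == 6 * sizeof(wchar_t));
}
```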

four byte (utf-32)

It is not supported by the C++ standard library.

  • In the case of UTF-8, you will get the number of bytes, not the number of characters, no? - Harry
  • @Harry - made a clarification, thanks. - gbg
  • I cannot find wstrlen anywhere. A link, please? - αλεχολυτ
  • @alexolut, I made the edit - gbg
  • Isn't wchar_t 32-bit under standard Linux? (Not sure.) - VladD
 const wchar_t* wstr1 = L"Count."; wprintf(L"Length of '%ls' : %zu\n", wstr1, wcslen(wstr1));
  • Please try to write more detailed answers. I am sure the author of the question would be grateful for your expert commentary on the code above. - Nicolas Chabanovsky

You are asking this question because you do not understand the basics.

First you need to understand what a regular string is.

[image: what a regular string looks like]

You see a pointer (4 bytes on a 32-bit platform) that actually points to an array of data.

strlen() works like this: it walks through the array looking for \0 (the end of the string), i.e. strlen() is simply a function that counts elements.
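The description above can be written out as a simplified sketch. The name my_strlen is hypothetical; real library implementations of strlen are heavily optimized, but they produce the same result.

```cpp
#include <cassert>
#include <cstddef>

// Simplified sketch of strlen: walk the array until the zero byte
// and return how many elements were passed on the way.
inline std::size_t my_strlen(const char* s) {
    const char* p = s;
    while (*p != '\0') {
        ++p;
    }
    return static_cast<std::size_t>(p - s);
}

inline void my_strlen_example() {
    assert(my_strlen("Hello World!") == 12);
    // An embedded zero byte ends the count early.
    assert(my_strlen("\x55\x00\xFF") == 1);
}
```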

Now it gets more interesting.

If you are talking about Unicode, its structure is completely different.

Usually an auxiliary class such as CString is used. Its structure is approximately as follows.

[image: something like this]

  • char (for ANSI character strings).

  • wchar_t (for Unicode character strings).

  • TCHAR (for ANSI and Unicode character strings).

That is, do you see? These are completely different structures, and each has its own functions. For wchar_t (Unicode), use wcslen(); I think that is what it is called.

By the way, the compiler settings also have an option that selects which type is the default, ANSI or Unicode. I advise you to choose Unicode.
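How one setting can switch the character type and the length function together can be sketched portably. On Windows the real names are TCHAR, _T() and _tcslen from <tchar.h>, controlled by the UNICODE and _UNICODE macros. Everything prefixed with my_ or MY_ below is a hypothetical stand-in so that the idea can be tried with any compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>

// Hypothetical stand-ins for TCHAR, _T() and _tcslen.
// Defining MY_UNICODE selects the wide-character variants.
#ifdef MY_UNICODE
typedef wchar_t my_tchar;
#define MY_T(x) L##x
inline std::size_t my_tcslen(const my_tchar* s) { return std::wcslen(s); }
#else
typedef char my_tchar;
#define MY_T(x) x
inline std::size_t my_tcslen(const my_tchar* s) { return std::strlen(s); }
#endif

inline void tchar_example() {
    const my_tchar* someStr = MY_T("Hello World!");
    // The same source line gives 12 in both build modes.
    assert(my_tcslen(someStr) == 12);
}
```

The literal and the length function must always come from the same family; mixing a narrow function with a wide string is exactly what breaks in a Unicode build.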

About your example.

It was done wrong from the start. Most likely you either receive the string from a socket or read it from a file.

Read it into a buffer and immediately put it into a proper structure, such as CString.
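The same idea can be sketched without MFC. Here std::vector is swapped in for CString, which is a substitution and not what the answer itself proposes. The helper name to_buffer is hypothetical. It assumes the number of received bytes is known, as it is after a socket read or file read.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: copies `len` received bytes into a container
// that stores its own length, so no terminator is needed.
inline std::vector<unsigned char> to_buffer(const unsigned char* data,
                                            std::size_t len) {
    return std::vector<unsigned char>(data, data + len);
}

inline void buffer_example() {
    const unsigned char raw[] = {0x55, 0x00, 0xFF};
    const std::vector<unsigned char> buf = to_buffer(raw, sizeof raw);
    assert(buf.size() == 3);   // the embedded zero byte is kept
    assert(buf[1] == 0x00);
    assert(buf[2] == 0xFF);
}
```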

  • The downvote is mine: for the chummy style in the spirit of "Hacker" magazine, and for the advice to use CString from the late MFC, which is scary to even remember. - gbg
  • First, MFC is not dead. Its only worthy alternative is WTL, and both have advantages and disadvantages. Second, CString was given only as an example (and was specifically marked as such); what matters is not the specific example but grasping the idea. - hitcode
  • @hitcode, I know the basics, and I know that Unicode takes 2 bytes; I just had not finished reading MSDN. There is one more nuance: when the byte array "\x55\x00\xFF" is passed to _mbslen, it returns 1, not 3. Now I am trying to understand why it counts so strangely. - Duracell
  • @Duracell, because it shows you how much your one pointer covers, which is 1 character. You need a reference to the whole string. Look at the first picture: you are pointing at one small square, but you need the whole string. - hitcode