What are the rules for a lexicographic comparison of character arrays?

  • Whatever rules the programmer wants. Or is this about the strcmp() family of string functions? - PinkTux
  • If you have been given an exhaustive answer, mark it as correct (the check mark next to the chosen answer). - Nicolas Chabanovsky

2 answers

The codes stored in the characters are compared as unsigned integer values, that is, an ordinary element-by-element comparison of the codes is performed. In C, this comparison is done with the standard function strcmp, which compares character arrays as strings, or with memcmp, which compares character arrays as sequences of bytes.

When character arrays are compared as strings and all characters at the common positions of both strings match, the shorter string is the smaller one.

Here is a C demo.

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        char A[] = "A";
        char a[] = "a";
        char B[] = "B";
        char AA[] = "AA";

        printf( "%s\n", strcmp( A, B ) < 0 ? "true" : "false" );
        printf( "%s\n", strcmp( A, a ) < 0 ? "true" : "false" );
        printf( "%s\n", strcmp( B, a ) < 0 ? "true" : "false" );
        printf( "%s\n", strcmp( A, AA ) < 0 ? "true" : "false" );

        return 0;
    }

Its console output:

    true
    true
    true
    true

And a similar C++ program:

    #include <iostream>
    #include <iomanip>
    #include <cstring>

    int main()
    {
        char A[] = "A";
        char a[] = "a";
        char B[] = "B";
        char AA[] = "AA";

        std::cout << std::boolalpha << ( std::strcmp( A, B ) < 0 ) << std::endl;
        std::cout << std::boolalpha << ( std::strcmp( A, a ) < 0 ) << std::endl;
        std::cout << std::boolalpha << ( std::strcmp( B, a ) < 0 ) << std::endl;
        std::cout << std::boolalpha << ( std::strcmp( A, AA ) < 0 ) << std::endl;

        return 0;
    }

Its console output:

    true
    true
    true
    true

That is, for example, 'A' is less than 'a' because the code of 'A' is less than the code of 'a' in the ASCII code table (in EBCDIC, for example, the result is the opposite), and the string "A" is less than the string "AA" because the first string is shorter than the second, although the codes of all its characters (in this case, a single character) match the corresponding codes of the second string.

  • In EBCDIC, "A" > "a", so on a system with EBCDIC encoding the output is: true false false true - Yaroslav
  • @Yaroslav Good point, I overlooked that. :) - Vlad from Moscow
  • The codes stored in characters are compared as unsigned integer values - it is not that simple! Think, for example, about the unsigned value of the code of the letter "e" in different encodings. - Sergey
  • @Sergey What do you mean? - Vlad from Moscow
  • I mean that, if memory serves, five (!!!) different encodings are used for Cyrillic (CP866, CP1251, KOI8-R, DKOI, ISO 8859-5, UTF ...). And not only do the code values differ between these encodings, the Cyrillic letters also appear in a DIFFERENT order in them. - Sergey

The C language has no comparison operation for arrays, regardless of their element type. So the programmer compares them however they want. For example, by length :)

If we are talking about the standard library functions that work with strings, their behavior is described explicitly for each group.

For example, the description of strncmp() says:

This function is not locale-sensitive ...

The description of strcoll() says:

Compares two null-terminated byte strings according to the LC_COLLATE category ... Locale-specific order may apply to characters with diacritics. In some locales, groups of characters compare as single collation units. For example, "ch" in Czech follows "h" and precedes "i", and "dzs" in Hungarian follows "dz" and precedes "g".

The situation is similar with the wcsncmp() / wcscoll() pair, etc. In general, read the documentation, everything is detailed there.

  • I just want to clarify that the documentation here is not the man page for a specific function, but the POSIX standard IEEE Std 1003.1-2008, 2016 Edition, Chapter 7. That is where everything is spelled out precisely. - Sergey
  • You are right, thanks for understanding that I meant the strcmp() function. - Nikita Gusev