I found that CompareString (both the ANSI and the Unicode versions) orders strings containing И or Й in a peculiar way. You can see it simply in Explorer by creating files and sorting them. Example: "Йемен" (Yemen) sorts before "Иордания" (Jordan), while "my" sorts before "mine" (but if you change "Иордания" to "Иардания", it jumps ahead of "Йемен"). I tried to recall whether there are any special rules for ordering words with these two letters and could not remember any (nor find them online). I expected words with Й to always come after words with И, but what I see is incomprehensible.

Is this a bug or not?

  • It's buggy for me too, just differently (at least in cp1251, i.e. ANSI):
      c:/Users/avp $ cat tt.txt
      ytsuken
      itsuken
      fbba
      abbb
      c:/Users/avp $ sort tt.txt
      fbba
      abbb
      itsuken
      ytsuken
      c:/Users/avp $ c:/UnixUtils/usr/local/wbin/sort.exe tt.txt
      abbb
      itsuken
      ytsuken
      fbba
    Don't be surprised: this is Windows 7, I just run the UnixUtils commands from emacs eshell. Bill Gates... what else can you expect from him. (I'll try sorting it with a simple program using strcmp().) - avp
  • Well, I was too hasty. I tried it: qsort() with strcmp() sorts the same way as UnixUtils. I looked more closely, and that's right, because that word really should come last. So switch to *nix (or port its software to Windows). - avp
  • @avp, so that's one more confirmation. Let's think about it. - maksee
  • I checked on XP and it doesn't happen there, so it looks like a bug in later versions of Windows. - maksee
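A minimal sketch of what avp's strcmp() test does, written in Python. It assumes the strings are cp1251-encoded, so comparing the encoded bytes mimics qsort() with strcmp(); the word list reuses names from this thread rather than avp's actual file, and the helper name is made up for illustration:

```python
def byte_order_sorted(words, encoding="cp1251"):
    """Sort strings by their encoded bytes, like qsort() + strcmp() on a cp1251 file."""
    # bytes compare lexicographically as unsigned values, just as strcmp() does
    return sorted(words, key=lambda w: w.encode(encoding))

print(byte_order_sorted(["Йемен", "Иордания", "Иокогама"]))
# ['Иокогама', 'Иордания', 'Йемен']
```

In cp1251 И is 0xC8 and Й is 0xC9, so every word starting with Й lands after every word starting with И: plain byte order, not the order Explorer shows.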

1 answer

As far as I understand, sorting uses the "crossword principle": the letters И and Й are treated as the same letter, as in crosswords, and are distinguished only when the words are otherwise identical.
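The "crossword principle" can be illustrated with a two-level sort key. This is only a sketch of the idea, not Windows' actual collation algorithm: the primary key drops combining marks after Unicode NFD decomposition (Й decomposes to И plus U+0306, so it compares as И), and the original string breaks ties. The name `crossword_key` and the sample pair "мой"/"мои" are illustrative assumptions:

```python
import unicodedata

def crossword_key(word):
    """Hypothetical two-level key: Й equals И unless the words are otherwise identical."""
    # Primary level: decompose (NFD) and drop combining marks, so "Й" -> "И".
    decomposed = unicodedata.normalize("NFD", word)
    primary = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Secondary level: the original string, consulted only when primaries tie.
    return (primary, word)

print(sorted(["Иордания", "Йемен", "Иокогама"], key=crossword_key))
# ['Йемен', 'Иокогама', 'Иордания']
print(sorted(["Иардания", "Йемен"], key=crossword_key))
# ['Иардания', 'Йемен']
print(sorted(["мой", "мои"], key=crossword_key))
# ['мои', 'мой']
```

This reproduces the orderings from the question: "Йемен" sorts before "Иордания", yet "Иардания" moves ahead of "Йемен", and Й only decides the order when the words are otherwise the same.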

An example of sorting in Explorer on Windows 7:

[Screenshot: file names sorted in Windows 7 Explorer; image not available]

The CompareString documentation says:

... two strings that can be linguistically equivalent

Try converting the strings to Unicode (single-byte strings are evil anyway, so I would drop them altogether, especially if you work with localization) and comparing them with LOCALE_INVARIANT.

  • @VladD, by mentioning Unicode, do you mean you suspect Unicode comparison handles this correctly? I ran the same tests with CompareStringW and got the same result; besides, as far as I understand, Explorer has used Unicode for a long time. I'm ready to accept it if it's documented somewhere that the "crossword principle" is correct and accepted by someone, and not just some accident. Anyway, thanks for the detailed answer. - maksee
  • @maksee: CompareString takes a locale and flags; have you tried LOCALE_INVARIANT? Unicode is needed, for example, so that you can compare strings without language-specific rules. - VladD
  • In the Unicode table, the codes of И and Й are 0x418 and 0x419, and of и and й are 0x438 and 0x439. So a sort where Йемен comes before Иокогама (Yokohama) is clearly not by Unicode code points. For some reason, M$ decided that in Russian И and Й should be treated the same way as letters with tildes (and umlauts, and so on) in Western European languages. - avp
  • @avp: Yeah. Localization has always been a sore spot in all of programming. Microsoft basically managed the European languages, and, with some creaking, Japanese too for many customers, but the rest not so much. Well, okay, maybe in another 15 years they'll get there. - VladD
  • @avp, I mentioned this under the question, but I'll repeat it just in case: everything is fine in XP, so it looks like a bug introduced later, apparently during some rework. - maksee
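avp's code-point figures, and the reading of Й as "И with a diacritic", can be checked with Python's standard `unicodedata` module. A small sketch; the helper names are made up for illustration, and the Spanish ñ is added only as an example of the Western European diacritics avp mentions:

```python
import unicodedata

def code_points(text):
    """Return the code points of text as hex strings."""
    return [hex(ord(ch)) for ch in text]

def nfd_code_points(text):
    """Code points after canonical decomposition (NFD)."""
    return code_points(unicodedata.normalize("NFD", text))

# The code points quoted in the comment: И, Й, и, й
print(code_points("ИЙий"))   # ['0x418', '0x419', '0x438', '0x439']
# Й decomposes into И + U+0306 COMBINING BREVE...
print(nfd_code_points("Й"))  # ['0x418', '0x306']
# ...just as ñ decomposes into n + U+0303 COMBINING TILDE
print(nfd_code_points("ñ"))  # ['0x6e', '0x303']
```

Because Unicode itself defines Й as И plus a combining mark, a collation that ranks diacritics below base letters will naturally treat Й like ñ or ü, which is consistent with what avp observed.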