How can I make the following code work correctly?

    #include <stdio.h>
    #include <clocale>
    #include <ctype.h>

    int main(int argc, char** argv)
    {
        setlocale(LC_ALL, "Russian");
        isupper('П');
        return 0;
    }

The problem is that the Release build crashes, and the Debug build triggers an assertion: Expression: c >= -1 && c <= 255.

I have several candidate solutions, but none of them fits completely:

  1. I cannot switch to UNICODE, because this is a maintained project with a large amount of such code.
  2. I could use the overloaded isupper function, which takes a locale as its second argument. I cannot, for the same reason: I do not want to rewrite the calls everywhere.
  3. Surprisingly, the variant isupper((unsigned char)'П') works. So the RTL does honor setlocale and handles Russian letters. (This is confirmed by the fact that if you remove setlocale, the code still compiles and runs, but isupper returns incorrect results.) The ASSERT, however, knows nothing about this and fires regardless of the locale, which is understandable, but bad.
  4. An option similar to (3), setting the compiler switch /J (make char unsigned by default), does not fit: it is incompatible with libraries, for example MFC.
  5. Writing my own isupper_rus functions and swapping them in through #define is a bad idea: it amounts to rewriting the RTL.
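
For reference, option 2 can be sketched as follows. This is only an illustration: is_upper_in is a hypothetical helper name, and the locale to pass in (for example one built from the "Russian" name used above) is assumed to be whatever the platform accepts.

```cpp
#include <locale>

// Hypothetical helper: classifies one narrow character in an explicit locale.
// The two-argument std::isupper is a template declared in <locale>; for char
// it goes through std::ctype<char>, whose interface takes a char, so the
// caller never converts the character to int.
bool is_upper_in(char ch, const std::locale& loc) {
    return std::isupper(ch, loc);
}
```

A call site would look like is_upper_in(ch, loc), which is exactly the per-call rewrite that option 2 wants to avoid.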

Question: is it possible to solve the problem without the significant code rewriting described above? That is, can isupper be made to work correctly?

  • As a long-term solution, I would suggest switching to Unicode and wide strings. But then you start having problems with console output too. Related question: Stackoverflow.com/q/459154/10105 - VladD
  • I would be happy to, but, as I wrote in item 1, the project is large and there is a lot of code; rewriting it would take a long time. Besides, it processes large amounts of text, and doubling that volume means losing performance, if only because the data no longer fits in the caches. One more nuisance with UNICODE: writing wchar_t everywhere instead of the usual char, and wstring instead of string, is unpleasant, although clearly correct. It is strange that this is not done at the level of a compiler option. Or rather, it is done, but somehow not all the way. However, I understand why. - Damir
  • I understand the problem; you described it in item 1. That is why I am offering this not as an answer but only as an outline for the future. (About 10 years ago it seemed customary to use TCHAR everywhere. It would have been easier with that.) - VladD

1 answer

The problem is that the function takes an int but assumes that the character code is not negative (apart from EOF, -1), while the 'П' character stored in a signed char is less than zero! To work with Cyrillic, we set the /J compiler switch, which makes char unsigned by default (0..255). There used to be a dedicated item for it in the project settings; now you can simply add it to the command line. It does not seem to cause any side problems.
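
The range issue can be sketched without /J as well. The names in_ctype_range and isupper_safe below are hypothetical helpers, not CRT functions; the first only imitates the check quoted in the question's assertion message, and the second applies the unsigned char conversion from option 3 of the question.

```cpp
#include <cassert>
#include <cctype>

// Hypothetical helper: imitates the range check from the assertion message
// (c >= -1 && c <= 255). It is not the CRT's own code.
bool in_ctype_range(int c) {
    return c >= -1 && c <= 255;
}

// Hypothetical wrapper: converts through unsigned char first, so bytes
// 0x80..0xFF become 128..255 instead of negative values when char is signed.
int isupper_safe(char ch) {
    const int c = static_cast<unsigned char>(ch);
    assert(in_ctype_range(c));  // always holds after the conversion
    return std::isupper(c);
}
```

Assuming the source file is saved in Windows-1251, 'П' is the byte 0xCF: as a signed char it is -49, and after the conversion it is 207. Whether isupper_safe then reports it as uppercase still depends on the locale installed with setlocale.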

By the way, instead of wchar_t (which still does not cover all of Unicode!) you can use UTF-8. In that case, passing text between modules does not change at all, but the processing of the text itself changes considerably.
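
A minimal sketch of why processing changes with UTF-8: one letter may occupy several char elements, so per-byte logic (including per-byte isupper) no longer maps to letters. The function name count_utf8_code_points is hypothetical, and the input is assumed to be valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts code points in a string assumed to hold valid
// UTF-8 by skipping continuation bytes, which have the bit pattern 10xxxxxx.
std::size_t count_utf8_code_points(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char byte : s) {
        if ((byte & 0xC0) != 0x80) {
            ++n;  // lead byte or single-byte (ASCII) character
        }
    }
    return n;
}
```

For example, 'П' is encoded in UTF-8 as the two bytes 0xD0 0x9F, so a std::string holding it has size() 2 but contains one code point.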