Russian language in the console

Question

I am learning C++ from Stroustrup's book, and Russian characters are not displayed correctly. Here is the code:

    #include <clocale>
    #include <iostream>
    #include <string>
    using namespace std;

    int main()
    {
        setlocale(LC_ALL, "Russian");
        string previous = " ";
        string current;
        while (cin >> current)
        {
            if (previous == current)
            {
                cout << "Повторяющееся слово: " << current << endl;
            }
            previous = current;
        }
        cin.get();
        return 0;
    }

The text "Повторяющееся слово: " ("Repeated word: ") is displayed correctly thanks to setlocale. What follows it is gibberish, although the repeated word is detected. I have tried different setlocale arguments: (0, ""), "", "Rus", etc.

In Code::Blocks everything works without gibberish, even without setlocale.

Alas, setlocale will not save you; the only sane option is to change the console encoding.
@VladD, I found an example with CharToOem that is quite suitable: ru.stackoverflow.com/questions/70089/…
@VladD, found it: ru.stackoverflow.com/a/434186/177221 is a working version specifically for Visual Studio.
That answer has downvotes because of the author's latest UPD, and besides, it also covers your CharToOem and ConsoleCP.
Apparently you did not even bother to read the linked thread properly, alas.

Answer 1 · 2015-10-18T11:51:13

There are many solutions to this problem. If you need a quick and not necessarily universal solution and do not want to dig into the details, skip to the “Less correct, but suitable solutions” section.

Correct but difficult solution

To begin with, the problem with the Windows console is that its default fonts cannot display all characters. You should change the console font to a Unicode one; then output will work even on English Windows. To change the font only for your program, click the icon in the upper-left corner of its console window → Properties → Font. To change it for all future programs, do the same but choose Defaults instead of Properties.

Lucida Console and Consolas cope with everything except CJK ideographs. If your console fonts allow it, you can even output 猫; if not, only the characters that the font supports.

The rest of this discussion applies only to Microsoft Visual Studio. If you use a different compiler, apply the suggestions at your own risk; nothing is guaranteed.

Now, the encoding of the compiler's input files. The Microsoft Visual Studio compiler (at least versions 2012 and 2013) compiles sources in single-byte encodings as if they were in the ANSI encoding, which on a Russian system is CP1251. This means that storing the sources in CP866 is wrong. (This matters if you use L"..." strings.) On the other hand, if you store the sources in CP1251, the same sources will not compile correctly on non-Russian Windows. Therefore it is worth storing the source code in Unicode (for example, UTF-8).

After setting up the environment, let's move on to solving the actual problem.

The correct solution is to move away from single-byte encodings and use Unicode in the program. You will then get correct output not only of Cyrillic but of all languages (characters missing from the font will not be drawn, but you will still be able to work with them). On Windows this means moving from narrow strings ( char* , std::string ) to wide ones ( wchar_t* , std::wstring ) and using the UTF-16 encoding for strings.

(Wide strings solve another problem too: narrow strings are encoded in a single-byte encoding using the current system code page, that is, the ANSI encoding. If you compile your program on English Windows, this leads to obvious problems.)

To switch the console mode, call _setmode(_fileno(...), _O_U16TEXT); for each standard stream:

    #include <cstdio>
    #include <iostream>
    #include <io.h>
    #include <fcntl.h>

    int wmain(int argc, wchar_t* argv[])
    {
        _setmode(_fileno(stdout), _O_U16TEXT);
        _setmode(_fileno(stdin),  _O_U16TEXT);
        _setmode(_fileno(stderr), _O_U16TEXT);

        std::wcout << L"Unicode -- English -- Русский -- Ελληνικά -- Español." << std::endl;
        // or
        wprintf(L"%s", L"Unicode -- English -- Русский -- Ελληνικά -- Español.\n");

        return 0;
    }

This method should work correctly with input and output, with file names and stream redirection.

Important note: I/O streams are in either a “wide” or a “narrow” state, that is, they accept either only char* or only wchar_t* output. After the first output, switching is not always possible. Therefore, this code:

    cout << 5;             // or printf("%d", 5);
    wcout << L"привет";    // or wprintf(L"%s", L"привет");

may well not work. Use only wprintf / wcout .

If you really do not want to switch to Unicode and prefer a single-byte encoding, problems will arise. To begin with, characters that are not part of the selected encoding will not work (CP1251, for example, covers only basic Latin and Cyrillic); gibberish will be read and displayed instead. In addition, narrow string constants are ANSI-encoded, which means that Cyrillic string literals will not work on a non-Russian system (they will turn into gibberish that depends on the system locale). With these problems in mind, we move on to the next group of solutions.
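To make the limitation concrete, here is a minimal sketch of what "not part of the selected encoding" means. The function name to_cp1251 is hypothetical, and the sketch handles only ASCII and the Russian alphabet; the real code page contains more characters than this.

```cpp
// Sketch: map one Unicode code point to its CP1251 byte value, or return -1
// if this sketch has no mapping for it. Only ASCII and the Russian alphabet
// are handled; the real CP1251 table has additional characters.
inline int to_cp1251(char32_t cp)
{
    if (cp < 0x80) {
        return static_cast<int>(cp);                  // ASCII maps to itself
    }
    if (cp >= 0x0410 && cp <= 0x044F) {
        return static_cast<int>(cp - 0x0410 + 0xC0);  // А..я occupy 0xC0..0xFF
    }
    if (cp == 0x0401) {
        return 0xA8;                                  // Ё
    }
    if (cp == 0x0451) {
        return 0xB8;                                  // ё
    }
    return -1;                                        // no mapping in this sketch
}
```

A Greek letter such as λ (U+03BB) or the ideograph 猫 (U+732B) has no byte in this table, which is why such text cannot survive a round trip through a CP1251 narrow string.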

Less correct, but suitable solutions

In any case, set a Unicode font in the console. (This is the first step of the "difficult" solution.)

Make sure that your sources are encoded in CP1251 (this is not a given, especially if you are not using a Russian Windows locale). If Visual Studio complains, when you add Russian letters and save, that it cannot save the characters in the current encoding, select CP1251.

(1) If the computer is your own, you can change the code page of console programs system-wide. To do so:

Run Regedit.
Just in case, export the registry somewhere (for some reason everyone skips this step, so when everything breaks, remember that we warned you).
Under the HKEY_CURRENT_USER\Console key, find the CodePage value (if it does not exist, create a DWORD value with that name).
Set the value to 1251 (Modify → Base = Decimal).
Do not forget to reboot after changing the registry.

Advantages of this method: examples from books start working out of the box. Disadvantages: editing the registry can cause problems, and the console encoding changes globally and permanently, which may break other programs. In addition, the effect is limited to your computer (and to others with the same console encoding). Plus the problems common to all non-Unicode methods.

Note. Setting the global console code page through the registry value HKEY_CURRENT_USER\Console\CodePage does not work in Windows 10; the OEM code page is used instead, presumably because of a bug in conhost . Setting the console code page at the application-specific level ( HKEY_CURRENT_USER\Console\(path to the application)\CodePage ) does work.

(2) You can change the encoding for your program only. To do this, change the console encoding programmatically. Out of politeness to other programs, do not forget to restore the original encoding afterwards!

This is done either by calling the functions

 SetConsoleCP(1251); SetConsoleOutputCP(1251);

at the beginning of the program, or by calling an external utility

 system("chcp 1251");

(That is, you should end up with something like

    #include <cstdlib>

    int main(int argc, char* argv[])
    {
        std::system("chcp 1251");
        ...

or

    #include <Windows.h>

    int main(int argc, char* argv[])
    {
        SetConsoleCP(1251);
        SetConsoleOutputCP(1251);
        ...

followed by the ordinary program code.)

You can wrap these calls in a class to take advantage of C++'s automatic management of object lifetimes.

Example:

    #include <cstdlib>
    #include <string>

    int chcp(unsigned codepage)
    {
        // build the command from its parts;
        // std::to_string converts the number to its decimal text
        std::string command("chcp ");
        command += std::to_string(codepage);
        // run the command and return the result
        return !std::system(command.c_str());
    }

    // this code runs before main
    static int codepage_is_set = chcp(1251);

(If you are doing an exercise from Stroustrup, you can insert this at the end of the std_lib_facilities.h header file.)

Or like this:

    #include <iostream>
    #include <windows.h>

    class ConsoleCP
    {
        int oldin;
        int oldout;

    public:
        ConsoleCP(int cp)
        {
            oldin = GetConsoleCP();
            oldout = GetConsoleOutputCP();
            SetConsoleCP(cp);
            SetConsoleOutputCP(cp);
        }

        // since we changed the properties of an external object (the console),
        // we need to put everything back (if the program crashes, the user is out of luck)
        ~ConsoleCP()
        {
            SetConsoleCP(oldin);
            SetConsoleOutputCP(oldout);
        }
    };

    // and in the program:
    int main(int argc, char* argv[])
    {
        ConsoleCP cp(1251);
        std::cout << "русский текст" << std::endl;
        return 0;
    }

If you need some language other than Russian, just replace 1251 with the identifier of the required code page (see Code Page Identifiers in the sources below); of course, there is no guarantee that it will work.

There are other methods that are also often encountered; we list them for completeness.

Methods that work poorly (but can help you)

A frequently recommended method is to use the setlocale(LC_ALL, "Russian"); construct. This option (at least in Visual Studio 2012) has many problems. First, there is a problem with entering Russian text: the entered text is passed to the program incorrectly! Non-Russian text (for example, Greek) cannot be entered from the console at all. Plus the problems common to all non-Unicode solutions.

Another method that does not use Unicode is to call the CharToOem and OemToChar functions. It requires re-encoding every string that is output and (it seems) does not lend itself well to automation. It also suffers from the drawbacks common to non-Unicode solutions. Moreover, this method does not work (not only with constants, but with runtime strings too!) on non-Russian Windows, because the OEM encoding there is not CP866. Finally, these functions are not shipped with all versions of Visual Studio; in some versions of VS Express, for example, they are simply absent.
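For illustration, the per-string re-encoding mentioned above could be wrapped roughly like this. The helper name ansi_to_oem is hypothetical; it uses the Win32 CharToOemBuffA function on Windows and, as an assumption made only to keep the sketch portable, returns the text unchanged on other platforms.

```cpp
#include <string>

#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: convert a string from the ANSI code page to the OEM
// code page so that it can be written to a console that uses the OEM code page.
// On non-Windows platforms the text is returned unchanged.
inline std::string ansi_to_oem(const std::string& ansi)
{
#ifdef _WIN32
    std::string oem(ansi.size(), '\0');
    if (!ansi.empty()) {
        // Converts exactly oem.size() characters; no terminator is written.
        CharToOemBuffA(ansi.data(), &oem[0], static_cast<DWORD>(oem.size()));
    }
    return oem;
#else
    return ansi;
#endif
}
```

Every output statement then has to go through the helper, for example std::cout << ansi_to_oem("русский текст"), which is exactly the manual work that makes this method hard to automate.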

Sources:

How to display and enter data of type wchar_t[]?
- unfortunately, the author of that question used the MinGW compiler under Cygwin on WinXP, which makes most modern solutions inapplicable.
Output unicode strings in Windows console app
Conventional wisdom is retarded, aka What the @#%&* is _O_U16TEXT?
printf("%s"), printf("%ls"), wprintf("%s"), and wprintf("%ls")?
Russian language in source code in Dev-C++
Code Page Identifiers

Does your version display the text with "Ελληνικά"?
@VladD but what does Greek have to do with the question?
@RussCoder: We want a canonically correct solution, not one that breaks other features.
@VladD that is all very well, but then you should probably create a separate question like "How to enable Unicode support in the console in C++" and post the answer there.
If you dislike my variant so much, should I take it out and post it as my own answer?
And try setting a breakpoint in the destructor; for some reason it is called at the wrong time in your code.

Alexander Alexander · Answer 2 · 2018-05-26T19:30:06

Therefore it is worth storing the source code in Unicode (for example, UTF-8).

And it should be saved with a signature (BOM).

The situation is partly remedied by re-saving the sources in UTF-8 with a mandatory BOM; without it, Visual Studio starts interpreting “wide” strings containing Cyrillic in a very peculiar way. By specifying the UTF-8 BOM (Byte Order Mark), a character encoded as the three bytes 0xEF, 0xBB and 0xBF, we get UTF-8 recognized on any system.

We write in Russian in code

MihailPw · Answer 3 · 2017-05-01T15:14:33

For CLion:

Cygwin: during installation, in the package selection, find and tick cmake , GDB and the other recommended packages.

CLion: File → Settings → Editor → File Encodings: IDE Encoding, Project Encoding and main.cpp (your source file): UTF-8; Default encoding for properties files: IBM866.
In the editor window, at the bottom: UTF-8.

Include the Windows.h header file in the project and call:

 SetConsoleCP(866); SetConsoleOutputCP(866);

Answer 4 · 2019-03-13T13:37:43

Something needs to be explained for those looking for the right answer about the setlocale function:

A frequently recommended method is to use the setlocale(LC_ALL, "Russian"); construct. This option (at least in Visual Studio 2012) has many problems. First, there is a problem with entering Russian text: the entered text is passed to the program incorrectly! Non-Russian text (for example, Greek) cannot be entered from the console at all. Plus the problems common to all non-Unicode solutions.

I will add some information about this method: it is being recommended in a completely incorrect form!

First: the second parameter of the function is not the name of a country or language (although that works in some cases) but a language identifier according to ISO 3166-1. The correct value is therefore "ru-RU". Second: the documentation for this function states in black and white: "If execution is allowed to continue, the function sets errno to EINVAL and returns NULL." That is, when an error occurs, the function sets errno to EINVAL and returns NULL.

On error, errno is always EINVAL, which means "invalid argument". There is therefore no point in checking errno, but the function's return value must be checked. The correct call to setlocale therefore looks like this:

    if (setlocale(LC_ALL, "ru-RU") == NULL)
    {
        cout << "Error set locale ru-RU." << endl;
        return -1;
        // or force code page 1251 via SetConsoleCP (see the example above),
        // and do not forget to check the result of SetConsoleCP.
        // If an error occurred, get the error code with GetLastError.
    }

And do not forget that setlocale sets the code page only for ANSI encodings, so Greek, Spanish, Chinese and even Japanese characters will not be displayed. For Russian this is code page 1251.

Also important: this function is more reliable than setting the code page directly via SetConsoleCP because it switches all the internal settings to the conventions of the chosen language, from the date format to the separator characters.

And yes, you should not use a bare language code like "ru", because depending on the OS build and the installed language packs, ru-BY, ru-UA, ru-MO and other language standards that differ significantly from ru-RU may be selected. And you categorically must not specify "Russia", "Russian" or "Russian Federation" (yes, I have come across this mess a couple of times). Although the function does look up the region by name, the name is not always present in the localization table, or it may be listed as "Russia" or "Russian" written in the local language. This is the main error that makes setlocale so often refuse to work.

And yes, for an application running in Unicode mode, use the _wsetlocale function. It is identical and also applies the basic localization settings. Moreover, if the application project in Visual Studio is configured for Unicode, only _wsetlocale will work, since setlocale, according to the documentation, is not adapted to Unicode at all.

UPD.

I completely forgot to mention that on success setlocale and _wsetlocale return exactly the region identifier, that is, in our case the string "ru_RU\0".
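To see what the runtime actually returns on a given system, the return value can be captured rather than only compared with NULL. This is a minimal sketch; the helper name try_set_locale is hypothetical, and it uses only the standard std::setlocale.

```cpp
#include <clocale>
#include <string>

// Hypothetical helper: try to set the locale for all categories and return
// the name the runtime reports, or an empty string if the request was rejected.
inline std::string try_set_locale(const char* name)
{
    const char* result = std::setlocale(LC_ALL, name);
    // Copy right away: a later setlocale call may overwrite the returned string.
    return result ? std::string(result) : std::string();
}
```

Printing try_set_locale("ru-RU") and try_set_locale("Russian") on the target machine shows whether each name is accepted there and which string comes back, which is easier to reason about than guessing from the documentation.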

The setlocale and _wsetlocale functions do exactly the same thing and have nothing to do with the Unicode setting of the project.
_wsetlocale is a wide-character version of setlocale; the locale argument and return value of _wsetlocale are wide-character strings. _wsetlocale and setlocale behave identically otherwise.
docs.microsoft.com/en-us/cpp/c-runtime-library/reference/…
As for setlocale(LC_ALL, "Russian") , this option, although not recommended, is perfectly acceptable and complies with the documentation.
The locale argument can take a locale name, a language string, a language string and country/region code, a code page, or a language string, country/region code, and code page.
The string "Russian" is listed as a language name in the docs.microsoft.com/en-us/openspecs/windows_protocols/ms-lcid/ ... specification. It does not depend on any localization.
