This problem has many solutions. If you need a quick fix that is not necessarily universal and you do not want to dig into the details, skip to the “Less correct, but suitable solutions” section.
The correct but complex solution
To begin with, the problem with the Windows console is that its default fonts do not show all characters. Switch the console font to a Unicode one; this works even on English Windows. To change the font only for your program, click the icon in the upper left corner of its console window → Properties → Font. To change it for all future programs, do the same but choose Defaults instead of Properties.
Lucida Console and Consolas cope with everything except CJK characters. If your console font allows it, you can even output 猫; if not, only the characters the font supports.
What follows applies only to Microsoft Visual Studio. If you have a different compiler, use the suggested methods at your own risk; there are no guarantees.
Now, the encoding of the compiler's input files. The Microsoft Visual Studio compiler (at least versions 2012 and 2013) treats sources in single-byte encodings as if they were in the ANSI encoding, which on a Russian system is CP1251. This means that saving sources in CP866 is wrong. (This matters if you use L"..." strings.) On the other hand, sources stored in CP1251 will not compile properly on non-Russian Windows. It is therefore worth storing the source code in Unicode (for example, UTF-8).
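One way to check that the compiler has read the source in the encoding you expect is to compare wide character literals with their known Unicode code points. The following is only a sketch: the function name `source_encoding_is_read_correctly` is made up for this illustration, and the check assumes the file itself is saved as Unicode (for example, UTF-8).

```cpp
// Fails to compile if the compiler decoded this source file incorrectly.
// U+044F is CYRILLIC SMALL LETTER YA.
static_assert(L'я' == 0x044F, "source file was read in the wrong encoding");

// Hypothetical runtime version of the same check for two characters.
// U+0416 is CYRILLIC CAPITAL LETTER ZHE.
bool source_encoding_is_read_correctly() {
    const wchar_t text[] = L"Жя";
    return text[0] == 0x0416 && text[1] == 0x044F;
}
```

If the file were saved in CP866 but read as CP1251, the literal would decode to a different letter and the check would fail.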
With the environment set up, let's move on to the actual problem.
The correct solution is to move away from single-byte encodings and use Unicode in the program. You then get correct output not only for Cyrillic but for all languages (characters missing from the font will not be displayed, but you will still be able to work with them). On Windows this means switching from narrow strings (char*, std::string) to wide ones (wchar_t*, std::wstring) and using UTF-16 for strings.
(Wide strings solve another problem as well: narrow strings are encoded in a single-byte encoding using the current system code page, that is, the ANSI encoding. If you compile your program on English Windows, this leads to obvious problems.)
You need _setmode(_fileno(...), _O_U16TEXT); to switch console mode:
    #include <iostream>
    #include <cstdio>
    #include <io.h>
    #include <fcntl.h>

    int wmain(int argc, wchar_t* argv[])
    {
        // switch all three standard streams to UTF-16 text mode
        _setmode(_fileno(stdout), _O_U16TEXT);
        _setmode(_fileno(stdin),  _O_U16TEXT);
        _setmode(_fileno(stderr), _O_U16TEXT);

        std::wcout << L"Unicode -- English -- Русский -- Ελληνικά -- Español." << std::endl;
        // or
        wprintf(L"%s", L"Unicode -- English -- Русский -- Ελληνικά -- Español.\n");

        return 0;
    }
This method should work correctly with input and output, with file names and stream redirection.
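Input can be handled the same way as output. The sketch below reads a line from the console and echoes it inside a greeting, using wide streams only. The helper names `enable_unicode_console`, `make_greeting` and `greet_user` are invented for this example, and the `#ifdef` guard is an assumption added so that the code also compiles outside Windows, where `_setmode` does not exist.

```cpp
#include <iostream>
#include <string>
#ifdef _WIN32
#include <cstdio>
#include <io.h>
#include <fcntl.h>
#endif

// Hypothetical helper: switch the standard streams to UTF-16 text mode.
// Does nothing outside Windows.
void enable_unicode_console() {
#ifdef _WIN32
    _setmode(_fileno(stdout), _O_U16TEXT);
    _setmode(_fileno(stdin),  _O_U16TEXT);
    _setmode(_fileno(stderr), _O_U16TEXT);
#endif
}

// Hypothetical helper: build the greeting as a wide string.
std::wstring make_greeting(const std::wstring& name) {
    return L"Привет, " + name + L"!";
}

// Hypothetical helper: read a name from the console and print the greeting.
void greet_user() {
    std::wstring name;
    std::getline(std::wcin, name);
    std::wcout << make_greeting(name) << std::endl;
}
```

In a real program you would call `enable_unicode_console()` first thing in `wmain` and then `greet_user()`.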
Important note: I/O streams are in either a “wide” or a “narrow” state, that is, they carry either only char* or only wchar_t* data. After the first output, switching is not always possible. Therefore, this code:
    cout << 5;            // or printf("%d", 5);
    wcout << L"привет";   // or wprintf(L"%s", L"привет");
may well not work. Use only wprintf / wcout .
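Numbers do not require a narrow stream: wide streams format them as well, so the whole output can stay on the wide side. A small sketch, with the function names `print_item` and `item_to_string` invented for illustration:

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Writes a number and a wide string to any wide stream (std::wcout included).
void print_item(std::wostream& out, int number, const std::wstring& text) {
    out << number << L": " << text << L'\n';
}

// Same formatting, captured in a wide string instead of being printed.
std::wstring item_to_string(int number, const std::wstring& text) {
    std::wostringstream out;
    print_item(out, number, text);
    return out.str();
}
```

For console output you would write `print_item(std::wcout, 5, L"привет");` and never touch `cout` or `printf`.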
If you really do not want to switch to Unicode and prefer a single-byte encoding, problems will arise. First, characters outside the chosen encoding (CP1251, for example, covers only basic Latin and Cyrillic) will not work; garbage will be read and displayed instead. In addition, narrow string constants are ANSI-encoded, so Cyrillic string literals will not work on a non-Russian system (they turn into gibberish that depends on the system locale). With these problems in mind, we move on to the next group of solutions.
Less correct, but suitable solutions
In any case, set a Unicode font in the console. (This is the first step of the “complex” solution.)
Make sure that your sources are encoded in CP1251 (this is not a given, especially if you are not on a Russian Windows locale). If Visual Studio complains when you add Russian letters and save that it cannot store the characters in the current encoding, choose CP1251.
(1) If the computer is your own, you can change the code page of console programs system-wide. To do so:
- Run Regedit.
- Just in case, export the registry somewhere (for some reason everyone skips this step, so when everything breaks, remember that we warned you).
- In the HKEY_CURRENT_USER\Console section, find the CodePage value (if it is missing, create a value with that name and type DWORD).
- Set this value to 1251 (select it / Modify / Base = Decimal).
- Do not forget to reboot after changing the registry.
Advantages of the method: examples from books start working out of the box. Disadvantages: changing the registry can cause problems, and the console encoding changes globally and permanently, which can break other programs. Besides, the effect is limited to your computer (and to others that have the same console encoding). Add to that the general problems of non-Unicode methods.
Note. Setting the global console code page through the registry value HKEY_CURRENT_USER\Console\CodePage does not work in Windows 10; the OEM code page is used instead, presumably because of a bug in conhost. Setting the console code page at the application-specific level (HKEY_CURRENT_USER\Console\(path to the application)\CodePage) does work.
(2) You can change the encoding for your program only. To do this, change the console encoding programmatically. Out of courtesy to other programs, do not forget to restore the original encoding!
This is done either by calling the functions
    SetConsoleCP(1251);
    SetConsoleOutputCP(1251);
at the beginning of the program, or by calling an external utility:
    system("chcp 1251");
(That is, you should have something like
    #include <cstdlib>

    int main(int argc, char* argv[])
    {
        std::system("chcp 1251");
        ...
or
    #include <Windows.h>

    int main(int argc, char* argv[])
    {
        SetConsoleCP(1251);
        SetConsoleOutputCP(1251);
        ...
followed by the ordinary program code.)
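Both WinAPI calls return a nonzero value on success, so it may be worth checking the result rather than assuming the switch happened. A sketch under stated assumptions: the helper name `set_console_code_page` is hypothetical, and the non-Windows branch is only there so that the example compiles elsewhere.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: switch console input and output to the given code page.
// Returns false if either WinAPI call reports failure.
// Outside Windows there is nothing to switch, so it reports success.
bool set_console_code_page(unsigned code_page) {
#ifdef _WIN32
    return SetConsoleCP(code_page) != 0 && SetConsoleOutputCP(code_page) != 0;
#else
    (void)code_page;
    return true;
#endif
}
```

A program could call `set_console_code_page(1251)` at the start of `main` and print a warning in Latin characters if it returns false.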
You can wrap these calls in a function or a class; a class lets you take advantage of the automatic lifetime management of C++ objects.
Example:
    #include <cstdlib>
    #include <iostream>
    #include <string>

    int chcp(unsigned codepage)
    {
        // build the command from pieces
        std::string command("chcp ");
        command += std::to_string(codepage);   // append the digits, not a single character

        // run the command and return the result (nonzero on success)
        return !std::system(command.c_str());
    }

    // this code runs before main
    static int codepage_is_set = chcp(1251);
(If you are doing an exercise from Stroustrup's book, you can put this at the end of the std_lib_facilities.h header file.)
Or like this:
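A detail worth watching when the command is assembled: applying `+=` to a `std::string` with an integer appends one character with that code, not the decimal digits, so the number has to be converted explicitly. The sketch below also redirects the output of `chcp` to `nul` so that its status line does not appear in the program's output. The names `make_chcp_command` and `run_chcp` are made up for this example.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: build the shell command that switches the code page.
// std::to_string turns the number into decimal digits; "> nul" discards
// the status line that chcp prints.
std::string make_chcp_command(unsigned codepage, bool quiet) {
    std::string command = "chcp " + std::to_string(codepage);
    if (quiet) {
        command += " > nul";
    }
    return command;
}

// Hypothetical helper: run the command; true if the shell reported success.
bool run_chcp(unsigned codepage) {
    return std::system(make_chcp_command(codepage, true).c_str()) == 0;
}
```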
    #include <windows.h>
    #include <iostream>

    class ConsoleCP
    {
        int oldin;
        int oldout;

    public:
        ConsoleCP(int cp)
        {
            oldin = GetConsoleCP();
            oldout = GetConsoleOutputCP();
            SetConsoleCP(cp);
            SetConsoleOutputCP(cp);
        }

        // since we changed the properties of an external object (the console), we need to
        // put everything back as it was (if the program crashes, the user is out of luck)
        ~ConsoleCP()
        {
            SetConsoleCP(oldin);
            SetConsoleOutputCP(oldout);
        }
    };

    // and in the program:
    int main(int argc, char* argv[])
    {
        ConsoleCP cp(1251);
        std::cout << "русский текст" << std::endl;
        return 0;
    }
If you need a language other than Russian, just replace 1251 with the identifier of the desired encoding (see Code Page Identifiers in the sources below), though of course nothing is guaranteed to work.
There are other methods that are often encountered; we list them for completeness.
Methods that work poorly (but can help you)
A frequently recommended method is to call setlocale(LC_ALL, "Russian");. This option (at least in Visual Studio 2012) has many problems. First, input of Russian text is broken: the entered text reaches the program incorrectly! Non-Russian text (for example, Greek) cannot be entered from the console at all. And, of course, it shares the problems common to all non-Unicode solutions.
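If you use this method anyway, it helps to check the result of the call: `std::setlocale` returns a null pointer when the requested locale is not available, and the name "Russian" is specific to the Microsoft runtime. A minimal sketch; the helper name `try_set_locale` is hypothetical.

```cpp
#include <clocale>

// Hypothetical helper: try to switch the C locale and report whether it worked.
// std::setlocale returns a null pointer if the requested locale is unavailable.
bool try_set_locale(const char* name) {
    return std::setlocale(LC_ALL, name) != nullptr;
}
```

For example, a program could fall back to Latin-only messages when `try_set_locale("Russian")` returns false.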
Another method that does not use Unicode relies on the CharToOem and OemToChar functions. It requires recoding every string on output and (it seems) does not lend itself well to automation. It also suffers from the problems common to non-Unicode solutions. In addition, it will not work (not only with constants but also with runtime strings!) on non-Russian Windows, since the OEM encoding there is not CP866. Finally, these functions do not ship with every version of Visual Studio; in some versions of VS Express, for example, they are simply absent.
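For completeness, here is roughly what the recoding step looks like. This is a sketch with the same limitations as the method itself: the wrapper name `ansi_to_oem` is invented, it uses the buffer variant `CharToOemBuffA`, and outside Windows it simply returns the text unchanged so that the example still compiles.

```cpp
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: convert an ANSI-encoded string to the OEM code page.
// Outside Windows the text is returned unchanged.
std::string ansi_to_oem(const std::string& text) {
    std::string result(text);
#ifdef _WIN32
    if (!text.empty()) {
        CharToOemBuffA(text.c_str(), &result[0],
                       static_cast<DWORD>(text.size()));
    }
#endif
    return result;
}
```

Every string would have to pass through such a wrapper before being printed, for example `std::cout << ansi_to_oem("русский текст");`, which is exactly the inconvenience described above.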
Sources:
- How to display and enter data of type wchar_t[]?
  (Unfortunately, the author of that question used the MinGW compiler under Cygwin and WinXP, which makes most modern solutions inapplicable.)
- Output unicode strings in Windows console app
- Conventional wisdom is retarded, aka What the @#%&* is _O_U16TEXT?
- printf("%s"), printf("%ls"), wprintf("%s"), and wprintf("%ls")?
- Russian language in source code in Dev-C++
- Code Page Identifiers