A plain print() does not work: the interpreter crashes, complaining that it cannot convert Unicode to the encoding used by the Windows console. Can this be fixed somehow? Perhaps there is a module that can do it.
3 answers
Update: Python 3.6 uses the Unicode API for console I/O, like the win_unicode_console package mentioned below (for details, see PEP 528). Arbitrary Unicode characters are supported by default. A simple print(unicode_string) now works without installing additional software (a console font that supports the desired characters still has to be configured).
At the boundary with the Windows console, Unicode is used; internally, sys.stdin, sys.stdout and sys.stderr use the utf-8 encoding. This can break code that used the binary interface to write to the console and therefore used the OEM codepage, for example cp866. cp866 is not compatible with utf-8, so you can get mojibake in this case. Either fix the code so that it writes text, not bytes, to the console, or set the %PYTHONLEGACYWINDOWSIOENCODING% environment variable to restore the old behavior.
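The incompatibility between cp866 and utf-8 can be seen without a console at all. The following is a minimal sketch for Python 3; the helper name show_as is made up for illustration, and it simply encodes text with one codec and decodes the bytes with another:

```python
# -*- coding: utf-8 -*-
# Hypothetical helper (not from the answer): simulate a program writing
# bytes in one encoding while the reader interprets them in another.
def show_as(text, written_as, read_as):
    return text.encode(written_as).decode(read_as, errors='replace')

text = u'Привет'

# utf-8 bytes interpreted as cp866: every Cyrillic letter becomes two
# unrelated characters (mojibake).
mojibake = show_as(text, 'utf-8', 'cp866')

# Same codec on both sides: the text survives unchanged.
roundtrip = show_as(text, 'cp866', 'cp866')

print(ascii(mojibake))    # ascii() keeps the output printable on any console
print(roundtrip == text)
```

Writing text rather than bytes avoids this, because the interpreter then picks the matching codec itself.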
The behavior for I/O redirected to a file or a pipe remains the same: locale.getpreferredencoding(False) is used by default (the ANSI codepage, for example cp1251).
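To see which case applies on a given machine, a script can report what the interpreter chose. This is only a diagnostic sketch for Python 3; describe_stream is a hypothetical helper, and the values it prints depend on the system and on whether the output is redirected:

```python
import locale
import sys

def describe_stream(stream):
    """Report whether a stream is a console and which encoding it uses."""
    isatty = getattr(stream, 'isatty', lambda: False)
    return {
        'is_console': bool(isatty()),
        'encoding': getattr(stream, 'encoding', None),
    }

info = describe_stream(sys.stdout)
# Default for redirected streams (the ANSI codepage on Windows).
info['preferred'] = locale.getpreferredencoding(False)
print(info)
```

Running it once directly and once with `> out.txt` shows the difference between console and redirected output.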
This is a long-standing problem in Python. The win_unicode_console package adds the most complete Unicode support for both output and text input in the console. You can use this package without changing your source code by using the included run module:

    C:\> py -m pip install win-unicode-console
    C:\> py -m run path\your_script.py

This allows you to print arbitrary Unicode characters, even those that cannot be represented in the current console encoding (such as cp866), for example print(u'\N{snowman}') -> ☃. All BMP Unicode characters supported by the configured fonts are displayed, and even non-BMP characters can be copied from the console to other programs.
As a one-off solution (without new packages), you can set the PYTHONIOENCODING environment variable:

    C:\> set PYTHONIOENCODING=utf-8
    C:\> py path\your_script.py > output_in_utf8.txt

Neither solution hard-codes an encoding inside the script, which makes them more portable.
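The effect of PYTHONIOENCODING on redirected output can be checked from Python itself. A small sketch, assuming Python 3; run_child is a hypothetical helper that starts a child interpreter with the variable set and captures its raw bytes:

```python
import os
import subprocess
import sys

def run_child(code, io_encoding):
    """Run `code` in a child interpreter with PYTHONIOENCODING set."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    return subprocess.check_output([sys.executable, '-c', code], env=env)

# The child prints a snowman; its stdout is a pipe, so the bytes we
# receive are encoded with whatever PYTHONIOENCODING requested.
raw = run_child(r"print(u'\N{SNOWMAN}')", 'utf-8')

print(raw.strip() == u'\N{SNOWMAN}'.encode('utf-8'))
```

The script run in the child contains no encoding name; only its environment changes.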
Here is a simple example of text output to the console (Python 2.7):

    # -*- coding: utf-8 -*-
    import sys
    reload(sys)
    sys.setdefaultencoding('cp866')  # Set the console output encoding.
    print(u'Произвольный текст')

- Why reload(sys)? - neoascetic
- @neoascetic: changing the default encoding is not recommended (it can break libraries that do not expect it), so the sys.setdefaultencoding() function is removed by default; reload(sys) makes it available again. - jfs
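One way to get a similar effect without touching the default encoding is to wrap a byte stream in a text layer with an explicit encoding. This is a sketch for Python 3 and not the answer's method; it writes into an in-memory buffer, whereas a real script would wrap the console's byte stream (for example sys.stdout.buffer):

```python
import io

# In-memory stand-in for a console byte stream.
buf = io.BytesIO()

# Text layer that encodes to cp866; characters missing from cp866 are
# replaced with '?' instead of raising UnicodeEncodeError.
out = io.TextIOWrapper(buf, encoding='cp866', errors='replace', newline='\n')

out.write(u'Произвольный текст \N{SNOWMAN}\n')
out.flush()

data = buf.getvalue()
print(data.decode('cp866'))
```

The encoding is still hard-coded here, so this keeps the portability drawback noted earlier; it only avoids the global side effect of sys.setdefaultencoding().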
This option is also possible:

    # -*- coding: utf8 -*-
    def out_console(txt=''):
        print txt.decode('utf8')  # works
        print txt.encode('utf8')  # raises an error

    a = 'проверка текста'
    out_console(a)

Note that instead of the "print" statement it is better to write through sys.stdout, which is a file object. This avoids compatibility issues between python2 and python3.
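Writing through a file object could look like the sketch below. The stream parameter is an addition for illustration (it is not in the answer); it lets the function be tried against an in-memory stream instead of the real console:

```python
# -*- coding: utf-8 -*-
import io
import sys

def out_console(txt=u'', stream=None):
    """Write text plus a newline to a file-like object (default: sys.stdout)."""
    if stream is None:
        stream = sys.stdout
    stream.write(txt + u'\n')

# Try it with an in-memory text stream; no console encoding is involved.
demo = io.StringIO()
out_console(u'проверка текста', stream=demo)
```

The function receives text (a Unicode string) and leaves the encoding to the stream, so the same call works with the console, a file or a pipe.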
To avoid the one-off approach from the answer above, you can do the following in the script itself:

    # -*- coding: utf8 -*-
    import os
    os.putenv('PYTHONIOENCODING', 'utf8')

This sets the variable for child processes started from the script. The running interpreter reads PYTHONIOENCODING only at startup, so the streams of the current process are not affected. In my example this is redundant anyway, because the first line of the script already tells the interpreter that the source is in utf8.
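How a variable set inside a script reaches child interpreters can be sketched as follows (Python 3 assumed). Assigning to os.environ is used here because the Python documentation prefers it over calling putenv directly; the child code string is purely illustrative:

```python
import codecs
import os
import subprocess
import sys

# Assigning to os.environ also updates the process environment, so
# children started afterwards inherit the value.
os.environ['PYTHONIOENCODING'] = 'utf8'

# The child reports which encoding its own stdout ended up with.
child_code = 'import sys; sys.stdout.write(sys.stdout.encoding)'
reported = subprocess.check_output([sys.executable, '-c', child_code])
reported = reported.decode('ascii').strip()

# Normalize spelling differences such as 'utf8' vs 'utf-8'.
normalized = codecs.lookup(reported).name
print(normalized)
```

The parent's own sys.stdout keeps the encoding it was given at startup; only the child picks up the new value.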
Of course, the source file itself must be saved in utf8 encoding. For this I recommend the Geany editor (a universal solution) or Kate (mainly Linux).
- The point of using PYTHONIOENCODING is that it is an environment variable: it can differ depending on circumstances, which lets the same script run on different systems without touching its source code. It is not normal to have to rewrite a program because the system (OS) language has changed. Under normal circumstances, PYTHONIOENCODING does not need to be set: the encoding is taken from the locale (the LANG, LC_CTYPE, LC_ALL, LANGUAGE variables on Unix), or the Windows/console codepage is used. - jfs
- encode+decode+cp1251 - neoascetic