\xd1\x80\xd1\x82\xd0\xbe\xd0\xb9
System: 16.04. The output of the print command gives \xd1\x80\xd1\x82\xd0\xbe\xd0\xb9.
2 answers
To print arbitrary text, use the unicode type:
    print(u'\u0439')

Seeing \xd0\xb9 in the program's output means that you are printing the textual representation of a collection (such as a list or a dictionary) or of another composite object that contains bytes rather than the unicode type:

    # XXX DO NOT DO THIS
    print [u'\u0439'.encode('utf-8')]  # a bytestring in a list
    # -> ['\xd0\xb9']

print(list_) prints the elements of the list by calling repr() on each individual element, roughly equivalent to:

    print("[%s]" % ", ".join(map(repr, list_)))

Such a textual representation is good for debugging: it is intended to be unambiguous (in many cases, eval(repr(obj)) == obj). See "What makes __repr__ different from __str__?" To avoid the repr() call, format your collection manually.
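One way to format a collection manually is sketched below. This is a minimal illustration written for Python 3, not code from the answer: the helper name format_items and the default UTF-8 encoding are assumptions made for the example.

```python
def format_items(items, encoding="utf-8"):
    """Return readable text for a list whose elements are text or bytes.

    format_items is a hypothetical helper; the encoding default is an assumption.
    """
    parts = []
    for item in items:
        if isinstance(item, bytes):
            # Decode bytes explicitly instead of letting repr() escape them.
            item = item.decode(encoding)
        parts.append(item)
    # Join the text ourselves, so repr() is never called on the elements.
    return u"[%s]" % u", ".join(parts)


items = [u"\u0439".encode("utf-8"), u"\u0444"]  # one bytestring, one text string
print(format_items(items))
```

The function returns text, so the caller decides where and how it is finally encoded for output.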
To get human-readable text, decode bytes where you receive them (at the boundary with the outside world), where you have the most information about their encoding, and pass the text around inside the program as the unicode type. Encode the text back into bytes, using a suitable encoding, only when you need to hand the data off (write to disk, send over the network). This is the so-called Unicode sandwich: "decode early, encode late, use Unicode inside".
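The sandwich can be sketched roughly as follows, for Python 3. The file name, the cp1251 input encoding, and the shout function are all assumptions chosen for illustration; in a real program the input encoding comes from whatever you know about the data source.

```python
import os
import tempfile


def shout(text):
    """Pure text processing: works on Unicode only, never on bytes."""
    return text.upper()


# Set up a sample input file (hypothetical name) holding cp1251-encoded bytes.
path = os.path.join(tempfile.mkdtemp(), "greeting.txt")
with open(path, "wb") as f:
    f.write(u"\u043f\u0440\u0438\u0432\u0435\u0442".encode("cp1251"))

# Decode early: bytes become text right at the boundary.
with open(path, "rb") as f:
    raw = f.read()
text = raw.decode("cp1251")

# Use Unicode inside.
result = shout(text)

# Encode late: pick the output encoding only when the data leaves the program.
encoded = result.encode("utf-8")
with open(path, "wb") as f:
    f.write(encoded)
```

Note that the input and output encodings differ here, which is harmless because the middle of the program only ever sees text.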
Note that although printing bytes directly may sometimes work, it can break as soon as the environment changes. For example, if you print bytes that represent text encoded in cp1251 in an environment that expects cp866, you get mojibake (garbled characters). Use Unicode to work with text in Python. A stricter attitude to mixing Unicode text and bytes is one of the key differences between Python 3 and Python 2.
There are exceptions: when your program has to work a lot with paths on a *nix system in Python 2, it is convenient to treat the paths as opaque cookies and to receive them from, and pass them back to, the system as is, in bytes. See "How to work with paths with Russian characters?"
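The cp1251/cp866 mismatch can be reproduced without a terminal by decoding the same bytes with the wrong codec. This is a small Python 3 sketch; the choice of sample letter is arbitrary.

```python
text = u"\u0439"                  # sample letter (arbitrary choice)
raw = text.encode("cp1251")       # bytes as a cp1251 program would emit them

garbled = raw.decode("cp866")     # what a cp866 environment would display
restored = raw.decode("cp1251")   # what you get with the right encoding

print(garbled == text)   # False: same bytes, different character
print(restored == text)  # True
```

The bytes themselves are never damaged; only the interpretation changes, which is why keeping text as Unicode inside the program avoids the problem.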
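As a rough illustration of treating paths as opaque bytes: the answer talks about Python 2, but the sketch below is written for Python 3, where the os functions likewise return bytes when given bytes. The directory and file name are made up for the example.

```python
import os
import tempfile

# A bytes directory path; os.fsencode uses the filesystem encoding.
base = os.fsencode(tempfile.mkdtemp())

# Hypothetical file name, stored as bytes and never decoded by the program.
name = u"\u0442\u0435\u043a\u0441\u0442.txt".encode("utf-8")
with open(os.path.join(base, name), "wb") as f:
    f.write(b"data")

# Bytes in, bytes out: the names come back exactly as the system stores them.
names = os.listdir(base)
for entry in names:
    # Pass the bytes straight back to the system without interpreting them.
    size = os.path.getsize(os.path.join(base, entry))
```

Decoding is needed only if a name has to be shown to a person, and that step can fail independently of the file operations.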
When standard output is redirected to a file or a pipe, or if the environment is not configured (LC_ALL=C), you can get a UnicodeEncodeError: 'ascii'.. error. In this case, you can set the PYTHONIOENCODING environment variable to specify the I/O encoding explicitly:

    $ PYTHONIOENCODING=utf-8 python -c 'print(u"\u0439")' > output.txt

See more in "Python 3.4 and Russian characters".
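The same setting can be applied when one Python program launches another and captures its output through a pipe. The sketch below assumes Python 3 and uses sys.executable to start a child interpreter; the child script is just the one-liner from the command above.

```python
import os
import subprocess
import sys

# Copy the current environment and force the child's I/O encoding.
env = dict(os.environ, PYTHONIOENCODING="utf-8")

# The child's stdout is a pipe here, the case in which the encoding may
# otherwise fall back to something that cannot represent the letter.
out = subprocess.check_output(
    [sys.executable, "-c", "print(u'\\u0439')"],
    env=env,
)

# The parent receives bytes and decodes them at the boundary.
text = out.decode("utf-8").strip()
```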
On Windows, the win-unicode-console package may be useful. See "How to output a Unicode string from Python to the Windows console?"
If you want to use Russian letters in the source code of the Python program itself, then in Python 2 you must explicitly declare the encoding at the top of the .py file. To avoid having to use the u'' prefix to create Unicode strings from string literals ('') in the source, you can enable unicode_literals:

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    from __future__ import unicode_literals

    print('Здравствуй, мир!')

Both declarations can be omitted in Python 3 (everything works by default).
Note that coding: utf-8 (the source code encoding) has nothing to do with the I/O encoding; these are independent things.
- thanks, you helped me - pirks
    s = 'фыва'.encode()
    print(s)
    print(s.decode())

out:

    b'\xd1\x84\xd1\x8b\xd0\xb2\xd0\xb0'
    фыва

- The author is using Python 2. Your results are for Python 3. - jfs