Broken encoding when outputting in Python 2.7

Question

I have a script, and when I run it, I get this instead of Russian characters:

    X РџРѕР»СѓС‡РµРЅРѕ: None | РћР¶РёРґР°Р»РѕСЃСЊ: '\xd0\xad\xd1\x82\xd0\xbe\xd1\x82 \xd1\x84\xd0\xb8\xd0\xbb\xd1\x8c\xd0\xbc \xd1\x85\xd0\xbe\xd1\x80\xd0\xbe\xd1\x88'

Googling suggested fixing the problem by adding these lines at the beginning of the file:

    # -*- coding: utf-8 -*-
    from __future__ import unicode_literals

But then the output is:

    u'\u042d\u0442\u043e\u0442 \u0444\u0438\u043b\u044c\u043c \u0445\u043e\u0440\u043e\u0448'

So the u prefix appears, but the result is hardly any more readable. I also tried changing the file's encoding; that did not help either. I realize the question is probably silly and the solution is most likely elementary.

Code:

    def main():
        test(donuts(4), 'Количество : 4')
        test(donuts(9), 'Количество : 9')

Function code:

    def donuts(count):
        if count < 10:
            rezult = 'Количество: {}'.format(count)
        else:
            rezult = 'Количество: много'
        return rezult

Test function:

    def test(got, expected):
        if got == expected:
            prefix = ' OK '
        else:
            prefix = ' X '
        print('%s Получено: %s | Ожидалось: %s' % (prefix, repr(got), repr(expected)))

The problem is the same on Ubuntu and on Windows. At the moment I am trying to fix it on Windows XP.

If I remove repr(), the string is displayed correctly in Russian.

Community ♦ · Accepted Answer · 2016-05-07T09:48:30

You see mojibake because you use bytes rather than Unicode for text, so the text is output in the wrong encoding (UTF-8-encoded text is displayed as if it were in the Windows encoding, cp1251):

    >>> u'Получено'.encode('utf-8').decode('cp1251')
    'РџРѕР»СѓС‡РµРЅРѕ'

Do not use bytes; use Unicode to represent text.
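A small sketch of the mismatch described above, runnable on Python 2.7 and Python 3: encoding text to UTF-8 and decoding those bytes as cp1251 reproduces the garbage, and reversing both steps recovers the original text. The variable names are illustrative only.

```python
# -*- coding: utf-8 -*-
# A Unicode string; the u'' prefix makes it unicode on Python 2 as well.
text = u'Получено'

# Encode to UTF-8 bytes, then wrongly decode those bytes as cp1251,
# the Windows Cyrillic code page: this is what produces the mojibake.
mojibake = text.encode('utf-8').decode('cp1251')

# Undoing the wrong step (back to the same bytes, then decoding as UTF-8)
# recovers the original text, because every one of these bytes is defined in cp1251.
recovered = mojibake.encode('cp1251').decode('utf-8')
```

This round-trip only works when no byte was lost or replaced along the way; the real fix is still to keep text as Unicode throughout.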

from __future__ import unicode_literals makes string literals such as 'abc' create Unicode strings on Python 2 (the same behavior as in Python 3). Without it, use the u'' prefix to write text literals in the code.
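A minimal check of what unicode_literals changes, written to run on both Python 2.7 and Python 3 (where the import is accepted but has no effect): a plain literal has the same type as a u'' literal, and byte strings now have to be requested explicitly with the b'' prefix.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

# With unicode_literals, a plain literal and a u'' literal have the same type:
# 'unicode' on Python 2, 'str' on Python 3.
plain = 'Это'
prefixed = u'Это'
assert type(plain) is type(prefixed)
assert plain == prefixed

# Bytes must now be marked with b''; here, the UTF-8 encoding of 'Это'.
raw = b'\xd0\xad\xd1\x82\xd0\xbe'
assert raw.decode('utf-8') == plain
```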

You see u'\u042d\u0442\u043e' because you call repr(), which returns a textual representation of the object. The purpose of repr() is to give an unambiguous representation of the object, for example for debugging or tests; ideally, eval(repr(obj)) == obj.
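A sketch of the repr() round-trip property, runnable on Python 2.7 and 3. Python 3's repr() keeps printable non-ASCII characters as they are (ascii() is the function that escapes them), so the example uses the 'unicode_escape' codec to produce the same \uXXXX escapes that Python 2's repr() shows.

```python
# 'Это' written with escapes, so the source stays ASCII.
s = u'\u042d\u0442\u043e'

# repr() aims to be unambiguous: evaluating it yields an equal object.
assert eval(repr(s)) == s

# The escaped form seen in the question (Python 2's repr() output without
# the surrounding u'...'), produced the same way on Python 2 and 3.
escaped = s.encode('unicode_escape')
```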

In general, to print a Unicode string in Python, just drop the repr():

    >>> print(u'\u042d\u0442\u043e')
    Это

If the characters being printed are supported by the current environment (the console encoding on Windows, a correct locale on *nix), this is enough. If the output is redirected to a file or pipe in Python 2, configure PYTHONIOENCODING. If you want to print arbitrary characters in the Windows console, even ones that the OEM code page (such as cp866) does not support, install the win-unicode-console package so that the console can show any BMP character (provided a suitable font is configured).
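A sketch of PYTHONIOENCODING for redirected output, assuming a Python interpreter is available as sys.executable: the child process writes to a pipe, so its output encoding comes from the variable rather than from a console. run_with_io_encoding() and CHILD_CODE are names made up for this illustration; win-unicode-console is not shown, since it only concerns the interactive Windows console.

```python
import os
import subprocess
import sys

# Child program: prints u'Это' given as \u escapes, so this source stays ASCII.
CHILD_CODE = "print(u'\\u042d\\u0442\\u043e')"


def run_with_io_encoding(encoding):
    # Copy the current environment and set PYTHONIOENCODING for the child only.
    env = dict(os.environ)
    env['PYTHONIOENCODING'] = encoding
    # check_output sends the child's stdout to a pipe: the "redirected output" case.
    return subprocess.check_output([sys.executable, '-c', CHILD_CODE], env=env)


out = run_with_io_encoding('utf-8')
print(out.decode('utf-8').strip() == u'\u042d\u0442\u043e')  # True
```

With 'cp1251' instead of 'utf-8', the same child writes the single-byte cp1251 encoding of the text, so whoever reads the file or pipe has to know which encoding was chosen.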

while1pass · Answer 2 · 2016-05-07T08:30:01

Try this:

    >>> print('\xd0\xad\xd1\x82\xd0\xbe\xd1\x82 \xd1\x84\xd0\xb8\xd0\xbb\xd1\x8c\xd0\xbc \xd1\x85\xd0\xbe\xd1\x80\xd0\xbe\xd1\x88'.decode('utf-8'))
    Этот фильм хорош

Useful information can also be found here, here, here and here.

There is no point in writing text as unreadable byte strings; it is better to use a Unicode literal directly: u'Этот фильм хорош' (or, if the OP uses from __future__ import unicode_literals, just 'Этот фильм хорош'; the .decode('utf-8') approach breaks in that case).
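A sketch of why the .decode('utf-8') approach breaks under unicode_literals, runnable on Python 2.7 and 3. With that import, the unprefixed escapes from the answer become a unicode object whose characters are U+00D0, U+00AD and so on; on Python 2, calling .decode() on a unicode object first encodes it implicitly with ASCII, which raises UnicodeEncodeError, and on Python 3 str has no decode() method at all. Keeping the escapes as bytes requires the b'' prefix.

```python
# -*- coding: utf-8 -*-
# The bytes from the question: the UTF-8 encoding of the expected text.
data = b'\xd0\xad\xd1\x82\xd0\xbe\xd1\x82 \xd1\x84\xd0\xb8\xd0\xbb\xd1\x8c\xd0\xbc \xd1\x85\xd0\xbe\xd1\x80\xd0\xbe\xd1\x88'

# Decoding the bytes gives the same text as writing the Unicode literal directly,
# and the literal is far easier to read.
text = data.decode('utf-8')
assert text == u'Этот фильм хорош'

# What the same escapes mean without b'' under unicode_literals: a Unicode
# string of code points U+00D0, U+00AD, ..., not UTF-8 bytes.
not_bytes = u'\xd0\xad'
assert not isinstance(not_bytes, bytes)
assert [ord(c) for c in not_bytes] == [0xD0, 0xAD]
```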
