Python, problems with Cyrillic in lists and tuples

Question

I have the following code:

    #!/usr/bin/python
    # -*- coding: utf-8 -*-
    hw = 'мир'
    print hw
    string = []
    string.append(hw)
    print string

When I run it, it prints this:

    мир
    ['\xd0\xbc\xd0\xb8\xd1\x80']

The same happens with tuples. How can I fix it?

In general, printing individual elements of the list should work normally: print string[0]. Do not pay attention to what print shows for the list itself, since repr is applied to the elements of the list, which makes Cyrillic strings unreadable. Admittedly, under Windows even this output is not readable.

A comment claimed that when the list is passed to a function, the function will operate on gibberish. It will not: the gibberish is only what is displayed on the screen, because repr mangles the output, while internally the text is stored normally.
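
A minimal sketch of that point, written for Python 3, where a bytes object stands in for the Python 2 byte string from the question (the variable names are illustrative, not from the original code):

```python
# -*- coding: utf-8 -*-
# Python 3 sketch: bytes play the role of the Python 2 byte string.
hw = 'мир'.encode('utf-8')   # the UTF-8 bytes that Python 2 stored for 'мир'
items = [hw]

# Displaying the list goes through repr(), so the bytes are shown escaped.
shown = repr(items)

# The stored data is intact: decoding gives back the original text.
restored = items[0].decode('utf-8')

print(shown)
print(restored)
```

Only the displayed form of the list is escaped; the element itself still decodes to the original word.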

FeroxTL · Accepted Answer · 2016-06-25T16:15:51

In short, in Python 2 this cannot be fixed.

You are printing the string representation of the list, which is built by calling repr on each element. The problem is that in Python 2 your string is a 'str' object (really a byte string) holding the UTF-8 encoded bytes of the text, and repr of such an object returns an ASCII-only representation: every byte outside printable ASCII is escaped as \xNN. That escaped form is what you see on the screen.

You can try printing it like this:

    print u'[%s]' % u','.join(unicode(x) for x in [u'привет', u'мир'])

In Python 3 there is no such problem, because str is always Unicode, the default source encoding is UTF-8, and repr keeps printable non-ASCII characters as they are. So everything works as you expect.

    $ python3
    Python 3.5.1+ (default, Mar 30 2016, 22:46:26)
    >>> print(['привет', 'мир'])
    ['привет', 'мир']
    >>> repr(['привет', 'мир'])
    "['привет', 'мир']"
    >>> # likewise
    >>> ['привет', 'мир'].__str__()
    "['привет', 'мир']"

unicode(x) is either useless here (the input in the example is already of type unicode) or harmful: if you want to convert bytes to unicode, you should call the .decode() method with an explicit encoding.
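
As a rough illustration of that comment, the sketch below decodes only when it is handed a byte string and always names the encoding. The helper name to_text and the utf-8 default are assumptions made for this example; it is written for Python 3.

```python
# -*- coding: utf-8 -*-
def to_text(value, encoding='utf-8'):
    """Return text for value, decoding only if it is a byte string.

    The helper name and the utf-8 default are assumptions for this sketch.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value

raw = b'\xd0\xbc\xd0\xb8\xd1\x80'   # UTF-8 bytes for 'мир'
print(to_text(raw))                  # bytes are decoded explicitly
print(to_text('мир'))                # text passes through unchanged
```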

Community spirit ♦ · Answer 2 · 2016-06-25T23:56:02

Use Unicode instead of bytes to work with text in Python. For example, add from __future__ import unicode_literals so that string literals create unicode objects even without an explicit u'' prefix. When reading text from a file, use io.open() to get unicode. When receiving data from the network, decode the bytes to Unicode according to the protocol, for example, when the encoding is specified in the Content-Type HTTP header:
```
 text = data.decode(response.headers.getparam('charset')) 
```
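
The same two ideas can be sketched in a self-contained form. The temporary file, the helper name decode_body, and the utf-8 fallback used when no charset is declared are all assumptions made for illustration; a real protocol may prescribe a different default.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

# Write some UTF-8 bytes to a temporary file so the sketch is self-contained.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as f:
    f.write(u'привет, мир'.encode('utf-8'))

# io.open() with an explicit encoding returns text, not bytes.
with io.open(path, encoding='utf-8') as f:
    text = f.read()
os.remove(path)

def decode_body(data, charset=None, fallback='utf-8'):
    """Decode bytes with the declared charset, else a fallback (an assumption)."""
    return data.decode(charset or fallback)

# Bytes that arrived with a declared charset of cp1251.
body = decode_body(u'мир'.encode('cp1251'), charset='cp1251')
```

The fallback matters because a header lookup can come back empty when the server declares no charset.
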
See the answer on how to get text when the data is returned by an external process.

Print lists and tuples directly only for debugging, because in that case repr() is called for each element. The task of repr() is to produce an unambiguous representation of the object: for example, ['\xd0\xbc\xd0\xb8\xd1\x80'] is the text representation of a list containing a byte string. In Python 3, you would get [b'\xd0\xbc\xd0\xb8\xd1\x80'] (with an explicit b'' prefix for a bytes literal). See "What makes __repr__ different from __str__?"

Format lists / tuples / other collections explicitly:

    >>> print ', '.join([u'мир'])
    мир
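
One possible way to wrap this explicit formatting in a reusable helper is sketched below. The name format_items and the bracketed output style are assumptions for this example, not part of any library.

```python
# -*- coding: utf-8 -*-
def format_items(items, sep=u', '):
    """Join the text of each item instead of relying on repr()."""
    return u'[' + sep.join(u'%s' % (item,) for item in items) + u']'

print(format_items([u'привет', u'мир']))
print(format_items((u'мир', 42)))
```

Each item is formatted with %s, so text stays readable and non-string items such as numbers are still accepted.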

In Python 2, repr() leaves unescaped only printable characters, that is, those for which isprint() returns a non-zero value (in the C locale these are the printable ASCII characters); such characters are their own textual representation. All other characters are escaped:

    >>> print([u'мир'])
    [u'\u043c\u0438\u0440']

In Python 3, str(some_list) also calls repr() for the elements of some_list, but characters that are printable in the current environment can be displayed as they are ( мир ) instead of being escaped ( '\u043c\u0438\u0440' ).
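
The difference can be checked in Python 3 with the built-in ascii(), which produces the escaped form that Python 2 showed. This is a small sketch; the variable names are illustrative.

```python
# -*- coding: utf-8 -*-
words = ['мир']

as_str = str(words)      # str() of a list uses repr() for its elements
as_repr = repr(words)    # printable non-ASCII characters are kept as-is
as_ascii = ascii(words)  # every non-ASCII character is escaped

print(as_ascii)
```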
