u"\u043b\u043e\u043c, \u0432\u044b\u0440\u0435\u0437\u043a\u0430, to scrap, \u043e\u0442\u043a\u0430\u0437\u0430\u0442\u044c\u0441\u044f" 

How do I convert it to an ordinary str in Python?

I can call print:

 >>> print s
 лом, вырезка, to scrap, отказаться

But if I just evaluate s in the interpreter, I get the escaped form:

 >>> s
 u'\u043b\u043e\u043c, \u0432\u044b\u0440\u0435\u0437\u043a\u0430, to scrap, \u043e\u0442\u043a\u0430\u0437\u0430\u0442\u044c\u0441\u044f'

And I cannot convert it to str:

 >>> str(s)
 Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
 UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)

This is what I have managed so far:

 >>> for i in xrange(0,len(s)): print i, s[i], ord(s[i])
 0 л 1083
 1 о 1086
 2 м 1084
 3 , 44
 4   32
 5 в 1074
 6 ы 1099
 7 р 1088
 8 е 1077
 9 з 1079
 10 к 1082
 11 а 1072
 12 , 44
 13   32
 14 t 116
 15 o 111
 16   32
 17 s 115
 18 c 99
 19 r 114
 20 a 97
 21 p 112
 22 , 44
 23   32
 24 о 1086
 25 т 1090
 26 к 1082
 27 а 1072
 28 з 1079
 29 а 1072
 30 т 1090
 31 ь 1100
 32 с 1089
 33 я 1103

It seems that the best option is to convert the string to web-readable ASCII using XML character references:

 >>> str(s.encode('ascii', 'xmlcharrefreplace'))
 '&#1083;&#1086;&#1084;, &#1074;&#1099;&#1088;&#1077;&#1079;&#1082;&#1072;, to scrap, &#1086;&#1090;&#1082;&#1072;&#1079;&#1072;&#1090;&#1100;&#1089;&#1103;'

  • Try json_decode =) - Serge Esmanovich
  • like this: parsed_json = json.loads(json_string) - Serge Esmanovich
  • @SergeEsmanovich, that does not look like a JSON string. - Visman
  • @SergeEsmanovich, it does nothing with the string, even if you add braces {, } and so on. It just dumps that string and that's all. There is no format. - encoder
  • =) ok then, like this docs.python.org/2/howto/unicode.html - Serge Esmanovich

1 answer

u'' is the textual representation of a Unicode string (an immutable sequence of characters) in Python 2, where the type str is used for byte strings (immutable sequences of bytes).
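
The difference between characters and bytes can be seen in a small sketch. It uses only the first word of the string from the question, and the variable names are chosen here for illustration. The same code should behave identically on Python 2 and Python 3.

```python
# The first word of the string from the question: three Cyrillic letters.
s = u"\u043b\u043e\u043c"

# Encoding gives a byte string (str in Python 2, bytes in Python 3).
encoded = s.encode("utf-8")

n_chars = len(s)              # counts characters: 3
n_bytes = len(encoded)        # counts bytes: 6, two per Cyrillic letter in UTF-8
first_code_point = ord(s[0])  # 1083, the same value the loop in the question prints
```

The escapes shown in u'...' are only how the interpreter displays the string; the object itself already holds the real characters.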

When working with text, you should use Unicode, so you do not need to do anything with s.

If you need to convert the text into bytes, for example to send it as binary data over the network, you can use s.encode(character_encoding), for example: sock.sendall(s.encode('utf-8')).
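
A minimal sketch of that round trip, without a real socket: payload is what would be passed to sock.sendall, and received is what the other side would get after decoding. The names payload, received and ascii_ok are illustrative. The last part shows why str(s) fails in Python 2: it implicitly tries to encode with the 'ascii' codec, which cannot represent Cyrillic letters.

```python
s = u"\u043b\u043e\u043c, to scrap"

# Encode explicitly before sending; this is the value for sock.sendall(payload).
payload = s.encode("utf-8")

# The receiving side decodes the bytes back into Unicode text.
received = payload.decode("utf-8")

# str(s) in Python 2 attempts the equivalent of s.encode("ascii"),
# which raises UnicodeEncodeError for non-ASCII characters.
try:
    s.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
```

Both sides have to agree on the encoding; UTF-8 is a common choice because it can represent every Unicode character.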