Suppose a file contains a line like this:

"Hello World!\n" 

I read this line from the file as is, and the variable ends up containing:

 '"Hello World!\\n"' 

What is the easiest way to convert this string to its "normal" form (quotes stripped, escape sequences expanded) without using eval?

Stripping the quotes is not a problem, and in principle there is also a solution for the escape sequences (roughly: as long as the string contains any of '\\n', '\\r', '\\t', make the corresponding substitution), but I would like the simplest and shortest solution possible, without non-standard dependencies (such as parse).

I need a solution for Python 3.

In Python 2 this works: '"Hello World!\\n"'.strip('"').decode("string-escape"). In Python 3, however, str has no decode method, and the decode method of the bytes class does not expand the escape sequences (or am I doing something wrong?).

  • And what prevents you from doing this: print((b'%s' % line).decode('unicode_escape'))? Your string comes out formatted as it should. - BOPOH
  • @BOPOH, you have Python 3.5. I most likely have 3.4. - insolor
  • This is what worked: line.encode().decode('unicode_escape') - insolor
  • Oh, I thought I had Python 3 by default, but in fact I checked on 2.7. In 3 it will not work. But this: bytes(line, 'utf-8').decode('unicode_escape') works fine in Python 3.2.3 - BOPOH
  • And would line.encode('cp1252', 'backslashreplace').decode('unicode-escape') be suitable? It seems to work for Russian - BOPOH

2 answers

Consider whether the way the data is saved can be changed so that repr() is not used when writing the text: write the text directly, dropping the repr() call, or use the JSON format. Both options are more efficient and more portable.
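As a rough illustration of the JSON option, here is a minimal sketch using the standard json module. The variable names are made up for the example, and it assumes each stored value is a single string:

```python
import json

text = 'Привет!\n'

# Writing side: json.dumps produces a double-quoted literal with escapes.
# ensure_ascii=False keeps non-ASCII characters readable instead of \uXXXX.
stored = json.dumps(text, ensure_ascii=False)

# Reading side: json.loads strips the quotes and expands the escapes.
restored = json.loads(stored)
```

Here stored is the text wrapped in double quotes with the newline written as a backslash followed by n, and restored is equal to the original text.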

If the input format cannot be changed, ast.literal_eval() can be used:

    #!/usr/bin/env python3
    import ast

    text = ast.literal_eval(text_repr)  # where text_repr = '"Привет!\\n"'
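A slightly more defensive sketch of the same idea follows. The wrapper name parse_string_literal is hypothetical, and it assumes that anything other than a plain string literal should be rejected. Note that ast.literal_eval() follows Python's literal syntax, which is not identical to the C-style escapes of other formats, so it is worth checking against real data.

```python
import ast

def parse_string_literal(text_repr):
    """Turn a quoted, escaped string literal into the text it denotes."""
    try:
        # strip() removes the trailing newline left over from reading a line
        value = ast.literal_eval(text_repr.strip())
    except (ValueError, SyntaxError) as exc:
        raise ValueError('not a valid literal: %r' % text_repr) from exc
    # literal_eval also accepts numbers, lists, etc.; accept only str here
    if not isinstance(value, str):
        raise ValueError('not a string literal: %r' % text_repr)
    return value

text = parse_string_literal('"Привет!\\n"\n')
```

Unlike eval(), ast.literal_eval() only accepts literals, so input such as a function call is rejected with an exception instead of being executed.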
  • Thanks for the function; I thought there should be something out of the box. The data format cannot be changed: I have my own homegrown tool for handling .pot / .po files (gettext). For output, though, I do not use repr() (double quotes are always needed, plus other nuances), but ast.literal_eval() looks like it will suit me. - insolor

At the moment I use this option:

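    def unescape_string(s):
        # strip_once is a helper not shown here (presumably removes one pair of quotes).
        # Caveat: chained replace() calls mishandle an escaped backslash followed
        # by a letter: the input \\t becomes a tab instead of a backslash and "t".
        return strip_once(s, '"')\
            .replace(r'\\', '\\')\
            .replace(r'\t', '\t')\
            .replace(r'\r', '\r')\
            .replace(r'\n', '\n')\
            .replace(r'\"', '\"')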
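One possible way to avoid the ordering problem of chained replace() calls is to process every escape in a single pass with re.sub. This is only a sketch: the name unescape_once and the choice to leave unknown escapes untouched are assumptions, and quote stripping is deliberately left out.

```python
import re

# Character after the backslash -> the character it stands for.
_ESCAPES = {'\\': '\\', 't': '\t', 'r': '\r', 'n': '\n', '"': '"'}

# A backslash followed by any single character.
_ESCAPE_RE = re.compile(r'\\(.)', re.DOTALL)

def unescape_once(s):
    """Expand the escapes listed in _ESCAPES in one left-to-right pass."""
    def repl(match):
        # Unknown escapes are kept as they are, backslash included.
        return _ESCAPES.get(match.group(1), match.group(0))
    return _ESCAPE_RE.sub(repl, s)

# Escaped backslash followed by "t": the chained version yields a tab,
# the single-pass version yields a backslash and the letter "t".
chained = r'a\\tb'.replace(r'\\', '\\').replace(r'\t', '\t')
single_pass = unescape_once(r'a\\tb')
```

Because each backslash is consumed together with the character after it, the output of one substitution can never be reinterpreted as the start of another escape.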

You can also use something like what BOPOH suggested:

    line.encode(codepage, 'backslashreplace').decode('unicode-escape')

Here 'unicode-escape' must be the only argument to decode(): the second positional argument of bytes.decode() is the error handler, not a codec. For codepage, 'ascii' is the safe choice ('latin-1' also works): unicode-escape decodes the bytes as Latin-1, so an encoding that turns non-ASCII characters into raw bytes of its own (utf-8, cp1251) garbles them. For example:

    >>> 'Привет!\\n'.encode('ascii', 'backslashreplace').decode('unicode-escape')
    'Привет!\n'
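A small check of this round trip, wrapped in a hypothetical helper unescape_via_codec. It passes 'unicode_escape' as the only argument to decode() (the second positional argument of bytes.decode() is the error handler, not a codec) and shows how the intermediate encoding affects non-ASCII text:

```python
def unescape_via_codec(s, intermediate='ascii'):
    """Expand escapes by round-tripping through the unicode_escape codec."""
    # backslashreplace turns unencodable characters into \uXXXX escapes,
    # which unicode_escape then decodes back into the original characters.
    return s.encode(intermediate, 'backslashreplace').decode('unicode_escape')

escaped = 'Привет!\\n'

via_ascii = unescape_via_codec(escaped)           # Cyrillic survives
via_utf8 = unescape_via_codec(escaped, 'utf-8')   # Cyrillic is garbled
```

With 'ascii' every non-ASCII character goes through a \uXXXX escape and comes back intact; with 'utf-8' the raw UTF-8 bytes are read back as Latin-1 characters.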