Error in regexp with Russian characters in Python: SyntaxError: (unicode error) 'utf-8' codec can't decode byte 0xd1

Question

When I use Russian letters the code does not work:

average = int(re.findall(u'Среднее = (\d+)', out)[0])

Exception: SyntaxError: (unicode error) 'utf-8' codec can't decode byte 0xd1 in position 0: invalid continuation byte

Provide a complete (but minimal) code example and the full traceback.
Which Python version and operating system are you using, and where does the output go (console, IDE)?
If you receive an exhaustive answer, mark it as accepted (the check mark next to the chosen answer).

jfs · Accepted Answer · 2016-05-24T05:03:01

The SyntaxError is probably caused by a mismatch between the encoding declared at the top of the file and the encoding the file is actually saved in. Use an editor that saves files in UTF-8.

For example, if you save the following text, encoded as UTF-8, in the file utf8-charset.py:

 #!/usr/bin/env python
 # -*- coding: utf-8 -*-
 s = u"по-русски"

and run python utf8-charset.py, nothing is printed (no error). But if the file is saved in a different encoding, one that is inconsistent with the declaration inside the file:

 $ iconv -t 866 utf8-charset.py > wrong-866-charset.py

then running the same command on it, python wrong-866-charset.py, prints an error:

   File "wrong-866-charset.py", line 3
 SyntaxError: (unicode error) 'utf-8' codec can't decode byte 0xaf in position 0: invalid start byte
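
The same mismatch can be reproduced without creating any files. This is a minimal sketch, assuming Python 3; the helper name try_decode is hypothetical, and cp1251 as the encoding of the question's file is only a guess based on the byte 0xd1 in its error message:

```python
def try_decode(text, actual_encoding, declared_encoding="utf-8"):
    """Encode text the way an editor would save it, then decode it the way
    Python would under the declared encoding.

    Returns the decoding error message, or None if decoding succeeds.
    """
    raw = text.encode(actual_encoding)
    try:
        raw.decode(declared_encoding)
    except UnicodeDecodeError as exc:
        return str(exc)
    return None


# Saved as UTF-8 and declared as UTF-8: decoding succeeds.
print(try_decode(u"по-русски", "utf-8"))

# Saved as cp866 but declared as UTF-8: fails on the first byte.
print(try_decode(u"по-русски", "cp866"))

# Assumption: the question's file was saved as cp1251 (Windows Cyrillic).
print(try_decode(u"Среднее", "cp1251"))
```

If the cp1251 guess is right, the last call reports byte 0xd1 with "invalid continuation byte", matching the message in the question.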

Answer 2 · 2016-05-24T04:41:33

Add a source encoding declaration to the source file and set the encoding to UTF-8:

 #!/usr/bin/env python
 # -*- coding: utf8 -*-

By default, Python 2 interprets source code as ASCII, and ASCII has no Russian characters. (Python 3 defaults to UTF-8.)
Additionally, check the source file and, if necessary, re-save it in UTF-8 in any text editor. The problem should then disappear.
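
To check which encoding Python reads from the declaration, the standard library function tokenize.detect_encoding can help. This is a minimal sketch, assuming Python 3; declared_encoding is a hypothetical helper, and it reports only what the file declares, not how the file was actually saved:

```python
import codecs
import io
import tokenize


def declared_encoding(source_bytes):
    """Return the canonical name of the encoding Python would use for these
    source bytes, based on a BOM or a coding declaration in the first two
    lines (Python 3 falls back to UTF-8 when there is neither)."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    # Normalize aliases such as "utf8" to the codec's canonical name.
    return codecs.lookup(encoding).name


header = b"#!/usr/bin/env python\n# -*- coding: utf8 -*-\n"
print(declared_encoding(header))
print(declared_encoding(b"x = 1\n"))
```

For a file on disk, the bytes can be read with open(path, "rb").read() and passed to the helper.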

The most likely reason is that the actual file encoding is not UTF-8, even though UTF-8 is what the source code already declares.
In that case, all that remains is to re-save the source file in UTF-8.
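
If no suitable editor is at hand, the re-save can also be scripted. This is a minimal sketch, assuming the file's actual encoding is already known; resave_as_utf8 is a hypothetical helper, and it overwrites the file in place, so keep a backup:

```python
import io


def resave_as_utf8(path, actual_encoding):
    """Rewrite the file at path in UTF-8.

    actual_encoding is the encoding the file is really saved in; it must be
    known (or guessed) beforehand, because it cannot be read from the file.
    """
    # newline="" keeps the original line endings untouched.
    with io.open(path, "r", encoding=actual_encoding, newline="") as f:
        text = f.read()
    with io.open(path, "w", encoding="utf-8", newline="") as f:
        f.write(text)
```

For the file from the accepted answer's iconv example, the call would be resave_as_utf8("wrong-866-charset.py", "cp866"). A wrong guess for actual_encoding either raises UnicodeDecodeError or silently produces garbled text, so check the result before running the script again.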
