Russian characters are displayed as mojibake (krakozyabry)

Question

There is a text document that contains Russian text, but the output is garbled.

It doesn't matter whether the output goes to the console or somewhere else: I still get mojibake. How do I turn it into normal text?

OS: Windows 10 (UPD: tried on Ubuntu, same result)
Python: 3.6.3

Console

Code example:

 import requests
 VTCr = requests.get("https://vtcpanel.com/build/FilesUpdate/updates.json")
 VTCdata = VTCr.json()
 VTClog = requests.get(VTCdata["Update"]["Files"]["Program"]["ChangelogUrl"]).text
 print(VTClog)

The output:
ï»¿[add] Aperture Remedy Remedy" Reel "; [fix] amicable amicable amicular; ats; ats;

Even if it seems "not important", please still show at least some example.

Answer 1 · 2019-02-03T19:04:45

First, look at the diagram from the article "How to recognize krakozyabry?":

It shows that the original text is UTF-8 that was mistakenly decoded as windows-1252.

Now back to your code, specifically line 4. Instead of extracting the text right away, first just get the response to the request.
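The effect can be reproduced without any network access. The snippet below is a minimal sketch: it takes one line of the changelog text, encodes it as UTF-8, and decodes it with the wrong single-byte codec. It uses ISO-8859-1 rather than windows-1252 because Python's cp1252 codec leaves a few byte values undefined and would raise on some Cyrillic letters; the variable names are only illustrative.

```python
original = '[fix] Исправлен расчет величин для ATS;'

# Encode correctly as UTF-8, then decode with the wrong single-byte codec.
raw = original.encode('utf-8')
garbled = raw.decode('latin-1')

# Undoing the wrong step recovers the text, because latin-1 maps
# every byte value to exactly one character and back.
repaired = garbled.encode('latin-1').decode('utf-8')

print(repaired == original)
```

Each Cyrillic letter takes two bytes in UTF-8, so every letter turns into two unrelated characters, which is why the garbled text looks roughly twice as long.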

 response = requests.get(VTCdata["Update"]["Files"]["Program"]["ChangelogUrl"])

Check the encoding of the response:

 >>> response.encoding
 'ISO-8859-1'

This is almost the same as windows-1252 (a few characters differ, but that is insignificant here).

The conclusion: the server's response specifies the wrong text encoding.
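One common way to end up with ISO-8859-1 is a Content-Type header that carries no charset at all: requests is documented to fall back to ISO-8859-1 for text content types in that case. Which headers this particular server sent is not known, so the following is only a rough illustration of that rule. The helper `guess_text_encoding` is hypothetical and is not requests' actual code; it parses the header with the standard library.

```python
from email.message import Message


def guess_text_encoding(content_type):
    """Hypothetical helper: declared charset, else ISO-8859-1 for text/*."""
    msg = Message()
    msg['Content-Type'] = content_type
    charset = msg.get_content_charset()  # lowercased charset or None
    if charset is not None:
        return charset
    if msg.get_content_maintype() == 'text':
        # No charset declared for a text type: the legacy HTTP default.
        return 'iso-8859-1'
    return None


print(guess_text_encoding('text/plain'))
print(guess_text_encoding('text/plain; charset=UTF-8'))
```

With a real response, `response.headers.get('Content-Type')` would be the value to inspect.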

There are several ways to solve the problem without changing the server's behavior.

Set the response encoding to the correct one, then extract the text:

 >>> response.encoding = 'utf-8'
 >>> response.text
 '\ufeff[add] Добавлен раздел "Моя компания";\n[fix] Исправлен расчет величин для ATS;'

Or, instead of text, take content (the raw data as bytes) and decode it from UTF-8 yourself. The result is the same:

 >>> response.content.decode('utf-8')
 '\ufeff[add] Добавлен раздел "Моя компания";\n[fix] Исправлен расчет величин для ATS;'
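Both results still begin with '\ufeff', the byte order mark (BOM) that the file starts with. If it gets in the way, the standard 'utf-8-sig' codec strips it while decoding. A small sketch, with a hard-coded byte string standing in for response.content:

```python
# Bytes as they would come from response.content: UTF-8 with a leading BOM.
raw = '\ufeff[add] Добавлен раздел "Моя компания";'.encode('utf-8')

with_bom = raw.decode('utf-8')         # the BOM survives as '\ufeff'
without_bom = raw.decode('utf-8-sig')  # the BOM is stripped if present

print(with_bom.startswith('\ufeff'))
print(without_bom.startswith('[add]'))
```

'utf-8-sig' also accepts input without a BOM, so it is safe to use when it is unclear whether the server adds one.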

alexsis20102 · Answer 2 · 2019-02-03T16:29:39

You need to set the encoding explicitly, preferably UTF-8. If you output text as UTF-8, your file should also be saved in UTF-8. If you work with a database, it must use UTF-8 as well.

Peter Levenberg · Answer 3 · 2019-02-03T16:32:35

In recent versions of Python, strings are Unicode by default. Check the encoding of your input file. You need either to convert the file to UTF-8 or to handle its encoding in the program code.
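For a local file, handling the encoding in code usually means passing it to open() explicitly. The sketch below writes a temporary sample file and reads it back; the file name changelog.txt and the sample text are placeholders, not part of the original question.

```python
import os
import tempfile

text = 'Добавлен раздел "Моя компания"'

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'changelog.txt')

    # Write a sample file in UTF-8 (a stand-in for the real input file).
    with open(path, 'w', encoding='utf-8') as f:
        f.write(text)

    # Name the encoding when reading too; without it Python uses a
    # platform-dependent default, often a legacy code page on Windows.
    with open(path, encoding='utf-8') as f:
        loaded = f.read()

print(loaded == text)
```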
