There is a text document containing Russian text, but when I output it, it comes out garbled.

It doesn't matter whether the output goes to the console or anywhere else, it is always gibberish. How do I get normal text out of it?

OS: Windows 10 (UPD: tried it on Ubuntu too, same result)
Python: 3.6.3

Console output: [screenshot not available]

Code example:

    import requests
    VTCr = requests.get("https://vtcpanel.com/build/FilesUpdate/updates.json")
    VTCdata = VTCr.json()
    VTClog = requests.get(VTCdata["Update"]["Files"]["Program"]["ChangelogUrl"]).text
    print(VTClog)

What I get in the output (the Russian words come out as unreadable Latin characters):

    ï»¿[add] <garbled text> "<garbled text>"; [fix] <garbled text> ATS;

  • Output where? If it "doesn't matter", at least show some example. A minimal reproducible example, please. - andreymal
  • @andreymal Added. - BuzzardDoc
  • Some strange pictures are not an example. Once again: a minimal reproducible example. - andreymal
  • @andreymal I think it's ready now. - BuzzardDoc
  • Try running the command chcp 65001 in the Windows console. - Pavel Gridin
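Regarding the chcp suggestion: chcp 65001 switches the Windows console code page to UTF-8. A quick way to see which encoding Python itself uses for print() is the sketch below; the helper name stdout_encoding is hypothetical. On Python 3.6+ with an interactive Windows console this typically reports utf-8 already (PEP 528), which hints that the console alone is not what garbles the text here.

```python
import locale
import sys


def stdout_encoding() -> str:
    """Return the encoding print() uses for sys.stdout, if it reports one."""
    # A test runner or output redirection may replace sys.stdout,
    # so fall back to "unknown" instead of failing.
    return getattr(sys.stdout, "encoding", None) or "unknown"


print("sys.stdout encoding:", stdout_encoding())
# The encoding open() uses for text files when none is given explicitly.
print("locale preferred encoding:", locale.getpreferredencoding(False))
```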

3 answers

First, look at the diagram from the article "How to recognize krakozyabry?" (krakozyabry is Russian slang for mojibake):

[Diagram from the article]

It tells us that the text was originally in UTF-8 but was mistakenly decoded as Windows-1252.
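To make that mechanism concrete, here is a small offline reproduction (no network): a sample string is encoded to UTF-8 and then wrongly decoded as Windows-1252. The sample text is an assumption modeled on the start of the changelog. It deliberately uses only letters whose UTF-8 bytes all have a cp1252 mapping; some lowercase Cyrillic letters, such as "с", produce the byte 0x81, which cp1252 cannot decode at all.

```python
# Assumed sample, modeled on the start of the changelog: a UTF-8 byte order
# mark (BOM) followed by one Russian word.
original = "\ufeff[add] Добавлен"

# What the server sends: UTF-8 bytes.
raw_bytes = original.encode("utf-8")

# What goes wrong: the bytes are decoded as Windows-1252 instead of UTF-8.
garbled = raw_bytes.decode("cp1252")
print(garbled)  # ï»¿[add] Ð”Ð¾Ð±Ð°Ð²Ð»ÐµÐ½

# Undoing the wrong step (re-encode as cp1252, decode as UTF-8) restores the
# text, because every byte in this sample has a cp1252 mapping.
restored = garbled.encode("cp1252").decode("utf-8")
print(restored == original)  # True
```

Note that the BOM alone turns into ï»¿, which matches the beginning of the garbled output in the question.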

Now back to your code, specifically line 4. Instead of taking .text right away, first just get the response object:

 response = requests.get(VTCdata["Update"]["Files"]["Program"]["ChangelogUrl"]) 

Check which encoding requests assigned to the response:

    >>> response.encoding
    'ISO-8859-1'

This is almost the same as Windows-1252 (the two differ only in the 0x80-0x9F byte range, which is insignificant in this case).
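The difference between the two encodings is confined to bytes 0x80-0x9F: ISO-8859-1 maps them to invisible C1 control characters, while Windows-1252 maps most of them to printable punctuation. A small check using the UTF-8 bytes of "Д" (0xD0 0x94), chosen here purely as an illustration:

```python
# UTF-8 encodes "Д" (U+0414) as the two bytes 0xD0 0x94.
data = "Д".encode("utf-8")

as_latin1 = data.decode("iso-8859-1")  # 0x94 -> U+0094, an invisible control char
as_cp1252 = data.decode("cp1252")      # 0x94 -> U+201D, a right double quote

# ascii() shows non-ASCII characters as escape sequences.
print(ascii(as_latin1))  # '\xd0\x94'
print(ascii(as_cp1252))  # '\xd0\u201d'

# For bytes 0xA0-0xFF the two encodings agree exactly.
high = bytes(range(0xA0, 0x100))
print(high.decode("iso-8859-1") == high.decode("cp1252"))  # True
```

So the garbled result looks the same either way, except that some characters are invisible under ISO-8859-1.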

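Conclusion: the server reports the wrong encoding for the text in its response (requests takes the encoding from the Content-Type header and, for text content with no charset declared, falls back to ISO-8859-1).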
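A simplified, stdlib-only sketch of that header rule follows. It is not requests' actual code, and the function name guess_encoding is hypothetical. Whether this particular server omits the charset or declares a wrong one is an assumption; a missing charset on a text/* response is the most common cause of this symptom.

```python
from email.message import Message
from typing import Optional


def guess_encoding(content_type: str) -> Optional[str]:
    """Hypothetical, simplified version of the rule an HTTP client such as
    requests applies to pick a text encoding from the Content-Type header."""
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset()  # lower-cased charset param, or None
    if charset:
        return charset
    if msg.get_content_maintype() == "text":
        # No charset declared for a text/* type: fall back to ISO-8859-1.
        return "ISO-8859-1"
    return None


print(guess_encoding("text/plain"))                 # ISO-8859-1
print(guess_encoding("text/plain; charset=UTF-8"))  # utf-8
print(guess_encoding("application/octet-stream"))   # None
```

If the server cannot be fixed, requests also offers response.apparent_encoding, which guesses the encoding from the body bytes and can be assigned to response.encoding.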

There are several ways to solve the problem without changing the server’s behavior.

  1. Set the correct encoding on the response, then read the text:

     >>> response.encoding = 'utf-8'
     >>> response.text
     '\ufeff[add] Добавлен раздел "Моя компания";\n[fix] Исправлен расчет величин для ATS;'
  2. Take content instead of text (content holds the raw data as bytes) and decode it from UTF-8 yourself; the result is the same:

     >>> response.content.decode('utf-8')
     '\ufeff[add] Добавлен раздел "Моя компания";\n[fix] Исправлен расчет величин для ATS;'
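Both variants keep the byte order mark at the start of the string ('\ufeff'). The sketch below uses a hard-coded byte string as a stand-in for response.content (an assumption, so it runs offline) to show that Python's 'utf-8-sig' codec decodes UTF-8 and drops a leading BOM. By the same logic, setting response.encoding = 'utf-8-sig' should also give text without it.

```python
# Hypothetical stand-in for response.content: UTF-8 bytes with a leading BOM.
content = '\ufeff[add] Добавлен раздел "Моя компания";\n'.encode("utf-8")

# Plain UTF-8 keeps the BOM as the character U+FEFF at the start.
with_bom = content.decode("utf-8")

# "utf-8-sig" decodes UTF-8 and strips a leading BOM if one is present.
without_bom = content.decode("utf-8-sig")

print(repr(with_bom[:6]))     # '\ufeff[add]'
print(repr(without_bom[:5]))  # '[add]'
```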

    You need to set the encoding explicitly, preferably UTF-8. If you output UTF-8, the file you write to must also be encoded in UTF-8, and if you work with a database, it must use UTF-8 as well.
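A minimal sketch of writing output as UTF-8: pass the encoding to open() explicitly instead of relying on the platform default. The file name changelog.txt and the temporary directory are assumptions for illustration.

```python
import os
import tempfile

text = "[fix] Исправлен расчет величин для ATS;\n"

# Hypothetical output location in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "changelog.txt")

# Without encoding=..., open() uses locale.getpreferredencoding(), which on
# Windows is usually a legacy code page such as cp1251, not UTF-8.
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

# Reading back with the same explicit encoding returns the original text.
with open(path, encoding="utf-8") as f:
    print(f.read() == text)  # True
```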

      In Python 3, strings are Unicode by default. Check the encoding of your input file: either convert the file to UTF-8 or specify its actual encoding in the program code.
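For the input side, a sketch of both options: open the file with its real encoding and, if needed, re-save it as UTF-8. Windows-1251 is assumed as the source encoding because it is common for Russian text on Windows; the function name convert_to_utf8 and the file names are hypothetical.

```python
import os
import tempfile


def convert_to_utf8(src: str, dst: str, src_encoding: str) -> None:
    """Re-encode a text file from src_encoding to UTF-8 (hypothetical helper)."""
    with open(src, encoding=src_encoding) as fin, \
         open(dst, "w", encoding="utf-8") as fout:
        fout.write(fin.read())


# Demo: create a hypothetical input file encoded in Windows-1251.
tmp = tempfile.mkdtemp()
src = os.path.join(tmp, "input_cp1251.txt")
dst = os.path.join(tmp, "input_utf8.txt")
with open(src, "w", encoding="cp1251") as f:
    f.write("Моя компания\n")

convert_to_utf8(src, dst, "cp1251")

# The converted file now reads correctly as UTF-8.
with open(dst, encoding="utf-8") as f:
    print(f.read() == "Моя компания\n")  # True
```

If the file only needs to be read, opening it with encoding="cp1251" (or whatever its real encoding is) is enough; converting is only necessary when other tools expect UTF-8.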