Problem with URL encoding on an HTML page in Python 3

Question

When I fetch the page with urllib, the link looks like this: http://get.sweetbook.net/b/36078/dcMxTA1ODDAjyUYXxstuglvFQ__MpjpMJy7-5N6ODWQ,/\xd0\x9c\xd0\xb0\xd0\xbb\xd0\xb5\xd0\xbd\xd1\x8c\xd0\xba\xd0\xb8\xd0\xb9\xd0\xbf\xd1\x80\xd0\xb8\xd0\xbd\xd1\x86.mp3

What should I do to get the link the way it appears in a browser?

Leon Nash · Accepted Answer · 2016-05-14T16:07:44

Thank you all for the answers, but I found the solution by trial and error. In all the documentation and tutorials, fetching a web page looks like this:

    import urllib.request

    req = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(req) as response:
        page = response.read()  # read() returns bytes, not str

But this way line breaks and indentation show up as piles of \n and \t, and Cyrillic is displayed incorrectly, because read() returns raw bytes rather than text. The problem is solved like this:

    req = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(req) as response:
        page = response.read().decode()
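To see why the undecoded page looks that way, here is a small offline sketch. The bytes are the first word of the file name from the question; the trailing newline is added only for illustration, and UTF-8 is assumed as the encoding.

```python
# First word of the file name from the question, plus a newline
# added here for illustration only.
raw = b"\xd0\x9c\xd0\xb0\xd0\xbb\xd0\xb5\xd0\xbd\xd1\x8c\xd0\xba\xd0\xb8\xd0\xb9\n"

# str() on a bytes object yields the b'...' form with \x.. and \n escapes,
# which is what shows up when the page is never decoded.
undecoded_view = str(raw)

# Decoding turns the bytes into real text (UTF-8 is assumed here).
text = raw.decode("utf-8")

print(undecoded_view)
print(text)
```

The first print shows the escaped byte form, the second shows readable Cyrillic followed by an actual line break.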

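A bare .decode() is incorrect if the page does not use UTF-8, because decode() assumes UTF-8 by default; in that case pass the page's actual encoding, as in .decode(page_encoding).
If you do not know how to find the encoding (page_encoding) of a web page, ask a separate question.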
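One possible way to obtain page_encoding, sketched here as an assumption rather than as the answer's own method, is to read the charset the server declares in its Content-Type header. The helper names decode_page and fetch_text are hypothetical, and the UTF-8 fallback is a guess that will be wrong for pages that declare their encoding only in an HTML meta tag.

```python
import urllib.request
from email.message import Message


def decode_page(raw: bytes, headers: Message, fallback: str = "utf-8") -> str:
    """Decode raw bytes using the charset from the Content-Type header."""
    # get_content_charset() returns None when no charset is declared.
    page_encoding = headers.get_content_charset() or fallback
    # errors="replace" avoids an exception if the declared charset is wrong.
    return raw.decode(page_encoding, errors="replace")


def fetch_text(url: str, headers: dict = None) -> str:
    """Hypothetical helper: fetch a page and return it as decoded text."""
    req = urllib.request.Request(url, headers=headers or {})
    with urllib.request.urlopen(req) as response:
        # response.headers is an email.message.Message subclass.
        return decode_page(response.read(), response.headers)
```

With this, page = fetch_text(url, headers) would replace the read().decode() line, provided the server reports its charset correctly.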
