The page contains a link with Cyrillic characters. If you view the page source in the browser (Chrome), the link looks like this: http://get.sweetbook.net/b/36078/I4SXM5F6MJVEmNYbUb7EBLwPKpI8eZ4t8sVT24HuWUY,/%D0%9C%D0%B0%D0%BB%D0%B5%D0%BD%D1%8C%D0%BA%D0%B8%D0%B9%20%D0%BF%D1%80%D0%B8%D0%BD%D1%86.mp3

But if you fetch the page with urllib, the link looks like this: http://get.sweetbook.net/b/36078/dcMxTA1ODDAjyUYXxstuglvFQ__MpjpMJy7-5N6ODWQ,/\xd0\x9c\xd0\xb0\xd0\xbb\xd0\xb5\xd0\xbd\xd1\x8c\xd0\xba\xd0\xb8\xd0\xb9 \xd0\xbf\xd1\x80\xd0\xb8\xd0\xbd\xd1\x86.mp3

What should I do to get the link the way it appears in the browser?

    1 answer

    Thank you all for the answers, but I found the solution by trial and error. In all the documentation and tutorials, fetching a web page looks like this:

    import urllib.request

    req = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(req) as response:
        page = response.read()  # returns raw bytes, not str

    But read() returns a bytes object, so when you print it, line breaks and indents show up as piles of \n and \t, and Cyrillic shows up as \x.. escape sequences. The problem is solved like this:

      req = urllib.request.Request(url, headers=headers)
      with urllib.request.urlopen(req) as response:
          page = response.read().decode()  # bytes -> str, UTF-8 by default
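To see why this helps without touching the network, here is a small offline sketch. It assumes the file name in the link is UTF-8 encoded (the byte values are the ones shown in the question); the variable names are only illustrative. Decoding turns the bytes into readable text, and `urllib.parse.quote` produces the percent-encoded form that the browser displays.

```python
from urllib.parse import quote, unquote

# Bytes as they appear in the raw page returned by read()
# (sample taken from the link in the question, not fetched live).
raw_name = (
    b"\xd0\x9c\xd0\xb0\xd0\xbb\xd0\xb5\xd0\xbd\xd1\x8c\xd0\xba\xd0\xb8\xd0\xb9"
    b" \xd0\xbf\xd1\x80\xd0\xb8\xd0\xbd\xd1\x86.mp3"
)

# Assumption: the page is UTF-8, so decoding gives the readable name.
name = raw_name.decode("utf-8")

# Percent-encode the name the way it is shown in the browser's page source.
encoded = quote(name)

print(name)
print(encoded)
print(unquote(encoded) == name)
```

`quote` percent-encodes the path component only; pass it the file name or path, not the whole URL, or the `:` after `http` will be escaped too.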
    • A bare .decode() is wrong if the page is not UTF-8 encoded. Use .decode(page_encoding) instead. If you do not know how to find the page_encoding of a web page, ask a separate question. Related: HTTP response in Python. - jfs