The problem is very simple: I can't get the contents of a page with Cyrillic characters. Take Russian Wikipedia, for example. I fetch the page with urllib like this:

    from urllib.request import urlopen
    from urllib.parse import quote

    def get_content(name):
        # Percent-encode the Cyrillic title, fetch the page, decode the bytes as UTF-8
        page = urlopen('http://ru.wikipedia.org/wiki/' + quote(name)).read()
        print(page.decode('utf-8'))

    get_content('лес')

but I constantly run into an exception of this type:

    UnicodeEncodeError: 'charmap' codec can't encode character '\xb2' in position 14187: character maps to <undefined>

I have read similar questions in other discussions, but no matter what I do with quote, the result is the same. Maybe I'm doing something stupid, but so far I can't even fetch a page from the wiki.
What does sys.stdout.encoding show? With utf-8 everything should just work. - approximatenumber
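To illustrate what the comment hints at: the traceback is a UnicodeEncodeError, so the failure most likely happens in print(), when the decoded text is encoded for the console, and not in quote() or in the download. Below is a minimal sketch, not a definitive fix. It assumes the console uses a single-byte code page such as cp1251 (a guess, the question does not say which one), and the helper names can_encode and printable_as are made up for the example.

```python
import sys

def can_encode(text, encoding):
    """Return True if every character of `text` exists in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

def printable_as(text, encoding):
    """Replace characters that `encoding` cannot represent with '?'."""
    return text.encode(encoding, errors="replace").decode(encoding)

# '\xb2' is the superscript two from the traceback in the question.
sample = "лес, 10 км\xb2"

if __name__ == "__main__":
    # Shows which encoding print() will use for the console.
    print("stdout encoding:", sys.stdout.encoding)
    # cp1251 is only an assumed example of a Cyrillic console code page.
    print("fits into cp1251:", can_encode(sample, "cp1251"))
    # Degrade gracefully instead of raising UnicodeEncodeError.
    print(printable_as(sample, sys.stdout.encoding or "utf-8"))
```

If the reported encoding is not utf-8, two standard options are to set the environment variable PYTHONIOENCODING=utf-8 before starting the script, or, on Python 3.7 and later, to call sys.stdout.reconfigure(encoding="utf-8") at the top of the script. Whether the console font can then display every character is a separate matter.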