Cyrillic problems in Python

Question

The problem is very simple: I can't fetch the contents of a page that contains Cyrillic characters, for example a page from Russian Wikipedia. I tried it with urllib, but this code keeps raising an exception:

    from urllib.request import urlopen
    from urllib.parse import quote

    def get_content(name):
        print(
            urlopen('http://ru.wikipedia.org/wiki/' + quote(name)).readall()
            .decode('utf-8'))

    get_content('лес')

The exception is of this type:

 UnicodeEncodeError: 'charmap' codec can't encode character '\xb2' in position 14187: character maps to <undefined>

I have read similar questions in other discussions, but no matter what I do with quote, the result is the same. Maybe I'm doing something stupid, but so far I can't simply fetch a page from the wiki.

The error says that the problem is with printing the text to the console, not with fetching or decoding it.
See this short primer on Unicode output to the console in Python (the problem there is different, but the solution is the same).
Aside: you should not hard-code utf-8, because an HTML page may use a different encoding.
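
A minimal sketch of both points, assuming Python 3: decode the page with the charset the server declares (falling back to utf-8), and replace any character that the console encoding cannot represent before printing. The helper names `fetch_text` and `console_safe` are hypothetical and do not come from the discussion.

```python
import sys
from urllib.parse import quote
from urllib.request import urlopen


def console_safe(text, encoding=None):
    """Return text with characters the target encoding lacks replaced by '?'."""
    # Assumption: fall back to utf-8 if stdout reports no encoding.
    encoding = encoding or sys.stdout.encoding or 'utf-8'
    return text.encode(encoding, errors='replace').decode(encoding)


def fetch_text(name):
    """Fetch a Russian Wikipedia page and decode it with the declared charset."""
    with urlopen('http://ru.wikipedia.org/wiki/' + quote(name)) as response:
        # Assumption: fall back to utf-8 when the server declares no charset.
        charset = response.headers.get_content_charset() or 'utf-8'
        return response.read().decode(charset)


# Usage (performs a network request):
#     print(console_safe(fetch_text('лес')))
```

With this approach the output may contain '?' in place of a few characters, but printing no longer raises UnicodeEncodeError.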

Stas Kazantsev · Answer 1 · 2016-03-22T18:43:03

You just need to add an encoding declaration at the top of the file (this version is for Python 2):

    # coding=utf-8
    from urllib import urlopen, quote

    def get_content(name):
        return urlopen('http://ru.wikipedia.org/wiki/' + quote(name)).read()

    print get_content('лес')

No. As noted earlier in the comments, the whole problem is in the console output, not in the source encoding.
I use PyCharm, and its console (terminal) not only differs from the Windows console itself but also behaves strangely.
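
One way to check this in a given console or IDE is to look at the encoding Python uses for standard output and test whether it can represent the offending character. This is a sketch assuming Python 3; the helper name `can_print` is hypothetical, and cp1251 is used only as an example of a legacy single-byte code page, not as a statement about what PyCharm actually uses.

```python
import sys


def can_print(text, encoding):
    """Report whether every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# The encoding print() will use; it depends on the console or IDE.
print('stdout encoding:', sys.stdout.encoding)

# Cyrillic letters exist in cp1251, but the character '\xb2' from the
# error message (superscript two) does not.
print(can_print('лес', 'cp1251'))
print(can_print('\xb2', 'cp1251'))
```

If the reported encoding is a legacy code page, setting the PYTHONIOENCODING environment variable to utf-8 before starting Python is one option for changing it, although whether the console then displays the text correctly depends on the console itself.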

Answer 2 · 2016-06-22T00:57:29

Perhaps this will help:

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    print(
        urlopen(u'http://ru.wikipedia.org/wiki/' + quote(name)).readall()
        .decode('utf-8'))
