Parsing and (possibly) encoding - 'Night c' in 'Night from November 27 to November 28' -> False

Question

I have code that parses websites:

```python
import requests
from lxml.html import fromstring


def prepare_content(url):
    headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
    response = requests.get(url, headers=headers)
    # response.text is the body already decoded by requests into a str (using response.encoding)
    tree = fromstring(response.text)
    tree.make_links_absolute(response.url)
    return tree
```

Here is the part of the code where the problem occurs:

```python
date = day.xpath('.//div[@class="h3"]/text()')[1].strip().split(',')[0]
print('date:', date)
# date: Ночь c 27 ноября на 28 ноября
print("'Ночь с' in date:", 'Ночь с' in date)
# 'Ночь с' in date: False
if 'Ночь с' in date:
    # keep only the last two words, e.g. "28 ноября"
    date = ' '.join(date.split()[-2:])
print('date:', date)
# date: Ночь c 27 ноября на 28 ноября
```
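The symptom can be reproduced without any network access. A minimal sketch, assuming the parsed text contains a Latin "c" (U+0063) where the script's literal contains the Cyrillic "с" (U+0441); the `haystack` string is a hypothetical stand-in for the parsed value, built with explicit escapes so the two letters cannot be confused:

```python
# Two letters that look the same but are different Unicode code points.
cyrillic_es = "\u0441"  # CYRILLIC SMALL LETTER ES
latin_c = "\u0063"      # LATIN SMALL LETTER C

needle = "Ночь " + cyrillic_es                            # what the script searches for
haystack = "Ночь " + latin_c + " 27 ноября на 28 ноября"  # assumed parsed value

print(cyrillic_es, latin_c)    # с c  (visually almost identical)
print(cyrillic_es == latin_c)  # False: different code points
print(needle in haystack)      # False, the same result as in the question
```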

The line print("'Ночь с' in date:", 'Ночь с' in date) should print True, not False.

I understand very little about encodings, but could the cause be a difference between the encoding of the parsed data and the encoding used by the IDE? If so, how do I convert the parsed data to the IDE's encoding?
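One way to test the encoding hypothesis: in Python 3 both the literal and `response.text` are `str` (Unicode) objects, so the `in` check compares code points and no "IDE encoding" takes part in it. A wrong decode typically corrupts all of the Cyrillic text, not a single letter. A small stdlib-only sketch of that; the sample string and the choice of cp1251 as the "wrong" codec are illustrative assumptions:

```python
text = "Ночь с 27 ноября на 28 ноября"
raw = text.encode("utf-8")  # bytes as a server might send them (assumed UTF-8)

# Decoding with the matching codec restores the original string exactly.
print(raw.decode("utf-8") == text)  # True

# Decoding with a mismatched codec garbles every Cyrillic letter,
# producing visibly unreadable text rather than one wrong character.
garbled = raw.decode("cp1251", errors="replace")
print(garbled)
```

Since the printed date in the question looks correct, a byte-level encoding mismatch seems unlikely to be the cause.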

In Python 3 there shouldn't be problems like this. Try printing type(date). Or maybe the problem really is with the letter "c". - andy.37
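Following that suggestion, a quick way to find look-alike letters is to list every letter whose Unicode name is not Cyrillic. A sketch using only `unicodedata` from the standard library; `non_cyrillic_letters` is a hypothetical helper name and `date` is an assumed sample value:

```python
import unicodedata


def non_cyrillic_letters(s):
    """Return (index, char, Unicode name) for every letter in s that is not Cyrillic."""
    found = []
    for i, ch in enumerate(s):
        if ch.isalpha():
            name = unicodedata.name(ch, "UNKNOWN")
            if not name.startswith("CYRILLIC"):
                found.append((i, ch, name))
    return found


date = "Ночь \u0063 27 ноября на 28 ноября"  # hypothetical parsed value with a Latin "c"
print(type(date))                  # <class 'str'>
print(non_cyrillic_letters(date))  # [(5, 'c', 'LATIN SMALL LETTER C')]
```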

Alexander Fridman · Answer 1 · 2016-01-25T15:48:54

Try adding the following as the first line of the source file. Read more here.

```python
# -*- coding: utf-8 -*-
```
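For context: this declaration controls how the source file itself is decoded (UTF-8 is already the default in Python 3, per PEP 3120); it does not alter strings produced by the parser. If the page text really contains a Latin "c", as suspected in the comment under the question, one possible workaround is to accept both letters. A sketch, assuming the parsed value looks like the hypothetical `date` below; `NIGHT_PREFIX` is an illustrative name:

```python
import re

# Hypothetical parsed value with a Latin "c" (U+0063) instead of the Cyrillic "с" (U+0441).
date = "Ночь \u0063 27 ноября на 28 ноября"

# Accept either the Cyrillic "с" or its Latin look-alike "c" after "Ночь".
NIGHT_PREFIX = re.compile(r"Ночь [\u0441\u0063]\s")

if NIGHT_PREFIX.match(date):
    # Same extraction as in the question: keep the last two words.
    date = " ".join(date.split()[-2:])

print(date)  # 28 ноября
```

Matching both letters keeps the original string untouched; an alternative would be translating known look-alikes to Cyrillic with `str.maketrans` before comparing.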


Please try to post detailed answers that contain a concrete minimal working example, supplemented with a link to the source. Link-only answers (as well as comments) do not add knowledge to the Runet. - Nicolas Chabanovsky ♦
