I have some code that parses websites:
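    import requests
    from lxml.html import fromstring

    def prepare_content(url):
        # Send a browser-like User-Agent so the site serves the regular page
        headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
        response = requests.get(url, headers=headers)
        # Parse the decoded HTML and resolve relative links against the final URL
        tree = fromstring(response.text)
        tree.make_links_absolute(response.url)
        return tree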
Here is a snippet where the problem arises:
    date = day.xpath('.//div[@class="h3"]/text()')[1].strip().split(',')[0]
    print('date:', date)  # date: Ночь c 27 ноября на 28 ноября
    print("'Ночь с' in date:", 'Ночь с' in date)  # 'Ночь с' in date: False
    if 'Ночь с' in date:
        # Keep only the last two words (the final day and month)
        date = ' '.join(date.split()[-2:])
    print('date:', date)  # date: Ночь c 27 ноября на 28 ноября
The call print("'Ночь с' in date:", 'Ночь с' in date) should print True, not False.
I know very little about encodings, but could the cause be a difference between the encoding of the parsed data and the encoding used by the IDE? If so, how do I convert the parsed data to the IDE's encoding?
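One way to check whether this is an encoding issue at all is to look at the individual code points of both strings. The sketch below is only a diagnostic aid, not a confirmed explanation: the helper describe_chars and the sample values parsed and needle are hypothetical stand-ins for the scraped text and the literal typed in the IDE. It assumes, without the post confirming it, that the page might contain a look-alike character such as a Latin "c" where the literal has a Cyrillic "с".

```python
import unicodedata


def describe_chars(text):
    """Return (character, code point, Unicode name) for each character in text."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in text
    ]


# Hypothetical sample values, written with escapes so the difference is visible:
# `parsed` stands in for the scraped string and contains a Latin "c" (U+0063),
# `needle` stands in for the literal in the source file, with a Cyrillic "с" (U+0441).
parsed = "Ночь \u0063 27 ноября"
needle = "Ночь \u0441"

# Print the code point and Unicode name of every character in the search string
# and in the matching prefix of the parsed string, so they can be compared.
for label, text in (("needle", needle), ("parsed", parsed[:len(needle)])):
    print(label)
    for ch, code, name in describe_chars(text):
        print("   ", repr(ch), code, name)

# The substring test compares code points, so look-alike letters do not match.
print("needle in parsed:", needle in parsed)
```

If the two listings differ at some position, the mismatch comes from the characters themselves rather than from the IDE: Python 3 strings are sequences of Unicode code points, so once the text is decoded the editor's encoding no longer affects the comparison.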
Try displaying type(date). Or is the problem really with the letter "с"? - andy.37