Hello, I need to parse an HTML document.

The following code raises an error:

    from lxml import etree
    import requests
    import lxml.html as LH
    from io import StringIO


    def get_tree(url):
        headers = {
            'User-Agent': (
                'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 '
                '(KHTML, like Gecko) Chrome/51.0.2704.106 Safari/537.36'
            )
        }
        result = requests.get(url, headers=headers)
        return LH.document_fromstring(result.content.decode())


    url = 'http://www.naturalnews.com/'
    tree = get_tree(url)

Error:

 UnicodeDecodeError: 'utf-8' codec can't decode byte 0x85 in position 41101: invalid start byte 

Please help me understand what is going wrong.

1 answer

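Calling result.content.decode() with no argument assumes the page is UTF-8, but byte 0x85 is not valid UTF-8, so the page is in some other encoding. Try the following version, which passes the raw bytes to lxml and lets the parser work out the encoding itself.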

    import requests
    import lxml.html as LH


    def get_tree(url):
        headers = {
            'User-Agent': (
                'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 '
                '(KHTML, like Gecko) Chrome/51.0.2704.106 Safari/537.36'
            )
        }
        result = requests.get(url, headers=headers)
        # Pass the undecoded bytes; lxml detects the encoding on its own.
        return LH.fromstring(result.content)


    url = 'http://www.naturalnews.com/'
    tree = get_tree(url)
    # cssselect() requires the separate cssselect package to be installed.
    title = tree.cssselect("title")[0]
    print(title.text)
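The difference between the two versions can be reproduced without a network request. The sketch below is only an illustration: the `raw` bytes are a made-up page, and the windows-1252 charset is an assumption (the real encoding of the site is not stated in the question). It shows that decoding as UTF-8 fails on byte 0x85, while handing the bytes to lxml lets the parser use the charset declared in the document.

```python
import lxml.html as LH

# Hypothetical page body (not the real site): the text is encoded as
# windows-1252, where byte 0x85 is the ellipsis character.
raw = (
    b"<html><head>"
    b'<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">'
    b"<title>Natural health news\x85</title>"
    b"</head><body><p>Hello</p></body></html>"
)

# bytes.decode() defaults to UTF-8, and 0x85 cannot start a UTF-8 sequence.
try:
    raw.decode()
    utf8_failed = False
except UnicodeDecodeError as exc:
    utf8_failed = True
    print("UTF-8 decoding failed:", exc)

# Given bytes, the parser reads the declared charset and decodes accordingly.
tree = LH.fromstring(raw)
title = tree.findtext(".//title")
print(title)
```

If a decoded string is still needed, for example for logging, requests can also guess the encoding from the body: setting `result.encoding = result.apparent_encoding` before reading `result.text` is one option, though the guess is not guaranteed to be right.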