Create an instance of lxml.html.document_fromstring() or lxml.etree.XML

Question

Hello, I need to process an HTML document.

The following code raises an error:

    from lxml import etree
    import requests
    import lxml.html as LH
    from io import StringIO

    def get_tree(url):
        headers = {
            'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 '
                          '(KHTML, like Gecko) Chrome/51.0.2704.106 Safari/537.36'
        }
        result = requests.get(url, headers=headers)
        return LH.document_fromstring(result.content.decode())

    url = 'http://www.naturalnews.com/'
    tree = get_tree(url)

Error:

 UnicodeDecodeError: 'utf-8' codec can't decode byte 0x85 in position 41101: invalid start byte

Please help me understand what is going wrong.

Accepted Answer · sequent · 2016-07-25T08:30:19

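Try the following. result.content.decode() assumes UTF-8 by default, and byte 0x85 is not a valid UTF-8 start byte, so the page is evidently not UTF-8 encoded. Passing the raw bytes in result.content to lxml lets the parser work out the encoding itself: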

    from lxml import etree
    import requests
    import lxml.html as LH

    def get_tree(url):
        headers = {
            'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 '
                          '(KHTML, like Gecko) Chrome/51.0.2704.106 Safari/537.36'
        }
        result = requests.get(url, headers=headers)
        # Pass the undecoded bytes; do not call .decode() on them first.
        return LH.fromstring(result.content)

    url = 'http://www.naturalnews.com/'
    tree = get_tree(url)
    # Note: .cssselect() needs the separate cssselect package installed.
    title = tree.cssselect("title")[0]
    print(title.text)
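To see the difference without hitting the network, here is a minimal sketch. The page bytes below are made up for illustration (a hypothetical document that declares windows-1252 and contains byte 0x85); they are an assumption, not the real content of the site in the question.

```python
import lxml.html as LH

# Hypothetical page bytes: declared as windows-1252 and containing
# byte 0x85, which is not a valid UTF-8 start byte.
raw = (
    b'<html><head>'
    b'<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">'
    b'<title>News\x85</title></head><body><p>Hello</p></body></html>'
)

# bytes.decode() defaults to UTF-8, like result.content.decode() in the question.
try:
    raw.decode()
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

# Handing the raw bytes to lxml leaves encoding detection to the parser.
tree = LH.fromstring(raw)
title_text = tree.findtext(".//title")

print("decode() failed:", decode_failed)
print("title:", repr(title_text))
```

If you do need a str yourself, decoding with the page's actual encoding, for example raw.decode('windows-1252') for the made-up bytes above, should also avoid the error, but you have to know that encoding in advance.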
