Python lxml: getting an element's text together with its HTML tags

Question

I selected a specific element:

game_descriptions = html.cssselect('#game_area_description')[0]

The result is this block:

 <div id="game_area_description" class="game_area_description"> <strong>Самая популярная игра в Steam</strong> <br>Ежедневно миллионы игроков по всему миру вступают в битву от лица одного.....

How do I get all of the text together with its HTML tags?

.text returns only blank lines.

Update:

    import lxml.html

    def get_html(request):
        return lxml.html.fromstring(request.text)

    html = get_html(r)
    game_descriptions = html.cssselect('#game_area_description')[0]

I need to strip the outer div with id="game_area_description" and save the rest to the database.
Interesting... but try: ''.join(lxml.html.tostring(child, encoding='unicode') for child in game_descriptions.iterchildren())
Please add a minimal reproducible example to the question so that we don't have to guess.

Accepted Answer · 2018-07-26T12:16:49

Try:

    import lxml.html

    def to_string(node):
        return lxml.html.tostring(node, encoding='unicode')

    text = """<div id="game_area_description" class="game_area_description"> <strong>Самая популярная игра в Steam</strong> <br>Ежедневно миллионы игроков по всему миру вступают в битву от лица одного....."""

    root = lxml.html.fromstring(text)
    game_descriptions = root.cssselect('#game_area_description')[0]
    print(''.join(to_string(child) for child in game_descriptions.iterchildren()))

Console:

 <strong>Самая популярная игра в Steam</strong> <br>Ежедневно миллионы игроков по всему миру вступают в битву от лица одного.....
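One caveat: iterating over the children skips any text that sits directly inside the div before its first child. In the example above that text is only whitespace, so nothing is lost, but in general it can matter. A minimal sketch of a helper that keeps it, assuming a hypothetical name `inner_html` (not part of lxml) and a made-up sample fragment:

```python
from html import escape

import lxml.html


def inner_html(node):
    """Serialize everything inside `node`, without the node's own tag.

    `inner_html` is a hypothetical helper name, not an lxml function.
    """
    # Text that appears before the first child lives in node.text;
    # it is plain text, so escape it before mixing it with markup.
    leading = escape(node.text or '', quote=False)
    # tostring() includes each child's tail text by default.
    children = ''.join(
        lxml.html.tostring(child, encoding='unicode')
        for child in node.iterchildren()
    )
    return leading + children


# Made-up sample input, for illustration only.
sample = lxml.html.fromstring('<div id="d">Intro <b>bold</b> tail<br>end</div>')
result = inner_html(sample)
print(result)
```

The returned string can then be stored as is, since it no longer contains the wrapping div.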

I renamed the variable to avoid confusion.
Initially I just wanted to use the plain text in a mobile application, but then I started thinking about how to keep similar markup, so I decided to save it with the HTML tags.
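For comparison, if plain text without markup is ever needed, a short sketch (using a made-up fragment) shows why .text looked blank and what text_content() returns instead: .text holds only the text before the first child element, while text_content() collects the text of the element and all of its descendants.

```python
import lxml.html

# Made-up fragment shaped like the block in the question.
node = lxml.html.fromstring('<div> <strong>Title</strong> <br>Body text</div>')

# .text is only the text before the first child: here, just whitespace.
leading_text = node.text or ''
print(repr(leading_text))

# text_content() gathers all text inside the element, dropping the tags.
plain_text = node.text_content()
print(plain_text)
```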
