Python 3 code:

    import requests
    from bs4 import BeautifulSoup as BeautifulSoup
    import json

    site = 'http://www.problems.ru/view_problem_details_new.php?id='
    index = 88018
    site = site + str(index)
    response = requests.get(site)
    html = response.text
    soup = BeautifulSoup(html)
    box = soup.find("div", {"class": "componentboxcontents"})
    print(box.findAll('p'))

In theory, it should find all the p tags inside a particular div, but the p tags come back empty. What am I doing wrong?

  • * all the tags are <p> (they were displayed incorrectly because of the angle brackets) - Danila

1 answer

I compared the HTML that the script receives with the HTML shown in the browser, and they differ slightly: in the browser each <p> contains the text, whereas the script receives an empty <p> with the text placed right after it. Perhaps their server checks for bots, or a small piece of JavaScript inserts the text into the <p>.
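The difference can be reproduced offline. Here is a minimal sketch, assuming the received markup looks roughly like the RAW_HTML string below (a made-up stand-in, not the real page) and using the built-in "html.parser" so that lxml is not required:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for what the script receives: every <p> is empty
# and the visible text sits immediately after it.
RAW_HTML = (
    '<div class="componentboxcontents">'
    "<p></p>First paragraph text"
    "<p></p>Second paragraph text"
    "</div>"
)

soup = BeautifulSoup(RAW_HTML, "html.parser")
box = soup.find("div", {"class": "componentboxcontents"})

# Text inside each <p>: nothing, which matches the empty result in the question.
inside = [p.get_text() for p in box.find_all("p")]

# Text node that follows each <p>: this is where the content actually is.
after = [str(p.next_sibling) for p in box.find_all("p")]

print(inside)
print(after)
```

If the real page behaves the same way, printing response.text and looking at what follows each <p> should show the same pattern.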

Algorithm: find each <p> and take the text that follows it, stripping spaces, newline characters, etc. around the text. I have the lxml module installed, and BeautifulSoup complains a little when no parser is specified, so the constructor is given 'lxml' as a parameter; remove it if it gets in the way.

    import requests
    from bs4 import BeautifulSoup

    rs = requests.get('http://www.problems.ru/view_problem_details_new.php?id=88018')

    # The page is served in KOI8-R, so decode the raw bytes explicitly.
    html = BeautifulSoup(rs.content.decode('KOI8-R'), 'lxml')
    box = html.find(attrs={"class": "componentboxcontents"})

    # Each <p> is empty; the text is in the node that follows it.
    for i, p in enumerate(box.findAll('p'), 1):
        print(i, p.nextSibling.strip())
        print()
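The loop above assumes that every <p> is followed by a text node. If a <p> happens to be followed by another tag or by nothing at all, nextSibling.strip() will raise an error. Below is a slightly more defensive sketch of the same idea; the function name text_after_paragraphs and the SAMPLE markup are made up for illustration, and "html.parser" is used so that lxml is not required:

```python
from bs4 import BeautifulSoup, NavigableString


def text_after_paragraphs(html, css_class="componentboxcontents"):
    """Return the stripped text that directly follows each <p> in the box."""
    soup = BeautifulSoup(html, "html.parser")
    box = soup.find(attrs={"class": css_class})
    if box is None:
        return []

    results = []
    for p in box.find_all("p"):
        sibling = p.next_sibling
        # Skip a <p> that is followed by a tag or by nothing at all.
        if isinstance(sibling, NavigableString):
            text = sibling.strip()
            if text:
                results.append(text)
    return results


# Hypothetical markup: the second <p> is followed by a tag, not by text.
SAMPLE = (
    '<div class="componentboxcontents">'
    "<p></p>\n  First \n"
    "<p></p><b>bold</b>"
    "<p></p>Second"
    "</div>"
)

print(text_after_paragraphs(SAMPLE))
```

For the real page, the decoded response (rs.content.decode('KOI8-R') in the code above) would be passed in place of SAMPLE.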
  • I have exactly the same situation in my question) - True-hacker