How to parse an HTML page using Python Grab with jQuery

Question

I need to parse a page that requires authorization and whose data is constantly updated with jQuery. Authorization already works. The question is how to parse such a page, for example how to read the element with id="speak" after jQuery has updated it.

Below is a training example of such a page; its elements are changed by jQuery after the page itself has loaded. I make the request with Grab in the standard way:

    from grab import Grab

    g = Grab(log_file='out.html')
    g.setup(headers={'X-Requested-With': 'XMLHttpRequest'})
    g.go('http://127.0.0.1:80/get_stats')
    # I get the data I need, but as it was before jQuery changed it,
    # i.e. the values that came straight from the local server
    g.doc.select('//*[@id="speak"]')[0].text()

Here is a training example page:

    <!DOCTYPE html>
    <html lang="en">
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    <head>
        <meta charset="UTF-8">
        <title>test</title>
        <script src="{% static 'js/jquery.js' %}"></script>
        <script type="text/javascript">
            function my_foo() {
                $.get('/test', {}, function (data) {
                    $("#speak").empty();
                    $("#speak").append(data);
                    setTimeout(my_foo, 5000);
                });
            }
            $(function () {
                // jQuery version is 1.11
                my_foo();
            });
        </script>
    </head>
    <body>
        <h2>In queue: {{ data.0 }}</h2>
        <h2>Waiting time: {{ data.1 }}</h2>
        <h2 id="speak">Speaking: {{ data.2 }}</h2>
        <h2>Call: {{ data.3 }}</h2>
    </body>
    </html>

m9_psy · Accepted Answer · 2016-07-16T11:54:05

Grab cannot parse dynamic pages; use Selenium or Ghost instead. Grab is not suited to this task because running JavaScript requires an engine that executes the code, and Grab only receives the HTML the server sends. So you need to look toward headless browsers (browsers without a graphical interface).
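As a rough illustration of the headless-browser approach, here is a minimal sketch using Selenium. It assumes the `selenium` package and a local Chrome are installed. The helper names `make_headless_driver`, `read_after_js` and `main` are made up for this example, and the URL and the `speak` id are taken from the question. The browser session is separate from the Grab session, so authorization would presumably have to be repeated in the browser; that step is not shown.

```python
import time


def make_headless_driver():
    """Start headless Chrome; assumes the selenium package and Chrome are installed."""
    from selenium import webdriver  # lazy import: only needed for a real browser

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")  # run without a visible window
    return webdriver.Chrome(options=options)


def read_after_js(driver, url, element_id="speak", static_text=None,
                  timeout=10.0, poll=0.5):
    """Open url in a browser engine and return the element's text after scripts ran.

    Polls until the text differs from static_text (the value seen in the raw
    HTML, e.g. the one Grab returned). If static_text is None, the first reading
    is used as the baseline. When timeout seconds pass without a change, the
    most recent text is returned as-is.
    """
    driver.get(url)
    baseline = static_text
    deadline = time.monotonic() + timeout
    while True:
        # "id" is the locator strategy string that Selenium exposes as By.ID
        text = driver.find_element("id", element_id).text
        if baseline is None:
            baseline = text  # first reading becomes the baseline
        elif text != baseline:
            return text  # jQuery has replaced the content
        if time.monotonic() >= deadline:
            return text  # nothing changed in time; return what is there
        time.sleep(poll)


def main():
    """Hypothetical usage against the page from the question."""
    driver = make_headless_driver()
    try:
        print(read_after_js(driver, "http://127.0.0.1:80/get_stats"))
    finally:
        driver.quit()  # always close the browser process
```

Calling `main()` would open the page, let the browser run the jQuery code, and print the text of `#speak` once it changes or the timeout expires. The polling loop is one possible design; Selenium's own `WebDriverWait` could be used for the same purpose.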
