Good afternoon, dear users. My question is about the site http://novostroykirf.ru/1/#!devsearch;page=1 . I need to scrape the 20 × 4 table from its first page. The problem is that the usual approaches do not work: neither inspecting the request in Firebug and building the same one myself, nor the Selenium + PhantomJS bundle. This has to be done in Python, but I am thoroughly stuck and cannot even imagine what to do next.

  • Have you tried custom scripts? - Vladimir Gamalyan
  • Custom scripts? What are those? Could you explain? I don't know JavaScript at all - Mathematician
  • The difference between Ghost and Beautiful Soup or Scrapy is that Ghost has a whole browser under the hood, which can execute JS scripts. So you only need to grab the data at the right moment. In theory, everything you see in a regular browser should be accessible through Ghost. - Vladimir Gamalyan
  • I can't write JS. Can this be done without it? I was actually given the task of writing this in R, but that doesn't really suit me, because I work in Python - Mathematician
  • Ghost has a browser with script support so that it sees the page the same way you do in a regular browser. No knowledge of JS is required for that. But in any case, you should at least have an idea of the DOM, from which you will pick out the data in Python. - Vladimir Gamalyan

2 answers

If you do not need all the data from this site, only one page, then simply copy the table element into a text document (from the developer console in Chrome) and then parse it with Python (the lxml library).
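The parsing step could look roughly like the sketch below. It assumes the copied markup is an ordinary `<table>` with `<tr>`, `<th>` and `<td>` elements; the `sample` string and the `parse_table` helper are made up for illustration and are not the real markup of the site.

```python
from lxml import html


def parse_table(html_text):
    """Return the table rows as lists of cell texts (header row included)."""
    root = html.fromstring(html_text)
    rows = []
    for tr in root.xpath(".//tr"):
        # Take both header and data cells, in document order
        cells = [cell.text_content().strip() for cell in tr.xpath("./th|./td")]
        if cells:
            rows.append(cells)
    return rows


# Hypothetical stand-in for the HTML copied from the developer console
sample = """
<table>
  <tr><th>Name</th><th>City</th></tr>
  <tr><td>Object A</td><td>Moscow</td></tr>
</table>
"""

if __name__ == "__main__":
    for row in parse_table(sample):
        print(row)
```

With the real data you would read the saved file instead, for example `parse_table(open("table.html", encoding="utf-8").read())`, where `table.html` is whatever name you gave the file.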


  • Where did you find the table? I don't have it - Mathematician
  • Still, I need this done in Python, not through the console. Although originally it was supposed to be done in R - Mathematician
  • I am suggesting doing it in Python; you only need to copy the HTML code of this table through the console and save it to a file. Then work with this file in Python. - Sequent
  • So I'm asking, where did you find this table? I don't seem to have it. Or I'm just not seeing it right in front of me because my eyes have glazed over - Mathematician
  • The table data is loaded dynamically. So you need to wait for the data to load and then copy the HTML code through the developer console; the console is available by default in the Chrome browser. To find the desired element, right-click on the table header and select "View code" (Inspect) from the menu. - Sequent

In the end, the solution turned out to be simple. I just needed to add a ten-second delay so that the request had time to complete and the dynamic content was loaded. So the Selenium + PhantomJS bundle is quite viable.
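A minimal sketch of that approach, in case it helps someone. The helper names `fetch_rendered_html` and `scrape_first_page` are made up for this example. Note that `webdriver.PhantomJS()` only exists in Selenium 2/3; it was removed in Selenium 4, where a headless Firefox or Chrome driver would have to be used instead.

```python
import time


def fetch_rendered_html(driver, url, delay=10):
    """Open url, wait `delay` seconds for the scripts to fill the page, return the HTML."""
    driver.get(url)
    # Fixed pause so the dynamic request can finish before the page is read
    time.sleep(delay)
    return driver.page_source


def scrape_first_page():
    """Fetch the rendered first page; assumes Selenium 2/3 with PhantomJS installed."""
    from selenium import webdriver  # imported here so the helper above has no dependencies

    driver = webdriver.PhantomJS()  # assumption: PhantomJS binary is on PATH
    try:
        return fetch_rendered_html(
            driver, "http://novostroykirf.ru/1/#!devsearch;page=1"
        )
    finally:
        driver.quit()
```

Calling `scrape_first_page()` returns the page HTML after the delay, which can then be parsed for the table. A fixed `time.sleep` is the simplest option; Selenium's `WebDriverWait` could replace it if you know which element to wait for.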