I have a task: parse data from the Yandex.Metrica site.

The trouble is that almost all of the data is loaded via AJAX behind "show more" buttons. How can this be done, and what should I use? The code will be in PHP.

  • Do you want to parse the data on every request? Wouldn't it be more reasonable to load it with a separate job that runs periodically and caches the data? - cheops
  • Maybe I don't quite understand. The plan was to fetch the entire page source on request and then process it. I don't really follow your option, could you explain? - Zhenya Vedenin
  • You may simply get a lot of requests, and they can repeat, i.e. many identical requests may go out at the same time. You can run a script from cron that requests the data and saves it to some storage, and end users are then served from that storage. - cheops
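
A minimal sketch of the cron-plus-storage idea from the comment above, assuming a JSON file is acceptable as the storage. The function names `refreshCache` and `readCache` and the file path are hypothetical, and `$fetch` stands for whatever code actually obtains the data.

```php
<?php
// Called from a cron script: fetch fresh data and store it.
// $fetch is any callable that returns an array (the actual parser is not shown here).
function refreshCache(string $cacheFile, callable $fetch): void
{
    $data = $fetch();
    // Write to a temporary file first, then rename it,
    // so readers never see a half-written cache file.
    $tmp = $cacheFile . '.tmp';
    file_put_contents($tmp, json_encode($data), LOCK_EX);
    rename($tmp, $cacheFile);
}

// Called on end-user requests: read only from storage, never hit the remote site.
function readCache(string $cacheFile): ?array
{
    if (!is_file($cacheFile)) {
        return null; // cron has not produced any data yet
    }
    $data = json_decode((string) file_get_contents($cacheFile), true);
    return is_array($data) ? $data : null;
}
```

With this split, only the cron job talks to the remote site, so many identical user requests arriving at the same time all read the same stored copy.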

3 answers

The correct option: look at which AJAX requests the page executes and with which parameters, and get all the necessary information from them directly. It takes time to study the API and understand the data format and how it interacts with the page.
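
As a rough sketch of that approach: once the request behind the "show more" button is found in the browser's network tab, it can be repeated with cURL in a loop. The `offset` and `limit` parameter names, the JSON-array response format, and the helper names `fetchPage` and `collectAll` are assumptions for illustration only; the real endpoint has to be taken from the observed requests.

```php
<?php
// Request one batch from the (hypothetical) AJAX endpoint and decode the JSON body.
function fetchPage(string $url, array $params): array
{
    $ch = curl_init($url . '?' . http_build_query($params));
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT        => 30,
        // Many AJAX endpoints expect this header; whether this one does is an assumption.
        CURLOPT_HTTPHEADER     => ['X-Requested-With: XMLHttpRequest'],
    ]);
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('Request failed: ' . curl_error($ch));
    }
    $decoded = json_decode($body, true);
    if (!is_array($decoded)) {
        throw new RuntimeException('Response is not a JSON array');
    }
    return $decoded;
}

// Keep requesting batches until one comes back empty or short,
// which is what repeatedly clicking "show more" would do.
// $fetch receives ['offset' => ..., 'limit' => ...] and returns a list of rows.
function collectAll(callable $fetch, int $limit = 50, int $maxPages = 1000): array
{
    $rows = [];
    for ($page = 0; $page < $maxPages; $page++) {
        $batch = $fetch(['offset' => $page * $limit, 'limit' => $limit]);
        if ($batch === []) {
            break; // nothing left to load
        }
        $rows = array_merge($rows, $batch);
        if (count($batch) < $limit) {
            break; // a short batch means this was the last one
        }
    }
    return $rows;
}

// Usage (the URL is a placeholder, not a real endpoint):
// $rows = collectAll(function (array $p) {
//     return fetchPage('https://example.com/ajax/list', $p);
// });
```

Passing the fetcher in as a callable keeps the paging loop independent of cURL, so it can be checked against fake data before pointing it at the real site.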

If you are feeling lazy, or that research takes an unreasonable amount of time for the first version of the current project, then parse through a real browser (for example, using Selenium) or through a stripped-down browser with JavaScript support that runs from the command line (for PHP there is the PHP PhantomJS library, which is built on the original PhantomJS).

  • The customer requires it to be done by parsing, without the API. ((( - Zhenya Vedenin

Take the API and parse without any problems: https://tech.yandex.ru/metrika/

  • I answered below ((( This is not my decision. - Zhenya Vedenin
  • @ZhenyaVedenin just tell him that he will pay less for the API than for code he will throw into the furnace after the next update. - Naumov

You need to use a combination like Selenium + PhantomJS. Read the documentation; it describes how to emulate button clicks and everything else. Most importantly, set a delay of about ten seconds so that the AJAX elements have time to load. After that you can pull the data.
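
One possible variation on the fixed delay, sketched here as an assumption rather than as part of the setup described above: poll for the loaded content and use ten seconds only as an upper bound, so pages that load faster are not held up. `waitUntil` is a hypothetical helper, and `$condition` would wrap whatever check the browser driver offers (for example, "the new rows are present").

```php
<?php
// Poll $condition until it returns a truthy value or the timeout expires.
// Returns the truthy value; throws if the timeout is reached first.
function waitUntil(callable $condition, float $timeoutSeconds = 10.0, int $intervalMs = 250)
{
    $deadline = microtime(true) + $timeoutSeconds;
    do {
        $result = $condition();
        if ($result) {
            return $result; // content is there, stop waiting early
        }
        usleep($intervalMs * 1000); // pause before checking again
    } while (microtime(true) < $deadline);

    throw new RuntimeException("Condition not met within {$timeoutSeconds} s");
}
```

When a lot of pages are processed, the difference between always sleeping ten seconds and waiting only as long as needed adds up quickly.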

  • And how fast is this whole process? I assume it takes a fair amount of time? - Zhenya Vedenin
  • Well, it depends on how much and what exactly you want to pull. I scraped data on a million houses, and it ran for a day and a half. - Mathematician