I need to get a page's content in PHP. I used to do this easily with file_get_contents($url). But on one site the list of products is loaded via AJAX, more precisely by Angular.
That is, if you open the page in a browser, a loader spins and the products appear after a while.

file_get_contents($url) returns the raw source, that is, something like:

<div class="spisok" data-ng-repeat="tovar in models.tovari"> <span> {{tovari.name}} </span><br/> </div> 

How can I get the processed (fully loaded) page content?

  • Look at what request the browser makes to get this data and make a similar one from PHP. - Visman
  • This has already been answered on the English Stack Overflow: stackoverflow.com/questions/28505501/… - Alma Z
  • Show us your PHP code. - korytoff

5 answers

Try fetching the page with cURL; it should work. For example:

 $curl = curl_init();
 curl_setopt($curl, CURLOPT_URL, $url);
 curl_setopt($curl, CURLOPT_HEADER, false);
 curl_setopt($curl, CURLOPT_FAILONERROR, 1);
 curl_setopt($curl, CURLOPT_FOLLOWLOCATION, false);
 curl_setopt($curl, CURLOPT_POST, false);
 curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1); // return the body instead of printing it
 // Note: the next two lines disable TLS certificate checks, which is insecure.
 curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
 curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, false);
 $data = curl_exec($curl);
 curl_close($curl);
  • Nope. You get raw HTML with something like {{tovari.name}} instead of the product name. - Alma Z
  • You can't just fetch this code and open it in your browser. You are building a robot client, so the approach is different. Take $data and process it further with Simple HTML DOM. It is all doable. If you open $data in your own browser, its AJAX calls become cross-domain requests, which may work badly. I had to parse a site where everything was loaded via AJAX, and it all went well. - Batan112
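To make the problem from the comment above concrete, here is a small sketch that checks whether fetched HTML still contains unrendered AngularJS-style {{ ... }} placeholders. The function name hasUnrenderedPlaceholders is made up for this illustration, and the regular expression is a rough heuristic, not a full template parser.

```php
<?php
// Hypothetical helper: returns true when the HTML still contains
// AngularJS-style {{ ... }} placeholders, i.e. no browser ever rendered them.
function hasUnrenderedPlaceholders(string $html): bool
{
    // Heuristic: "{{", an identifier or dotted path, anything except braces, "}}".
    return preg_match('/\{\{\s*[\w.$]+[^{}]*\}\}/', $html) === 1;
}

// The raw markup from the question, as cURL or file_get_contents would return it.
$raw = '<div class="spisok" data-ng-repeat="tovar in models.tovari"> '
     . '<span> {{tovari.name}} </span><br/> </div>';

var_dump(hasUnrenderedPlaceholders($raw));               // bool(true)
var_dump(hasUnrenderedPlaceholders('<span>Tea</span>')); // bool(false)
```

If the check returns true, the data is being filled in by JavaScript after the page loads, and no choice of cURL options will change that.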

PhantomJS can help. Through the evaluate method you get the context of the loaded page, and then you can either dump the whole page or use the full arsenal of JS selectors. The example is taken from the official site:

 var webPage = require('webpage');
 var page = webPage.create();
 page.open('http://m.bing.com', function(status) {
   var title = page.evaluate(function() {
     return document.title;
   });
   console.log(title);
   phantom.exit();
 });

  • And your code doesn't work =( - Excess Suslik

It is possible with the cURL library. I recently parsed a site where everything is on AJAX with jQuery. If you fetch the page with cURL and then output it to the browser, everything will come out broken, because the client is now cURL (your server), not you, and the cross-domain requests will not go through. So you need to take, for example, the Simple HTML DOM library.

 $curl = curl_init();
 curl_setopt($curl, CURLOPT_FAILONERROR, 1);
 curl_setopt($curl, CURLOPT_FOLLOWLOCATION, 1); // allow redirects
 curl_setopt($curl, CURLOPT_TIMEOUT, 10); // times out after 10s
 curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1); // return into a variable
 curl_setopt($curl, CURLOPT_URL, "https://ya.ru/");
 curl_setopt($curl, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; ru; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5 GTB6");
 curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false); // insecure: skips certificate checks
 curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, false);
 $data = curl_exec($curl);

There is no need to output $data. That's it, your client (the server) has received the page. Now download http://simplehtmldom.sourceforge.net/ and there is a tutorial with examples here: http://zubuntu.ru/php-simple-html-dom-parser/ That is, we continue like this:

 $html = str_get_html($data);
 $result = $html->find('div.spisok span'); // returns an array of matching elements

Then we iterate over it.

 foreach ($result as $one) {
     echo $one; // we can already output what we found
 }

This is the principle: find the data, then iterate over it. It is important to pin down the "coordinates" (selectors) of the data precisely.
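For comparison, the same find-and-iterate step can be sketched with PHP's built-in DOMDocument and DOMXPath instead of Simple HTML DOM, so nothing has to be downloaded. This is a different technique from the one above, the function name extractSpisokNames and the sample HTML are made up, and it only helps when the HTML you fetched actually contains the data rather than {{ ... }} placeholders.

```php
<?php
// Hypothetical helper: collects the text of every <span> inside
// <div class="spisok"> using the built-in DOM extension.
function extractSpisokNames(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; keep parser warnings out of the output.
    $previous = libxml_use_internal_errors(true);
    // The XML declaration prefix is a common trick to make loadHTML read UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // XPath equivalent of the CSS selector "div.spisok span".
    $nodes = $xpath->query(
        '//div[contains(concat(" ", normalize-space(@class), " "), " spisok ")]//span'
    );

    $names = [];
    foreach ($nodes as $node) {
        $names[] = trim($node->textContent);
    }
    return $names;
}

// Made-up sample of already rendered markup.
$rendered = '<div class="spisok"><span> Tea </span><br/></div>'
          . '<div class="spisok"><span>Coffee</span><br/></div>';
echo implode(', ', extractSpisokNames($rendered)), PHP_EOL; // Tea, Coffee
```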

    Open the developer tools, which every browser has. In Chrome and similar browsers (Opera, Yandex, Rambler, etc.) they are built in; in Firefox there is Firebug. See what gets loaded after the page itself has loaded, take that URL, substitute your parameters, and download the content with the same file_get_contents.

    • Doesn't work. Already tried. Instead of content it just returns false - Alma Z
    • Well, that's understandable. Did you check which method the AJAX request used? If it is POST, a plain file_get_contents($url) call will not work, because it sends a GET; use cURL (or pass a stream context). You are giving too little information. - Redr01d
    • Open the console and send the request as GET and as POST in turn; whichever returns the result is the method you need: $.post("here is your url") or $.get("here is your url") - Redr01d
    • Looks like I'm just passing the wrong URL. Is it possible to see in the developer tools where angular.js passes the parameters? - Alma Z
    • As I wrote above, take the URL the request goes to and check it. angular.js has nothing to do with it; it does nothing clever. Give the address you want to parse. - Redr01d
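    The approach discussed in these comments could be sketched roughly as follows. Everything specific here is an assumption: the function names fetchBody and extractNames are invented, the endpoint URL in the usage note is a placeholder, and the JSON shape (a "tovari" array of objects with a "name" field) is only guessed from the template in the question. Copy the real URL, method, headers, and body from the network tab.

```php
<?php
// Hypothetical helper: fetches a URL with cURL. Sends a POST when
// $postFields is given, otherwise a GET.
function fetchBody(string $url, ?array $postFields = null): string
{
    $curl = curl_init($url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true); // return the body as a string
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    curl_setopt($curl, CURLOPT_TIMEOUT, 10);          // give up after 10 seconds
    curl_setopt($curl, CURLOPT_HTTPHEADER, ['Accept: application/json']);
    if ($postFields !== null) {
        // Assumption: the endpoint accepts form-encoded fields. If the network
        // tab shows a JSON request body, send json_encode($postFields) with a
        // "Content-Type: application/json" header instead.
        curl_setopt($curl, CURLOPT_POST, true);
        curl_setopt($curl, CURLOPT_POSTFIELDS, http_build_query($postFields));
    }
    $body = curl_exec($curl);
    if ($body === false) {
        throw new RuntimeException('Request failed: ' . curl_error($curl));
    }
    return $body;
}

// Hypothetical helper: decodes the response and collects each item's "name".
// The {"tovari": [{"name": ...}]} shape is an assumption, not the site's real API.
function extractNames(string $json): array
{
    $data = json_decode($json, true);
    if (!is_array($data) || !isset($data['tovari']) || !is_array($data['tovari'])) {
        return [];
    }
    return array_column($data['tovari'], 'name');
}

// Offline demonstration with a made-up response body.
$sample = '{"tovari": [{"name": "Tea"}, {"name": "Coffee"}]}';
echo implode(', ', extractNames($sample)), PHP_EOL; // Tea, Coffee
```

    Against a real site the call would look like extractNames(fetchBody('https://example.com/api/tovari')), with the placeholder URL replaced by the one seen in the developer tools.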

    In Chrome, I use this extension: https://chrome.google.com/webstore/detail/quick-source-viewer/cfmcghennfbpmhemnnfjhkdmnbidpanb?utm_source=chrome-app-launcher-info-dialog

    After the page has loaded and the scripts have run, you can see the resulting source with all the changes.