I am trying to write a parser, but it does not work. The response is:

Object Moved This object may be found here.

I searched Google but could not find a solution. What protection method does this site use to prevent its data from being parsed?

    $url = 'http://elibrary.ru/titles.asp';
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.6 (KHTML, like Gecko) Chrome/16.0.897.0 Safari/535.6');
    curl_setopt($ch, CURLOPT_HEADER, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 30);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 0);
    curl_setopt($ch, CURLOPT_REFERER, $url);
    $content = curl_exec($ch);
    curl_close($ch);
    echo $content;
  • Shouldn't that be curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);? - Dmitriy Simushev
  • var_dump($content); // prints false - depredator
  • And what does curl_error() say? - splash58
  • Nothing, an empty screen - depredator

2 answers

The root of the problem is this: the target site sets a cookie to identify the user and responds with 302 Moved Temporarily plus a Location header. This acts as a simple defense against naive parsers that cannot handle cookies.
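To make that mechanism visible, here is a minimal sketch that splits a raw response (the kind curl returns when CURLOPT_HEADER is true) into status code, headers, and body. The sample response, the cookie name SUserID, and the helper parseRawResponse() are hypothetical illustrations, not the site's real output.

```php
<?php
// Hypothetical raw response, shaped like what curl returns when CURLOPT_HEADER
// is true and redirects are not followed. The cookie name and value are made up.
$raw = "HTTP/1.1 302 Moved Temporarily\r\n"
     . "Location: /titles.asp\r\n"
     . "Set-Cookie: SUserID=12345; path=/\r\n"
     . "Content-Type: text/html\r\n"
     . "\r\n"
     . "<h1>Object Moved</h1>This object may be found here.";

/**
 * Split a raw HTTP response into status code, headers and body.
 * Header names are lower-cased; repeated headers are collected into arrays.
 */
function parseRawResponse(string $raw): array
{
    // Headers and body are separated by the first blank line.
    [$head, $body] = array_pad(explode("\r\n\r\n", $raw, 2), 2, '');
    $lines = explode("\r\n", $head);

    // The status line looks like "HTTP/1.1 302 Moved Temporarily".
    $statusParts = explode(' ', array_shift($lines), 3);
    $status = (int) ($statusParts[1] ?? 0);

    $headers = [];
    foreach ($lines as $line) {
        if (strpos($line, ':') === false) {
            continue; // skip malformed header lines
        }
        [$name, $value] = explode(':', $line, 2);
        $headers[strtolower(trim($name))][] = trim($value);
    }

    return ['status' => $status, 'headers' => $headers, 'body' => $body];
}

$response = parseRawResponse($raw);
```

A 302 status together with both a location and a set-cookie entry in $response['headers'] is the pattern described above: the redirect only leads somewhere useful if the cookie is sent back on the next request.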

As for your code: setting CURLOPT_FOLLOWLOCATION to false (or 0) stops curl from following the URL in the Location header. That is why the response you see is:

Object moved

Setting CURLOPT_FOLLOWLOCATION to true (or 1) is not enough on its own in your case, because by default curl does not handle cookies. To enable them, it is enough to add:

 curl_setopt($ch, CURLOPT_COOKIEFILE, ''); 

Passing an empty string tells curl to enable cookie handling without reading cookies from a file or saving them to the file system (see http://php.net/manual/ru/function.curl-setopt.php ).
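If cookies need to survive between script runs, the same option can point at a real file, paired with CURLOPT_COOKIEJAR so that curl writes the cookies back when the handle is released. The following is a sketch under assumptions: the helper buildCurlOptions() and the redirect limit of 5 are illustrative choices, not part of the original answer.

```php
<?php
/**
 * Build cURL options for a request that follows redirects and keeps cookies.
 * With $cookieFile === null, cookies live only in memory for this handle.
 * With a path, cookies are read from that file and written back to it.
 */
function buildCurlOptions(string $url, ?string $cookieFile = null): array
{
    $options = [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_MAXREDIRS      => 5, // assumed limit, guards against redirect loops
        CURLOPT_COOKIEFILE     => $cookieFile ?? '', // '' enables in-memory cookies
    ];

    if ($cookieFile !== null) {
        // Persist cookies to the same file when the handle is released.
        $options[CURLOPT_COOKIEJAR] = $cookieFile;
    }

    return $options;
}

// In-memory cookies only, as in the answer above.
$options = buildCurlOptions('http://elibrary.ru/titles.asp');
// The array can then be applied with curl_setopt_array($ch, $options).
```

For a one-off request the in-memory variant is usually sufficient; a file only matters when a later run should reuse the session.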

Thus, working code may look like this:

    $url = 'http://elibrary.ru/titles.asp';
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.6 (KHTML, like Gecko) Chrome/16.0.897.0 Safari/535.6');
    curl_setopt($ch, CURLOPT_HEADER, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 30);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); // follow the 302 redirect
    curl_setopt($ch, CURLOPT_REFERER, $url);
    curl_setopt($ch, CURLOPT_COOKIEFILE, '');    // enable in-memory cookie handling
    $content = curl_exec($ch);
    curl_close($ch);
    echo($content);
  • Thanks for the clarification, it works. - depredator
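
The comments on the question mention curl_exec() returning false with nothing useful on screen. A minimal sketch of surfacing the failure reason is shown here; the wrapper fetchPage() and its exception message format are assumptions for illustration, not part of the original answer.

```php
<?php
/**
 * Fetch a URL with cookies and redirects enabled.
 * Throws RuntimeException carrying the cURL error when the transfer fails.
 */
function fetchPage(string $url, int $timeoutSeconds = 30): string
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_COOKIEFILE     => '', // in-memory cookie handling
        CURLOPT_CONNECTTIMEOUT => $timeoutSeconds,
    ]);

    $content = curl_exec($ch);
    if ($content === false) {
        // curl_exec() returns false on failure; the handle holds the details.
        throw new RuntimeException(
            sprintf('cURL error %d: %s', curl_errno($ch), curl_error($ch))
        );
    }

    return $content;
}

// Usage (performs a real request, so it is left commented out):
// try {
//     echo fetchPage('http://elibrary.ru/titles.asp');
// } catch (RuntimeException $e) {
//     echo $e->getMessage();
// }
```

Throwing on false keeps a failed transfer from being mistaken for an empty page.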

A good solution; I could not solve this problem for a long time.

Some sites require not only cookies but also a user-agent string:

curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.6 (KHTML, like Gecko) Chrome/16.0.897.0 Safari/535.6');

After adding it, everything worked.