Hello, I'm trying to scrape the site avito.ru with the following function:

    function getContent($url, $referer)
    {
        $userAgent = 'Mozilla/5.0 (Windows; U; Windows NT 6.1; uk; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13 Some plugins';

        // First request: sent without cookies, only to collect the Set-Cookie headers.
        $cookie = '';
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_REFERER, $referer);
        curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
        curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_HEADER, true);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_COOKIE, $cookie);
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 30);
        curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
        $data = curl_exec($ch);
        $headerSize = curl_getinfo($ch, CURLINFO_HEADER_SIZE);
        $header = substr($data, 0, $headerSize);
        $body = substr($data, $headerSize);
        curl_close($ch);

        // Collect every name=value pair from the Set-Cookie headers.
        preg_match_all("/Set-Cookie: (.*?)=(.*?);/i", $header, $res);
        $cookie = '';
        foreach ($res[1] as $key => $value) {
            // Append with .= (plain = kept only the last cookie).
            $cookie .= $value . '=' . $res[2][$key] . '; ';
        }

        // Second request: same settings, now sending the collected cookies.
        $curl = curl_init();
        curl_setopt($curl, CURLOPT_REFERER, $referer);
        curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, 0);
        curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, 0);
        curl_setopt($curl, CURLOPT_URL, $url);
        curl_setopt($curl, CURLOPT_COOKIE, $cookie);
        curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 30);
        curl_setopt($curl, CURLOPT_USERAGENT, $userAgent);
        curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
        $data2 = curl_exec($curl);
        curl_close($curl);

        return $data2;
    }

    $html = getContent('http://www.avito.ru', 'http://google.com');
    echo $html;

The first time, it parses normally (it gets the full contents of avito.ru). When I refresh the page, it parses nothing and I just get a white screen. I have to wait a while, or close the browser, before it parses again. What could be the problem?

  • And what is the point of making two connections with the same settings? - VenZell
  • The first time we connect and collect the cookies; the second time we send them. - shol
  • It doesn't work any other way, only with the same settings. - shol

1 answer

When you refresh the page, the first part of your function once again sends a request without any cookies, even though the site already issued them to you last time, so you get banned.

You need to save the cookies from the first request and send them with every subsequent request, the way the second part of your function already does.
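One possible way to do this, as a rough sketch rather than a tested fix for avito.ru, is to let cURL persist the cookies in a file through its `CURLOPT_COOKIEFILE` and `CURLOPT_COOKIEJAR` options, so the cookies survive between page refreshes. The function names `buildCurlOptions` and `getContentWithCookieFile` and the `cookies.txt` path are made up for illustration. Unlike the code in the question, this sketch leaves TLS verification at its defaults.

```php
<?php
/**
 * Build the cURL options shared by every request.
 * (Hypothetical helper name, not part of the original code.)
 */
function buildCurlOptions(string $url, string $referer, string $cookieFile): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_REFERER        => $referer,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_CONNECTTIMEOUT => 30,
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (Windows; U; Windows NT 6.1; uk; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13',
        // Send the cookies saved by earlier requests, if the file exists.
        CURLOPT_COOKIEFILE     => $cookieFile,
        // Write all known cookies back to the same file when the handle is released.
        CURLOPT_COOKIEJAR      => $cookieFile,
    ];
}

/**
 * Fetch a page, reusing cookies from previous calls.
 * The default cookie file location is an assumption; any writable path works.
 */
function getContentWithCookieFile(string $url, string $referer, ?string $cookieFile = null)
{
    $cookieFile = $cookieFile ?? __DIR__ . '/cookies.txt';

    $ch = curl_init();
    curl_setopt_array($ch, buildCurlOptions($url, $referer, $cookieFile));
    $body = curl_exec($ch);

    // Releasing the handle lets libcurl flush the cookies to the jar file.
    unset($ch);

    return $body; // page body as a string, or false on failure
}

// Usage (performs a real network request, so it is left commented out):
// echo getContentWithCookieFile('http://www.avito.ru', 'http://google.com');
```

With this approach a single request per page load is enough: the first call receives the cookies and stores them, and every later call, including those after a refresh, sends them back.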