I am writing a parser for Google search results. It seemed straightforward, and similar parsers for Yandex and Mail.ru already work. But I ran into a problem when downloading the captcha image: the request stubbornly returns 403 Forbidden.

I make a request to Google:

    $url = 'https://www.google.ru/search?complete=1&hl=ru&q='.urlencode($query)...

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HEADER, 1);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_AUTOREFERER, true);
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file_path);
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file_path);
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.0.3705; .NET CLR 1.1.4322; Media Center PC 4.0)');
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_PROXY, $host.":".$port);
    curl_setopt($ch, CURLOPT_PROXYUSERPWD, $login.":".$pass);
    $content = curl_exec($ch);

    // Next, if I got a redirect:
    if(curl_getinfo($ch, CURLINFO_HTTP_CODE) == 302) {
        // .................
        // Extract the captcha image and try to download it
        // .................
        $captcha_image = 'http://ipv4.google.com/'.$image;
        $fh = fopen($captcha_file, 'w');
        curl_setopt($ch, CURLOPT_URL, $captcha_image);
        curl_setopt($ch, CURLOPT_FILE, $fh);
        curl_setopt($ch, CURLOPT_HEADER, 1);
        curl_exec ($ch);
        fclose($fh);
        // But instead of the image I get 403 Forbidden.
    }
    curl_close($ch);

I also tried re-initializing the cURL session before requesting the image. The result is the same.

Contents of $cookie_file_path:

    # Netscape HTTP Cookie File
    # http://curl.haxx.se/rfc/cookie_spec.html
    # This file was generated by libcurl! Edit at your own risk.
    #HttpOnly_.google.ru TRUE / FALSE 1490269969 NID 87=m8iayuoh4X_H9kTM4zNlYrVmavd0qd7X6Bj1mbyZwrn23e-BQyA-GlNYBsV9iKq5cVj1ZrB9770cWf036kdakSC3tvlDIu_KVpf8yN5ilKkUk8iHAMbi_QZqD7Inlxs3
  • What about the cookies and the referer? Check in a sniffer which headers the browser sends; you need to replicate them, possibly all of them. - nick_n_a
  • I set these headers: 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'; 'Accept-Encoding: gzip, deflate'; 'Accept-Language: ru-RU,ru;q=0.8,en-US;q=0.5,en;q=0.3'; 'Connection: keep-alive'; 'Host: ipv4.google.com'; 'Proxy-Authorization: Basic cnU2MDc0MDpDNEJkWk1EMTJE'; 'Upgrade-Insecure-Requests: 1'; 'User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.0.3705; .NET CLR 1.1.4322; Media Center PC 4.0)'. The result is the same. - Alfa33
  • You are simply sending requests too quickly; only manual captcha-recognition services will help you here. - Naumov
