I am writing a parser that outputs Google search results. It seemed easy, and similar parsers for Yandex and Mail.ru work. But I ran into a problem when downloading the captcha image: the request stubbornly returns 403 Forbidden.
I make a request to Google:
$url = 'https://www.google.ru/search?complete=1&hl=ru&q='.urlencode($query)...
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file_path);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file_path);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.0.3705; .NET CLR 1.1.4322; Media Center PC 4.0)');
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_PROXY, $host.":".$port);
curl_setopt($ch, CURLOPT_PROXYUSERPWD, $login.":".$pass);
$content = curl_exec($ch);

// Then, if I got a redirect:
if (curl_getinfo($ch, CURLINFO_HTTP_CODE) == 302) {
    // .................
    // Extract the captcha image and try to download it
    // .................
    $captcha_image = 'http://ipv4.google.com/'.$image;
    $fh = fopen($captcha_file, 'w');
    curl_setopt($ch, CURLOPT_URL, $captcha_image);
    curl_setopt($ch, CURLOPT_FILE, $fh);
    curl_setopt($ch, CURLOPT_HEADER, 1);
    curl_exec($ch);
    fclose($fh);
    // But instead of the image I get 403 Forbidden.
}
curl_close($ch);

I also tried re-initializing the cURL session before requesting the image. The result is the same.
Contents of $cookie_file_path:
# Netscape HTTP Cookie File
# http://curl.haxx.se/rfc/cookie_spec.html
# This file was generated by libcurl! Edit at your own risk.

#HttpOnly_.google.ru TRUE / FALSE 1490269969 NID 87=m8iayuoh4X_H9kTM4zNlYrVmavd0qd7X6Bj1mbyZwrn23e-BQyA-GlNYBsV9iKq5cVj1ZrB9770cWf036kdakSC3tvlDIu_KVpf8yN5ilKkUk8iHAMbi_QZqD7Inlxs3
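
For debugging, it may help to check which hosts the cookies in this jar are scoped to. Below is a minimal sketch, not part of my parser: the helper names parse_cookie_jar and cookie_matches_host are hypothetical, the cookie value is a placeholder, and the domain check is a simplification of what libcurl actually does. It assumes the jar is in the tab-separated Netscape format shown above, where libcurl marks HttpOnly cookies with a "#HttpOnly_" prefix.

```php
<?php
// Hypothetical helper: parse a Netscape-format cookie jar (as written by
// libcurl) into a list of cookie records.
function parse_cookie_jar(string $text): array
{
    $cookies = [];
    foreach (preg_split('/\R/', $text) as $line) {
        $httpOnly = false;
        if (strpos($line, '#HttpOnly_') === 0) {
            // libcurl prefixes HttpOnly cookies with "#HttpOnly_".
            $line = substr($line, strlen('#HttpOnly_'));
            $httpOnly = true;
        } elseif ($line === '' || $line[0] === '#') {
            continue; // blank line or ordinary comment
        }
        $fields = explode("\t", $line);
        if (count($fields) !== 7) {
            continue; // not a cookie line
        }
        $cookies[] = [
            'domain'             => $fields[0],
            'include_subdomains' => $fields[1] === 'TRUE',
            'path'               => $fields[2],
            'secure'             => $fields[3] === 'TRUE',
            'expires'            => (int) $fields[4],
            'name'               => $fields[5],
            'value'              => $fields[6],
            'http_only'          => $httpOnly,
        ];
    }
    return $cookies;
}

// Hypothetical helper: simplified domain check (ignores path, secure flag
// and expiry) for whether a cookie would apply to the given host.
function cookie_matches_host(array $cookie, string $host): bool
{
    $domain = strtolower(ltrim($cookie['domain'], '.'));
    $host = strtolower($host);
    if ($host === $domain) {
        return true;
    }
    if (!$cookie['include_subdomains']) {
        return false;
    }
    $suffix = '.' . $domain;
    return substr($host, -strlen($suffix)) === $suffix;
}

// Sample jar modelled on the file above; "87=EXAMPLE" is a placeholder value.
$jar = parse_cookie_jar(
    "# Netscape HTTP Cookie File\n" .
    "#HttpOnly_.google.ru\tTRUE\t/\tFALSE\t1490269969\tNID\t87=EXAMPLE\n"
);
$forSearch  = cookie_matches_host($jar[0], 'www.google.ru');   // true
$forCaptcha = cookie_matches_host($jar[0], 'ipv4.google.com'); // false
```

Under this simplified check, the NID cookie is scoped to .google.ru and would not apply to a request for ipv4.google.com. Whether that is related to the 403 is not something this sketch establishes.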