I want to get the results from the form on the page https://rosreestr.ru/wps/portal/p/cc_ib_portal_services/online_request/

The form is protected by a captcha. I fetch the captcha, extract the key from the generated URL, and save the cookies. My parser shows me the captcha and I enter it, but in the end the site answers that the captcha is incorrect.

If I open the page in the browser at the same time, I see the same digits on the captcha as my parser shows, and when I enter them in the browser, everything works. Apparently I am missing something, but I cannot figure out what.

Here is my code to reproduce the problem; you only need to include the Simple HTML DOM Parser library:

    <?php
    session_start();
    require_once __DIR__ . '/simple_html_dom_parser.php';

    if (!$_GET) {
        $url = 'https://rosreestr.ru/wps/portal/p/cc_ib_portal_services/online_request/';

        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url); // where the request is sent
        curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20100101 Firefox/23.0");
        curl_setopt($ch, CURLOPT_HEADER, 1);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return what the server sent back
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 0); // follow redirects (disabled)
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 30); // timeout
        curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); // simply disable certificate verification
        curl_setopt($ch, CURLOPT_COOKIEJAR, dirname(__FILE__).'/my_cookies.txt'); // save cookies to a file
        curl_setopt($ch, CURLOPT_COOKIEFILE, dirname(__FILE__).'/my_cookies.txt');
        $content = curl_exec($ch);

        preg_match_all('/^Content-Location:(.*)$/mi', $content, $matches);
        if (!empty($matches[1])) {
            $realUrl = trim($matches[1][0]);
        } else {
            exit('No real url');
        }

        $findCaptcha = str_get_html($content);
        $img = $findCaptcha->find('tr td #captchaImage2');
        $urlCaptcha = $img[0]->src;
        $key = str_replace(array('p0/', '=NJcaptcha=/'), '', $urlCaptcha);
        $goToCaptcha = 'https://rosreestr.ru'.$realUrl.$urlCaptcha;

        $dopUrlForm = '=MEcontroller!QCPSearchAction==/';
        $formToUrl = explode('/', $realUrl);
        array_splice($formToUrl, 10);
        $goFormTo = 'https://rosreestr.ru'.implode('/', $formToUrl).'/p0/'.$key.$dopUrlForm;

        echo "<img src='".$goToCaptcha."'>";
        echo "<form method='get'><input type='text' name='captcha'><input type='submit' value='go'></form>";

        $_SESSION['referer'] = 'https://rosreestr.ru'.$realUrl;
        $_SESSION['form'] = $goFormTo;
    } else {
        $post = 'search_action=true&subject=&region=&settlement=&cad_num=&start_position=59&obj_num=&old_number=&search_type=ADDRESS&subject_id=120000000000&region_id=120401000000&street_type=str0&street=&house=54%2F5&building=&structure=&apartment=&right_reg=&encumbrance_reg=&captchaText='.$_GET['captcha'];

        $ch1 = curl_init();
        curl_setopt($ch1, CURLOPT_URL, $_SESSION['form']);
        curl_setopt($ch, CURLOPT_COOKIEJAR, dirname(__FILE__).'/my_cookies.txt'); // save cookies to a file
        curl_setopt($ch1, CURLOPT_COOKIEFILE, dirname(__FILE__).'/my_cookies.txt');
        curl_setopt($ch1, CURLOPT_POST, 1);
        curl_setopt($ch1, CURLOPT_POSTFIELDS, $post);
        curl_setopt($ch1, CURLOPT_REFERER, $_SESSION['referer']);
        curl_setopt($ch1, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20100101 Firefox/23.0");
        curl_setopt($ch1, CURLOPT_HEADER, array('Host: rosreestr.ru', 'Origin: https://rosreestr.ru', 'Upgrade-Insecure-Requests: 1'));
        curl_setopt($ch1, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch1, CURLOPT_FOLLOWLOCATION, 0);
        curl_setopt($ch1, CURLOPT_CONNECTTIMEOUT, 30);
        curl_setopt($ch1, CURLOPT_SSL_VERIFYPEER, false);
        $data = curl_exec($ch1);
        echo $data;
    }

I would be grateful for the help!

    1 answer

    As far as I can tell, your code cannot work as written:

     $goFormTo = 'https://rosreestr.ru'.implode('/', $formToUrl).'/p0/'.$key.$dopUrlForm; 

    should be replaced by

     $goFormTo = 'https://rosreestr.ru/'.implode('/', $formToUrl).'/p0/'.$key.$dopUrlForm; // note the added slash after rosreestr.ru
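
Whether the extra slash is actually needed depends on whether the path produced by implode() already begins with a /. A minimal sketch of a join that yields exactly one slash in either case, written in shell to match the curl commands used in this answer. The helper name join_url is hypothetical and not part of the original code:

```shell
# join_url BASE PATH
# Joins a base URL and a path with exactly one "/" between them.
# Removes at most one trailing "/" from BASE and one leading "/" from PATH.
join_url() {
  printf '%s/%s\n' "${1%/}" "${2#/}"
}

# Example calls (both forms of input give the same result):
#   join_url 'https://rosreestr.ru'  '/wps/portal'
#   join_url 'https://rosreestr.ru/' 'wps/portal'
```

The same idea carries over to the PHP code: trim the slash on one side before concatenating, so that neither a missing nor a doubled slash can appear.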


    Apart from that, everything works fine from the command line, even with a somewhat simplified set of parameters:

    • fetch the initial form
      curl -b cookies.txt -c cookies.txt -i -o index.html https://rosreestr.ru/wps/portal/p/cc_ib_portal_services/online_request/
    • find the Content-Location header in index.html
    • find the captcha link in the src of the captchaImage2 element in index.html
    • do the same string manipulations as in the PHP code from the question
    • download the captcha image
      curl -b cookies.txt -c cookies.txt -o captcha.png https://rosreestr.ru/wps/portal/p/cc_ib_portal_services/online_request/!ut/p/z1/04_Sj9CPykssy0xPLMnMz0vMAfIjo8zi3QNNXA2dTQy93QMNzQ0cPR29DY0N3Q0MQkz1w_Eq8DfUj6JEP1ABSL8BDuBoANQfhdcKZyMCCkBOJGRJQW5ohEGmpyIAKLXudw!!/dz/d5/L2dBISEvZ0FBIS9nQSEh/p0/IZ7_01HA1A42KODT90AR30VLN22001=CZ6_GQ4E1C41KGQ170AIAK131G00T5=NJcaptcha=
    • read the captcha from the captcha.png file
    • send the data request (remembering to put in the correct captcha text)
      curl -b cookies.txt -c cookies.txt -o reply.html -X POST -H "Content-Type: application/x-www-form-urlencoded" -d "search_action=true&subject=&region=&settlement=&cad_num=&start_position=59&obj_num=&old_number=&search_type=ADDRESS&subject_id=145000000000&region_id=145263000000&street_type=str0&street=&house=&building=&structure=&apartment=&right_reg=&encumbrance_reg=&captchaText=53649" https://rosreestr.ru/wps/portal/p/cc_ib_portal_services/online_request/!ut/p/z1/04_Sj9CPykssy0xPLMnMz0vMAfIjo8zi3QNNXA2dTQy93QMNzQ0cPR29DY0N3Q0MQkz1w_Eq8DfUj6JEP1ABSL8BDuBoANQfhdcKZyMCCkBOJGRJQW5ohEGmpyIAKLXudw!!/p0/IZ7_01HA1A42KODT90AR30VLN22001=CZ6_GQ4E1C41KGQ170AIAK131G00T5=MEcontroller!QCPSearchAction==/
    • get the result in reply.html
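
The header lookup and string manipulations in the steps above can be sketched in shell roughly as follows. This is only an illustration under assumptions: the function names are hypothetical, index.html is assumed to have been saved with curl -i so that it contains the response headers, and the captchaImage2 tag is assumed to have its id before its src on a single line. The URL layout follows the PHP code from the question.

```shell
# Print the value of the first Content-Location header in a file saved with `curl -i`.
content_location() {
  grep -i '^Content-Location:' "$1" | head -n 1 \
    | sed -e 's/^[^:]*:[[:space:]]*//' | tr -d '\r'
}

# Print the src attribute of the element with id="captchaImage2".
# Assumption: id comes before src inside the same tag, on one line.
captcha_src() {
  sed -n 's/.*id="captchaImage2"[^>]*src="\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# Strip the "p0/" prefix and the "=NJcaptcha=" suffix (with an optional trailing "/").
captcha_key() {
  printf '%s\n' "$1" | sed -e 's#^p0/##' -e 's#/$##' -e 's#=NJcaptcha=$##'
}

# Captcha image URL: host + Content-Location path + captcha src.
captcha_url() {
  printf 'https://rosreestr.ru%s%s\n' "$1" "$2"
}

# Form URL: the first 10 "/"-separated fields of the Content-Location path
# (the PHP code's array_splice($formToUrl, 10)), then /p0/<key> and the action suffix.
form_url() {
  printf 'https://rosreestr.ru%s/p0/%s=MEcontroller!QCPSearchAction==/\n' \
    "$(printf '%s\n' "$1" | cut -d/ -f1-10)" "$2"
}

# Possible usage after the first curl call has written index.html:
#   real=$(content_location index.html)
#   src=$(captcha_src index.html)
#   curl -b cookies.txt -c cookies.txt -o captcha.png "$(captcha_url "$real" "$src")"
#   form=$(form_url "$real" "$(captcha_key "$src")")
```

Because the Content-Location path starts with a /, the first field seen by cut is empty, so the result of form_url already has a single slash after the host name.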
    • I don't see any difference between the two lines of code above... - iKey
    • A single / at the end of rosreestr.ru - Sergey Nudnov
    • Right :) I'll check your version in 20 minutes. You described the logic of my code accurately :) - iKey
    • If I do as you suggest, $goFormTo = 'https://rosreestr.ru/ , that is, with a slash after .ru, then the resulting link looks like https://rosreestr.ru//wps/p and the site responds with Not Found... - iKey
    • I think I understand what the problem is... If I'm right, I'll post an answer a little later, once I've confirmed it - iKey