I can't figure out how to properly organize scraping through a proxy so that requests with and without a proxy run in parallel, and new requests go out without waiting for responses to the ones already sent. Here is what I do now: I split the array of links into chunks of 15 URLs with array_chunk and then send each chunk with curl_multi_exec. Suppose I split the overall array into chunks of 20 URLs instead of 15 and add a proxy to half of them, leaving the other half as is:

 curl_setopt($ch, CURLOPT_PROXY, $proxy); 

Then the overall speed drops, as I understand it, because responses through the proxy arrive more slowly and nothing else is sent while they are pending.
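To make that concrete, here is roughly what the current batch approach looks like (a simplified sketch; $urls and $proxy are placeholders):

    <?php
    // Current approach (sketch): process the links in fixed-size batches.
    $urls  = ['http://site.com/search.php?id=12345' /* , ... */];
    $proxy = '127.0.0.1:8080'; // placeholder

    foreach (array_chunk($urls, 20) as $chunk) {
        $mh = curl_multi_init();
        $handles = [];
        foreach ($chunk as $i => $url) {
            $ch = curl_init($url);
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
            if ($i % 2 === 0) { // half of the requests go through the proxy
                curl_setopt($ch, CURLOPT_PROXY, $proxy);
            }
            curl_multi_add_handle($mh, $ch);
            $handles[] = $ch;
        }
        // The whole batch must finish before the next chunk starts,
        // so one slow proxied response stalls everything else.
        do {
            curl_multi_exec($mh, $running);
            curl_multi_select($mh);
        } while ($running > 0);
        foreach ($handles as $ch) {
            // ... curl_multi_getcontent($ch) ...
            curl_multi_remove_handle($mh, $ch);
            curl_close($ch);
        }
        curl_multi_close($mh);
    }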

The question is: how do I send requests through a proxy without waiting for their responses, and keep sending new requests in the meantime? If there is a library that implements this, please point me to it. Thanks!
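If I understand it correctly, what I'm after is something like the following "rolling" pattern, where a new URL is added as soon as any transfer finishes instead of waiting for the whole batch (an untested sketch using only the curl_multi_* functions; $urls, $window and $addHandle are illustrative names):

    <?php
    // Rolling sketch: keep $window transfers in flight at all times.
    $urls   = [/* ... the full list of links ... */];
    $window = 20;
    $mh     = curl_multi_init();

    $addHandle = function ($mh, $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        // curl_setopt($ch, CURLOPT_PROXY, $proxy); // for the proxied share
        curl_multi_add_handle($mh, $ch);
    };

    // Fill the initial window.
    for ($i = 0; $i < $window && $urls; $i++) {
        $addHandle($mh, array_shift($urls));
    }

    do {
        curl_multi_exec($mh, $running);
        curl_multi_select($mh, 1.0);
        // Harvest finished transfers and immediately top the window up,
        // so slow (proxied) responses never block the fast ones.
        while ($info = curl_multi_info_read($mh)) {
            $ch   = $info['handle'];
            $body = curl_multi_getcontent($ch);
            // ... process $body ...
            curl_multi_remove_handle($mh, $ch);
            curl_close($ch);
            if ($urls) {
                $addHandle($mh, array_shift($urls));
                $running++; // recounted by the next curl_multi_exec()
            }
        }
    } while ($running > 0 || $urls);

    curl_multi_close($mh);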

P.S. When requesting different resources of the same site, for example:

http://site.com/search.php?id=12345

http://site.com/search.php?id=12346

Can I use a single connection for all of them? I read that opening a connection is expensive and that it's better to reuse one.
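From what I've read, libcurl keeps the connection alive when you reuse the same handle, so something like this sketch should open the TCP connection only once (the ids are from the example above):

    <?php
    // Reusing one curl handle: libcurl keeps the connection to site.com
    // open between curl_exec() calls, so only the first request pays
    // the connection-setup cost.
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

    foreach ([12345, 12346] as $id) {
        curl_setopt($ch, CURLOPT_URL, 'http://site.com/search.php?id=' . $id);
        $body = curl_exec($ch);
        // ... process $body ...
    }

    curl_close($ch);

Note that parallel requests to the same host still need separate connections, since an HTTP/1.x connection serves only one request at a time.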

  • IMHO, don't use curl_multi_exec and other multi-threading. Write a script where 1 invocation handles 1 link, and make a queue through something like RabbitMQ plus a "master script" (a rough sketch follows these comments). That's what I would do. - Total Pusher
  • @TotalPusher Is it easier to work with proxies through RabbitMQ? - Sergey
  • No, it's not about proxies. It's just a way to do multi-threaded processing: it simply runs the scripts, 1 script per 1 request. You pile the links into the queue and RabbitMQ hands them out. - Total Pusher
  • @TotalPusher It may be worth looking into, but I find it hard to believe that this technology doesn't support working through a proxy server. - Sergey
  • @TotalPusher Why complicate things when you can launch a PHP script that handles a single request from bash, ending the command with & to background it, and do that in a bash loop? - user273805
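For reference, a rough sketch of the queue approach from the comments, assuming the php-amqplib client (the queue name 'links', the connection credentials and $urls are all placeholders):

    <?php
    // producer.php (sketch): push every link into the queue once.
    require __DIR__ . '/vendor/autoload.php';
    use PhpAmqpLib\Connection\AMQPStreamConnection;
    use PhpAmqpLib\Message\AMQPMessage;

    $conn    = new AMQPStreamConnection('localhost', 5672, 'guest', 'guest');
    $channel = $conn->channel();
    $channel->queue_declare('links', false, true, false, false); // durable queue

    foreach ($urls as $url) { // $urls: the array of links from the question
        $channel->basic_publish(new AMQPMessage($url), '', 'links');
    }

    <?php
    // worker.php (sketch): run several copies in parallel; 1 message = 1 request.
    require __DIR__ . '/vendor/autoload.php';
    use PhpAmqpLib\Connection\AMQPStreamConnection;

    $conn    = new AMQPStreamConnection('localhost', 5672, 'guest', 'guest');
    $channel = $conn->channel();
    $channel->queue_declare('links', false, true, false, false);
    $channel->basic_qos(null, 1, null); // one unacked message per worker

    $channel->basic_consume('links', '', false, false, false, false, function ($msg) {
        $ch = curl_init($msg->body);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        // curl_setopt($ch, CURLOPT_PROXY, $proxy); // give some workers a proxy
        $body = curl_exec($ch);
        curl_close($ch);
        // ... process $body ...
        $msg->ack();
    });

    while ($channel->is_consuming()) {
        $channel->wait();
    }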
