There is a set of tasks that need to be parallelized:

The user's search query is sent to several search engines; the results are combined and returned to the user. The point is that the requests to those sites should run in parallel rather than one after another, since waiting time is critical.

I latched onto the idea of handing the work off to a separate script or daemon and then checking via AJAX whether the task has finished. A year ago, @org, answering a question about asynchronous threads, described a workflow which, in my opinion, might be useful here:

  1. Script A adds the task to some storage and replies that the task is in progress.
  2. Script B (hidden from users) picks up active tasks and runs them.
  3. Script A checks the task's status (in progress / completed / failed) and returns the result once it is ready.

Points 1 and 3 are clear to me, but I don't quite see how to implement the separate script (2), or possibly several separate scripts. In general, I would be glad to hear any suggestions for possible solutions.
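As I currently picture it, a rough sketch, assuming a MySQL `tasks` table (id, query, status, result), a worker started from cron, and a hypothetical do_search() helper; none of this is working code yet:

<?php
// add_task.php (script A, step 1): register the task and answer straight away
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');   // placeholder credentials
$stmt = $pdo->prepare("INSERT INTO tasks (query, status) VALUES (?, 'pending')");
$stmt->execute(array($_GET['q']));
echo json_encode(array('task_id' => $pdo->lastInsertId(), 'status' => 'pending'));

<?php
// worker.php (script B, step 2): launched from cron, invisible to users
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
foreach ($pdo->query("SELECT id, query FROM tasks WHERE status = 'pending'") as $task) {
    $pdo->prepare("UPDATE tasks SET status = 'running' WHERE id = ?")->execute(array($task['id']));
    $result = do_search($task['query']);   // hypothetical helper: this is where the engines get queried in parallel
    $pdo->prepare("UPDATE tasks SET status = 'done', result = ? WHERE id = ?")
        ->execute(array($result, $task['id']));
}

<?php
// check_task.php (script A, step 3): polled from the page via AJAX
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$stmt = $pdo->prepare("SELECT status, result FROM tasks WHERE id = ?");
$stmt->execute(array($_GET['task_id']));
echo json_encode($stmt->fetch(PDO::FETCH_ASSOC));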


UPD. I took and adapted the code from the article "Developing multitasking applications for PHP v5". I tried querying two sites. Where does it get stuck?

<?php
$timeout = 10;
$result = array();
$sockets = array();
$convenient_read_block = 8192;
$urls = array('ya.ru', 'google.com');
$id = 0;

foreach ($urls as $url) {
    $s = stream_socket_client($url . ":80", $errno, $errstr, $timeout,
        STREAM_CLIENT_ASYNC_CONNECT | STREAM_CLIENT_CONNECT);
    if ($s) {
        $sockets[$id] = $s;
        $result[$id] = '';            // initialise, otherwise .= below raises a notice
        fwrite($s, "GET / HTTP/1.0\r\nHost: " . $url . "\r\nAccept: */*\r\n\r\n");
        $id++;
    } else {
        echo "Stream " . $id . " failed to open correctly: " . $errstr;
    }
}

while (count($sockets)) {
    $read = $sockets;
    $write = null;
    $except = null;                   // stream_select() needs real variables, not inline assignments
    if (stream_select($read, $write, $except, $timeout) && count($read)) {
        foreach ($read as $r) {
            $id = array_search($r, $sockets);
            $data = fread($r, $convenient_read_block);
            if (strlen($data) == 0) { // the server closed the connection, the response is complete
                fclose($r);
                unset($sockets[$id]);
            } else {
                $result[$id] .= $data;
            }
        }
    } else {
        echo "Time-out!\n";
        break;
    }
}
?>
  • The vast majority of those who write in PHP are web developers. The vast majority of web developers know JavaScript to one degree or another. So, the question: why not Node.js? - neoascetic
  • @neoascetic, that was my first thought too, and I am considering it. However, I would like to know about the alternatives. - Denis Khvorostin
  • @neoascetic - knowing it "to one degree or another", unfortunately, is not enough :) And I see no problem at all with implementing script B in PHP plus cron. - Zowie
  • It is a pity that there is no "I do not like this comment" button. - Denis Khvorostin
  • @DenisKhvorostin there is such a button now, the flag on the comment) - Nick Volynkin


4 Answers

Thinking out loud: something like MapReduce. Let the JS on the user's side make asynchronous requests to different endpoints on the server, which will obviously be executed in parallel:

 /foo/ /bar/ /baz/ 

And then it (the client side) collects what it got. That way we hand the parallelization of the task over to the web server, which will fork (or whatever it does) the handlers, and to the client side, which assembles the results.
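For illustration, one such handler could look roughly like this (the /foo/ endpoint name, the engine URL and the JSON shape are assumptions, not a prescription):

<?php
// foo.php - hypothetical handler behind /foo/: queries one search engine and returns JSON
header('Content-Type: application/json');
$q = isset($_GET['q']) ? $_GET['q'] : '';
// placeholder engine URL; each endpoint (/foo/, /bar/, /baz/) would hit a different one
$raw = file_get_contents('http://example-engine.com/search?q=' . urlencode($q));
echo json_encode(array(
    'source'  => 'foo',
    'results' => $raw,   // in reality you would parse $raw into a structured list first
));

The client-side JS fires the three requests at once and merges whatever comes back.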

  • Yeah, that part of the scheme is clear to me. The question is what should be on the server side in this setup. - Denis Khvorostin
  • Provided that the speed of access to each individual service is acceptable, plain PHP will do. Each script runs in parallel and returns its result as soon as it has been received (and processed). The only problem is if the client must display the result only once it has answers from all sources (for sorting or something else). That is exactly what MapReduce is! :D - neoascetic
  • Node.js is not an option, a hosting restriction ("Unfortunately, this option is not available. Currently, using this script is possible only on VDS tariffs without administration") - Denis Khvorostin
  • Did this comment land under the wrong answer? Because this answer assumes JS on the client and any handler at all on the server, even Brainfuck :) The main thing is that they speak a common language (JSON, of course) - neoascetic

cURL can do parallel requests. See the example and comments for curl_multi_init.
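A rough sketch of that approach (the URLs and the timeout are placeholders):

<?php
$urls = array('http://ya.ru/', 'http://google.com/');
$mh = curl_multi_init();
$handles = array();
foreach ($urls as $i => $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);   // return the body as a string
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_multi_add_handle($mh, $ch);
    $handles[$i] = $ch;
}

// drive all transfers in parallel
$running = null;
do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh, 1.0);    // wait for activity instead of busy-looping
} while ($running > 0);

$results = array();
foreach ($handles as $i => $ch) {
    $results[$i] = curl_multi_getcontent($ch);
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);
?>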

    If things are serious, you can try setting up a job server, for example Gearman.
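    Very roughly, with the pecl/gearman extension and a gearmand server running, the split into a worker and a client might look like this (the fetch_url function name and the file names are only for illustration):

<?php
// worker.php: started separately (one or more copies), hidden from users
$worker = new GearmanWorker();
$worker->addServer();                     // gearmand on 127.0.0.1:4730 by default
$worker->addFunction('fetch_url', function (GearmanJob $job) {
    return file_get_contents($job->workload());   // the workload is the URL to fetch
});
while ($worker->work());

<?php
// client.php: hands the URLs to the job server and collects results as they complete
$client = new GearmanClient();
$client->addServer();
$results = array();
$client->setCompleteCallback(function (GearmanTask $task) use (&$results) {
    $results[] = $task->data();           // body returned by the worker
});
foreach (array('http://ya.ru/', 'http://google.com/') as $url) {
    $client->addTask('fetch_url', $url);  // queued, not yet executed
}
$client->runTasks();                      // tasks run in parallel on the available workers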

      For multithreading you can use this: https://github.com/stas29a/php_threads It implements all the functionality needed for a complete solution of the multithreading problem, and you can polish the remaining details yourself to fit your requirements.