Task:
There are a number of points with distance and arrival time. There is no direct connection between all points. Knowing the starting and ending point, you need to find the shortest path (or time).

The table, without any frills, looks like this:

id city_1 city_2 distance_km time_mins 1 AB 80 40 2 AC 110 55 3 AE 330 240 4 BF 340 230 5 CD 60 20 6 CE 205 80 7 DF 192 110 8 EF 80 40 

Realization conceived this:

  • Functions at the entrance is the starting and ending point.
  • At the starting point, a request is made to the database, where all unique values ​​for the fields city_1 and city_2 for the input point are selected (sorted by increment).
  • The resulting array is started through foreach , and for each point in the sample, we recursively start the same function, while simultaneously storing these points in the return array.

At the moment of writing this epic, the under-code looks like this (as long as the query is not sorted):

 try { $dbh = new PDO("mysql:host=localhost;dbname=voyage", "root", ""); $dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION); }catch (PDOException $e){ echo $e->getMessage(); } function getTree($start, $end){ global $dbh; $query = "(SELECT city_1 AS $start FROM distances WHERE city_2='{$start}') UNION (SELECT city_2 FROM distances WHERE city_1='{$start}')"; $st = $dbh->query($query); $res = $st->fetchAll(PDO::FETCH_ASSOC); $tree = array(); foreach ($res as $k => $v) { if($k == $end) $tree[] = $v; elseif(is_array($res[$k])){ $ret=getTree($v, $end); if(count($ret)) $tree[] = $ret; } } return $tree; } 

Running this code now turns out this:

 Array ( [0] => Array ( [A] => B ) ) 

Filling the array prompted from here .
As planned, I have to get an array from the starting point to all possible ends ( $end ). Further, one of the branches of the array of the form $tree['A']['C']['D']['F'] , the keys of which I would disassemble in the future and along which I derived route options. And now about the difficulties:

  1. This method of filling the return array was not able to adapt to their needs.
  2. If, for example, in the next iteration I give the key 'С' , I need to filter out the parent key, otherwise there will be infinity. I understand that at the right moment you need to add a condition to skip the iteration with continue . But how to access the parent key?
  3. How to iterate through each branch of the returned array, namely the keys (example branch below).

Ie, running the code getTree('A', 'F'); I expect to get such results (without sorting):

 Array ( [B] => F [C] => Array ( [D] => F [E] => F ) [E] => F ) 
  • First worth reading ru.wikipedia.org/wiki/… . I have a feeling that you can lead to an array with the same structure as a table. No nesting Here you can mark in which nodes you have already visited - Mike
  • I am familiar with the Deistra algorithm. And at the expense of the entire table in the array. And if the size of the table is huge? It will not affect performance? - tekken
  • Well, looking at how huge. And re-sampling from the database of records that were on the other steps of the algorithm - this will definitely affect the performance. And problem number one, yes, you are right, do not get hung up, but we don’t get stuck, we can only remember what pairs of points we have already passed - Mike
  • I figured it out ... you will always have to go through the entire route network for any search. So the table will love to load all in memory. Apparently, it is necessary to introduce the concept of the maximum possible distances, so as to chop off completely hopeless routes in time. If the network is really big and the task is not just of academic interest, then maybe you should think about other DBMSs - Mike
  • Well, while this is purely academic interest. In that case, which way to go? I will try to implement the code according to your recommendations - tekken

2 answers 2

You intrigued me too much with the task, I had to solve it :) However, if I had information in the database, I was lazy to do it in php and I decided to do it entirely in MySQL. At the same time I learned how to create procedures. I did it quickly, for real work, you need to slightly revise the policy of creating new records in the query table and deleting them when changes are made in the distance table. But first things first ...

Base structure:

 create table distances( /* Ваша таблица дистанций */ city_1 varchar(4) not null, city_2 varchar(4) not null, dist_km int not null, time_mins int not null, primary key(city_1,city_2) ); insert into distances(city_1,city_2,dist_km,time_mins) values ('A','B',80,40),('A','C',110,55),('A','E',330,240),('B','F',340,230), ('C','D',60,20),('C','E',205,80),('D','F',192,110),('E','F',80,40),('X','A',400,320); create table dquery( /* Таблица запросов. При каждом поиске маршрута сюда добавляется запись */ /* При реальной эксплуатации стоит добавлять запись, только если еще не делали поиск для данной точки отправления и метрики*/ /* И выбирать существующую - если делали */ /* при изменениях в таблице дистанций - удалять все поиски */ qid int not null auto_increment, city varchar(4) not null, /* Маршруты из этого города */ mtype char(1) default 'D' not null, /* Метрика: D - Дистанции, T - время в пути*/ primary key(qid) ); create table vertex( /* Все узлы, до которых можно добраться из пункта запроса*/ qid int not null, /* ID запроса */ vcity varchar(4) not null, /* Город назначения */ metric int, /* Лучшая метрика */ closed char(1) default 'N' not null, /* Узел обработан */ route varchar(1000), /* Лучший маршрут */ primary key (qid,vcity) ); 

The procedure for finding routes on the Dijkstra algorithm :

 drop procedure if exists deykstra; delimiter $$ create procedure deykstra(p_city varchar(4),p_type char(1)) /* Параметры: город для которого ищем маршруты, тип метрики (расстояние/время) */ begin declare m_qid int; /* Текущий запрос */ declare m_city varchar(4); /* Город проверяемого узла */ declare m_metric int; /* Текущая метрика от пункта A до текущего узла */ declare m_route varchar(1000); /* Маршрут от пункта А до текущего узла */ declare numrows int; /* кол-во записей (флаг завершения работы */ insert into dquery(city,mtype) values(p_city,p_type); /* Вставляем запись запроса */ SELECT LAST_INSERT_ID() into m_qid; /* Сохраняем ее ID */ -- Вставляем запись для начального узла insert into vertex values(m_qid,p_city,0,'N',p_city); main: LOOP /* Главный и единственный цикл */ select vcity, metric, route, count(1) /* Получаем очередной рабочий узел */ into m_city,m_metric,m_route,numrows from vertex where qid=m_qid and closed='N' limit 1; IF numrows=0 THEN /* Если узлов больше нет - завершаем работу */ LEAVE main; END IF; /* Вставляем вновь обнаруженные узлы и меняем метрику и маршрут у существующих узлов */ /* с большей метрикой, чем найденная */ insert into vertex(qid,vcity,metric,route) select m_qid,A.city,A.metric,concat(m_route,',',A.city) /* К маршруту до текущего узла, добавляем следующий город */ from (select if(city_1=m_city,city_2,city_1) city, if(p_type='D',dist_km,time_mins)+m_metric metric /* Метрика до текущего узла + метрика из него до следующего */ from distances where (city_1=m_city or city_2=m_city) ) A where not exists( select 1 from vertex V /* только записи для которых нет уже известной, лучшей метрики */ where V.qid=m_qid and V.vcity=A.city and V.metric<=A.metric ) on duplicate key update metric=A.metric,route=concat(m_route,',',A.city); /* Помечаем текущий узел пройденным */ update vertex set closed='Y' where qid=m_qid and vcity=m_city; END LOOP main; select * from vertex where qid=m_qid; /* Возвращаем набор всех найденных узлов */ end$$ 

Since to search for the optimal route, you need to go through all the nodes of the network and save metrics for all nodes passed, there is no point in looking for a route from point A to point B. It is much more convenient to save all the optimal routes from point A to all reachable points. And then from this plate to choose routes as needed.

The algorithm turned out even easier than I expected initially. No recursions or complex structures. One cycle that selects a node that has not yet been processed from the node table and adds to it all the newly found ones. Passed the node, put a mark that would not return more to him. If you implement in php, then instead of my vertex table you will need an array of records with a similar structure, except that qid can be removed.

Results of work:

 Из пункта A с метрикой "дистанция": mysql> call deykstra('A','D'); +-----+-------+--------+--------+---------+ | qid | vcity | metric | closed | route | +-----+-------+--------+--------+---------+ | 8 | A | 0 | Y | A | | 8 | B | 80 | Y | A,B | | 8 | C | 110 | Y | A,C | | 8 | D | 170 | Y | A,C,D | | 8 | E | 315 | Y | A,C,E | | 8 | F | 362 | Y | A,C,D,F | | 8 | X | 400 | Y | A,X | +-----+-------+--------+--------+---------+ Из пункта A с метрикой "время": mysql> call deykstra('A','T'); +-----+-------+--------+--------+---------+ | qid | vcity | metric | closed | route | +-----+-------+--------+--------+---------+ | 10 | A | 0 | Y | A | | 10 | B | 40 | Y | A,B | | 10 | C | 55 | Y | A,C | | 10 | D | 75 | Y | A,C,D | | 10 | E | 135 | Y | A,C,E | | 10 | F | 175 | Y | A,C,E,F | | 10 | X | 320 | Y | A,X | +-----+-------+--------+--------+---------+ 

And to solve the problem of finding all paths in php, I think it’s worthwhile to do a not heavily nested array, for example, an array with keys in the form of paths, as I did in the tables, separated by a comma or another separator. Then, going recursively, you will always see the route passed and not return to the points where you already were, simply by checking the presence of a point in the current path.

PHP code:

 <?php $input=array(array("city_1"=>'A',"city_2"=>'B',"distance_km"=>80,"time_mins"=>40),array("city_1"=>'A',"city_2"=>'C',"distance_km"=>110,"time_mins"=>55),array("city_1"=>'A',"city_2"=>'E',"distance_km"=>330,"time_mins"=>240),array("city_1"=>'B',"city_2"=>'F',"distance_km"=>340,"time_mins"=>230),array("city_1"=>'C',"city_2"=>'D',"distance_km"=>60,"time_mins"=>20),array("city_1"=>'C',"city_2"=>'E',"distance_km"=>205,"time_mins"=>80), array("city_1"=>'D',"city_2"=>'F',"distance_km"=>192,"time_mins"=>110), array("city_1"=>'E',"city_2"=>'F',"distance_km"=>80,"time_mins"=>40)); $dist=array(); foreach($input as $node) { // Строим массив дистанций для быстрого поиска [A][B]=80; [B][A]=80; $dist[$node['city_1']][$node['city_2']]=$node['distance_km']; $dist[$node['city_2']][$node['city_1']]=$node['distance_km']; } $result=Deykstra($dist,'A'); print "Route from A to F ".$result['F']['route']." distance ".$result['F']['metric']."\n"; function Deykstra($dist,$from) { $M=array(array("vert"=>$from, "metric"=>0, "route"=>$from)); // Массив вершин с стартовой вершиной $S=array($from=>0); // Массив номеров элементов в массиве вершин по кодам городов for($i=0;$i<count($M);$i++) // Перебираем все вершины { // Данные проверяемой вершины Код, Маршрут, Метрика $v1=$M[$i]['vert']; $route=$M[$i]['route']; $metric=$M[$i]['metric']; foreach($dist[$M[$i]['vert']] as $v2=>$m2) { // Перебираем все вершины до которых можно добраться напрямую if(!array_key_exists($v2,$S)) // Вершина назначения еще не встречалась { $S[$v2]=count($M); // Добавляем индекс по коду $M[]=array("vert"=>$v2, "metric"=>($metric+$m2), "route"=>"$route/$v2"); // И саму вершину } else { // Вершина уже встречалась, пересчитываем метрику, если необходимо $ind=$S[$v2]; if($M[$ind]['metric']>($metric+$m2)) // Метрика вершины больше текущей { // обновляем метрику и маршрут до вершины $M[$ind]['metric']=$metric+$m2; $M[$ind]['route']="$route/$v2"; } } } } // Из массива по номерам, делаем по именам городов foreach($S as $key=>$ind) $S[$key]=$M[$ind]; return $S; } 
  • By the way, I did not understand by the example of points A-> F. According to Dijkstra's algorithm, if we take point A as the beginning, then we are looking for a vertex with the shortest distance from it. Such is the peak B (80 km). From it I take the subsequent countdown, and this is the only point F (340 km), which in total gives 420 km. According to your own algorithm, if you go through the points A-> C-> D-> F, then the total distance will be equal to 362 km. What did I miss? - tekken
  • @tekken You did not understand Dijkstra correctly; he does not go along the smallest edge. it takes turns from this vertex along all edges, without recursion, without going deep, and writes the cost of the edge to the cost of the top, if it is less than what is currently recorded at the top. Thus, after processing the vertex A at the vertices B, C, E, the values ​​80,110,330 are simply written. Then we simply take the next vertex known there, for example, B and go from there to everything and write 80 + edge weight to them. In F 340 + 80 = 420. although in this way we reach the other edges again to F, but with a lower current value. - Mike
  • @tekken In MySQL, build my model and paste into the procedure select * from vertex where qid=m_qid; inside the loop, after update. In theory, you will see the state of the vertex table after each iteration of the algorithm and it will become clearer - Mike
  • @tekken Here, look, I made for vertex A: pastebin.com/B3D6dwL6 look at the closed column, the algorithm processed the vertex that had 'N' and became 'Y'. And you can see how the weights and routes of the other peaks change. Actually the weight of the vertex in the table is the sum of the weights along the route indicated by it - Mike
  • @tekken In the console - is it in the mysql utility? In Google, indicate this error number, there are all kinds of solutions, and they do not specifically deal with procedures. In general, while I was doing this, I simply drove these requests without any procedure just in turn, as it were, with my hands, the cycle performed. And instead of the variables that are declared in the procedure, I used the usual ones with @ - Mike

I got to the point of looking at the code, everything should work, but in fact it’s like this: I give 2 points to the input of the function, and I start to check both of them. If at the first call one of the points was found ( $point1/$point2 ), then it is considered the beginning (we write the result to the array), and the other ( $point2/$point1 ), respectively, the end (written down as $nextPoint and $finishPoint ). After the first write to the array, I add the record ID from the table (which I will check in the subsequent recursion, so as not to return to this record, well, I will use it to display the result), well, after it, respectively, the $nextPoint point (which in the subsequent recursion will be $point1 ). Before calling recursion, I check if the next point $nextPoint is equal to $finishPoint . If so - I return the resulting array. If not, I run recursion. I understand that the code can and should be optimized, but as long as there is no result, there is no point in optimizing.

The code looks like this for now:

 try { $dbh = new PDO("mysql:host=localhost;dbname=voyage", "root", ""); $dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION); }catch (PDOException $e){ echo $e->getMessage(); } $query = "SELECT * FROM distances ORDER BY distance_km"; $st = $dbh->query($query); $res = $st->fetchAll(PDO::FETCH_ASSOC); function getRoute($point1 = false, $point2 = false, $arr) { if ((!$point1 || !$point2) || ($point1 == $point2) || !$arr) return false; $tableArray = $arr; static $result = array(); static $finishPoint = ""; $nextPoint = ""; foreach ($arr as $key => $value) { if (array_search($value['id'], $result)) //Если в массиве уже есть ИД записи, тогда крутим дальше. Так исключаем все предыдущие точки, с которых пришли continue; if (!$result) { //Точку входа №2 $point2 проверяю только на первой итерации, потому что во всех следующих ставлю туда имя следующей точки для поиска if ($point2 == $value['city_1']) { $finishPoint = $point1; // Если точка $point1 начало, значит $point2 - конец $result[] = $nextPoint; $result[] = $value['city_1']; $result['id'.$value['id']] = $value['id']; $result[] = $nextPoint; } if ($point2 == $value['city_2']) { $finishPoint = $point1; $nextPoint = $value['city_1']; $result[] = $value['city_2']; $result['id'.$value['id']] = $value['id']; $result[] = $nextPoint; } } if ($point1 == $value['city_1']) { $nextPoint = $value['city_2']; if($result){ $result['id'.$value['id']] = $value['id']; $result[] = $value['city_1']; }else{ $result[] = $value['city_1']; $result['id'.$value['id']] = $value['id']; $result[] = $nextPoint; } } if ($point1 == $value['city_2']) { $nextPoint = $value['city_1']; if($result){ $result['id'.$value['id']] = $value['id']; $result[] = $nextPoint; }else{ $result[] = $value['city_2']; $result['id'.$value['id']] = $value['id']; $result[] = $nextPoint; } } if(!$finishPoint) $finishPoint = $point2; if ($nextPoint == $finishPoint) return $result; getRoute($nextPoint, $finishPoint, $tableArray); } return $result; } echo "<pre>"; print_r(getRoute('A', 'F', $res)); echo "</pre>"; 

For some reason it turns out like this:

 Array ( [0] => A [id1] => 1 [1] => B [id4] => 4 [2] => B [id2] => 2 [3] => A [id5] => 5 [4] => C [id7] => 7 [5] => D [id6] => 6 [6] => C [id8] => 8 [7] => E [id3] => 3 [8] => A ) 

Although it should be like this:

 Array ( [0] => A [id1] => 1 [1] => B [id4] => 4 [2] => F ) 

P.S. For convenience, the outgoing array from the database table looks like this:

 Array ( [0] => Array ( [id] => 5 [city_1] => C [city_2] => D [distance_km] => 60 [time_mins] => 20 ) [1] => Array ( [id] => 1 [city_1] => A [city_2] => B [distance_km] => 80 [time_mins] => 40 ) [2] => Array ( [id] => 8 [city_1] => E [city_2] => F [distance_km] => 80 [time_mins] => 40 ) [3] => Array ( [id] => 2 [city_1] => A [city_2] => C [distance_km] => 110 [time_mins] => 55 ) [4] => Array ( [id] => 7 [city_1] => D [city_2] => F [distance_km] => 192 [time_mins] => 110 ) [5] => Array ( [id] => 6 [city_1] => C [city_2] => E [distance_km] => 205 [time_mins] => 80 ) [6] => Array ( [id] => 3 [city_1] => A [city_2] => E [distance_km] => 330 [time_mins] => 240 ) [7] => Array ( [id] => 4 [city_1] => B [city_2] => F [distance_km] => 340 [time_mins] => 230 ) )