What approaches are used to solve algorithmic problems on pure SQL, without using procedural extensions? For example, the problem of finding the shortest path in the graph. There are questions - how to implement a graph traversal, how to store intermediate values of calculations, how to set the criterion for stopping the algorithm. As a DBMS, let it be Oracle. It seems that the main approach is to use hierarchical queries, but what data structures to use (strings?) And how to display algorithmic operators of general-purpose languages (conditions, cycles) in SQL is not very clear.
Closed due to the fact that the issue is too general for the participants Akina , Denis Bubnov , rjhdby , Vadizar , ߊߚߤߘ 5 Mar '17 at 8:23 .
Please correct the question so that it describes the specific problem with sufficient detail to determine the appropriate answer. Do not ask a few questions at once. See “How to ask a good question?” For clarification. If the question can be reformulated according to the rules set out in the certificate , edit it .
2 answers
Well, once I got so drunk, let me brag too ... wave algorithm, MySQL dialect.
DROP PROCEDURE IF EXISTS Wave; DELIMITER @@ CREATE PROCEDURE Wave (IN NodeFrom INT, IN NodeTo Int) /* Поиск пути в направленном взвешенном ненормированном графе от NodeFrom до NodeTo волновым алгоритмом. Допустимы изолированные вершины, дольность, контуры, кратные рёбра, мультиграф, листья, петли и пр. Функциональность модифицируется комментариями в коде. С обоими комментариями - ищет наиболее дешёвый путь из не более чем MaxIterations шагов. Если раскомментировать *1 - ищет наиболее дешёвый из максимально коротких длиной не более чем MaxIterations шагов. Если раскомментировать *2 - ищет наиболее дешёвый путь из всех возможных. Если раскомментировать *1 и *2 - ищет наиболее дешёвый из максимально коротких. Если путь не найден - возвращает 0 записей, иначе одну с путём и его стоимостью. Макс. длина пути - VARCHAR(65000) Ожидаемая структура исходных данных: CREATE TABLE Graph ( point1 INT, -- Начало ребра point2 INT, -- Конец ребра ребра weight INT, -- Стоимость ребра, больще нуля ); */ BEGIN /*1 DECLARE Done INT DEFAULT 0;*/ DECLARE MaxIterations INT DEFAULT 100; DROP TEMPORARY TABLE IF EXISTS Routes; CREATE TEMPORARY TABLE Routes(point INT, weight INT, route VARCHAR(65000), PRIMARY KEY(point)) ENGINE = MEMORY; DROP TEMPORARY TABLE IF EXISTS Step; CREATE TEMPORARY TABLE Step(point INT, weight INT, route VARCHAR(65000)) ENGINE = MEMORY; /*2 SELECT COUNT(*)-1 INTO MaxIterations FROM Graph;*/ INSERT INTO Routes(point, weight, route) VALUES (NodeFrom, 0, CAST(NodeFrom AS CHAR)); WHILE /*1 Done = 0 AND*/ MaxIterations > 0 DO TRUNCATE Step; INSERT INTO Step (point, weight, route) SELECT Graph.point2, Routes.weight+Graph.weight, CONCAT(Routes.route, '/', CAST(Graph.point2 AS CHAR)) FROM Routes, Graph WHERE Routes.point = Graph.point1; INSERT IGNORE INTO Routes (point, weight, route) SELECT point, weight, route FROM Step; UPDATE Routes, Step SET Routes.weight = Step.weight, Routes.route = Step.route WHERE Routes.point = Step.point AND Routes.weight > Step.weight; /*1 SELECT COUNT(point) INTO Done FROM Routes WHERE point = NodeTo;*/ SET MaxIterations = MaxIterations - 1; END WHILE; SELECT weight, route FROM Routes WHERE point = NodeTo; DROP TEMPORARY TABLE IF EXISTS Routes; DROP TEMPORARY TABLE IF EXISTS Step; END; @@ DELIMITER ; /* ============================= Демонстрация работы процедуры ============================= */ DROP TABLE IF EXISTS Graph; /* Создание таблицы для хранения направленного графа */ CREATE TABLE Graph ( point1 INT NOT NULL, /* Начало ребра */ point2 INT NOT NULL, /* Конец ребра ребра */ weight INT NOT NULL, /* Стоимость ребра */ PRIMARY KEY (point1, point2) ) ENGINE = MyISAM; DROP PROCEDURE IF EXISTS FillGraph; DELIMITER @@ CREATE PROCEDURE FillGraph (IN VergesCount INT, IN NodesCount INT, IN MaxWeight INT) /* Процедура заполнения таблицы графа случайными данными VergesCount - количество рёбер NodesCount - количество вершин MaxWeight - максимальная стоимость ребра */ BEGIN WHILE VergesCount > 0 DO INSERT IGNORE INTO Graph (point1,point2,weight) SELECT CEILING(NodesCount * RAND()),CEILING(NodesCount * RAND()),CEILING(MaxWeight * RAND()); SET VergesCount = VergesCount - 1; END WHILE; DELETE FROM Graph WHERE point1 = point2; END; @@ DELIMITER ; CALL FillGraph (6000, 2000, 100); /* Тестовое заполнение графа */ SELECT COUNT(*) FROM Graph; /* Просмотр количества сгенерированных рёбер. Меньше заданного - отсев дубликатов и замыканий */ /* Тестовые запуски */ CALL Wave(1,9); CALL Wave(2,6); /* Удаление тестовых объектов */ DROP TABLE IF EXISTS Graph; DROP PROCEDURE IF EXISTS Wave; DROP PROCEDURE IF EXISTS FillGraph; - oneShow off the same algorithm, but implemented as a single SELECT query). - DarkGenius
- Within the MySQL dialect, this is not possible. It does not even have recursive queries. - Akina
- oneAnd within the framework of dialects with recursive queries? - DarkGenius
Once I solved the task of finding the shortest path in a graph just in SQL (true on PostgreSQL, but this is not the point). Here is the body of the main method directly involved in the search:
CREATE FUNCTION prepare_route(s text, d text) RETURNS integer LANGUAGE plpgsql AS $$ DECLARE _seek integer; _t integer; _cnt integer; _prev_cnt integer; _reach bool; BEGIN _seek = (SELECT nextval('seek_id_seq')); INSERT INTO route_seeker(seek_id, node_id, reach_time, prev_node) VALUES(_seek, d, 0, 0); _t = 0; _cnt = 0; _prev_cnt = -1; _reach = false; WHILE NOT _reach AND _cnt - _prev_cnt > 0 LOOP _t = _t + 1; INSERT INTO route_seeker(seek_id, node_id, reach_time, prev_node) SELECT _seek, start, _t, finish FROM edges WHERE finish IN ( SELECT node_id FROM route_seeker WHERE seek_id = _seek AND reach_time = _t-1) ON CONFLICT DO NOTHING; _reach = (SELECT count(node_id)=1 FROM route_seeker WHERE seek_id = _seek AND node_id = s); _prev_cnt = _cnt; _cnt = (SELECT count(node_id) FROM route_seeker); END LOOP; return _seek; END; $$; ALTER FUNCTION public.prepare_route(s text, d text) OWNER TO postgres; -- -- Name: FUNCTION prepare_route(s text, d text); Type: COMMENT; Schema: public; Owner: postgres -- COMMENT ON FUNCTION prepare_route(s text, d text) IS 'Start a new seek operation and find the shortes route from node s to node d. Id of the started seek operation is returned. Prepared route could be then extracted by build_prepared_route() function; or it length may be estimated by get_prepared_route_length() function.'; And here is the formation of an array according to the previously found path:
CREATE FUNCTION build_prepared_route(seek integer, s text, d text) RETURNS text[] LANGUAGE plpgsql AS $$ DECLARE _route text[]; BEGIN _route = (WITH RECURSIVE route AS ( SELECT node_id, prev_node FROM route_seeker WHERE seek_id = seek AND route_seeker.node_id = s UNION ALL SELECT route_seeker.node_id, route_seeker.prev_node FROM route, route_seeker WHERE seek_id = seek AND route.prev_node = route_seeker.node_id) SELECT array_agg(node_id) FROM route); IF _route IS NULL THEN _route = array[]::text[]; END IF; RETURN _route; END; $$; ALTER FUNCTION public.build_prepared_route(seek integer, s text, d text) OWNER TO postgres; -- -- Name: FUNCTION build_prepared_route(seek integer, s text, d text); Type: COMMENT; Schema: public; Owner: postgres -- COMMENT ON FUNCTION build_prepared_route(seek integer, s text, d text) IS 'Build an array representing the shortest path from node s to node d which was prepared in current seek.'; The entire database schema can be found here: https://github.com/Pankraty/graph/blob/master/database%20schema/graph_db.sql . For a perfect solution I do not pretend, but working at least)
- oneThere is one caveat: by condition, the solution should be on pure SQL, without using functions and procedures. - DarkGenius
- Yes, then this solution alone will not work, but perhaps it will help you in your search. I had no such restriction; implemented on the DBMS side to avoid pumping the entire graph to the client, which would be extremely inefficient in the case of a large sparse graph. - Aleksei
- The @Aleksei question is how to solve problems with constraints. - PashaPash ♦
routethat allows you to not fly around A-> B-> A-> B. conditions are usually all in where. The cycle is recursion itself. The recursive part of the query receives at the input of the line from the previous iteration, as in the classical cycle, this state of calculation is left from the previous run - Mike