There is such a task in front of me:

Tables and data are as follows:

create table t_lang (lang_id number , name varchar2(100)); delete t_lang; insert into t_lang (lang_id,name) values (1,'Chinese'); insert into t_lang (lang_id,name) values (2,'Spanish'); insert into t_lang (lang_id,name) values (4,'English'); insert into t_lang (lang_id,name) values (5,'Russian'); insert into t_lang (lang_id,name) values (6,'German'); insert into t_lang (lang_id,name) values (7,'French'); insert into t_lang (lang_id,name) values (8,'Italian'); insert into t_lang (lang_id,name) values (9,'Japanese'); insert into t_lang (lang_id,name) values (10,'Portuguese'); commit; create table t_person (person_id number , name varchar2(100)); delete t_person; insert into t_person (person_id,name) values (1,'Piter'); insert into t_person (person_id,name) values (2,'Anna'); insert into t_person (person_id,name) values (3,'Olga'); insert into t_person (person_id,name) values (4,'Max'); insert into t_person (person_id,name) values (5,'Ivan'); insert into t_person (person_id,name) values (6,'Sam'); insert into t_person (person_id,name) values (7,'Bill'); commit; create table t_pers_lang (person_id number , lang_id number); delete t_pers_lang; insert into t_pers_lang (person_id,lang_id) values (1,4); insert into t_pers_lang (person_id,lang_id) values (1,5); insert into t_pers_lang (person_id,lang_id) values (2,6); insert into t_pers_lang (person_id,lang_id) values (2,7); insert into t_pers_lang (person_id,lang_id) values (3,9); insert into t_pers_lang (person_id,lang_id) values (3,6); insert into t_pers_lang (person_id,lang_id) values (4,4); insert into t_pers_lang (person_id,lang_id) values (4,7); insert into t_pers_lang (person_id,lang_id) values (5,1); insert into t_pers_lang (person_id,lang_id) values (6,10); insert into t_pers_lang (person_id,lang_id) values (6,8); insert into t_pers_lang (person_id,lang_id) values (7,2); insert into t_pers_lang (person_id,lang_id) values (7,8); insert into t_pers_lang (person_id,lang_id) values (7,1); commit; 

Wrote a query that displays the desired result ...

 select Группа, listagg(f2, ', ') within group (order by f2) Участники, listagg(f3, ', ') within group (order by f3) Языки from ( select NTILE(2) OVER (ORDER BY pl.person_id) Группа, (case when row_number() over (partition by p.person_id order by l.lang_id) = 1 then p.name end) f2, (case when row_number() over (partition by l.lang_id order by p.person_id) = 1 then l.name end) f3 from t_lang l, t_person p, t_pers_lang pl where l.lang_id=pl.lang_id and p.person_id=pl.person_id ) group by Группа; 

... but there is one BUT!

In a subquery, I use NTILE (2), which simply breaks the entire data set equally, which is not correct. The problem is solved for a particular case.

If you change the input data (for example, replace the key of any participant with another), the query returns an incorrect result.

Question: how can I still divide the participants into groups, between which communication is not possible?

  • Your task reminded me of this (carefully, very many English letters). I do not remember the details already, I remember a lot of communication with the author in Skype. As I recall, the task was identical - break up the set into disjoint groups. - Dmitriy
  • It seems that I even know how to do it in one request (I could not at that time), but there you can dislocate the brain. - Dmitriy
  • one
    So far it seems that to solve this problem one cannot do without recution ( with ) or trees ( connect by ). With them, the problem is solved quite easily - Viktorov

1 answer 1

I got something monstrous and, I suspect, voracious on large source data. If anyone can better, I will also be grateful:

 with t (name, lang) as ( select 'Piter', 'English' from dual union all select 'Piter', 'Russian' from dual union all select 'Anna', 'German' from dual union all select 'Anna', 'French' from dual union all select 'Olga', 'Japanese' from dual union all select 'Olga', 'German' from dual union all select 'Max', 'English' from dual union all select 'Max', 'French' from dual union all select 'Ivan', 'Chinese' from dual union all select 'Sam', 'Portuguese' from dual union all select 'Sam', 'Italian' from dual union all select 'Bill', 'Spanish' from dual union all select 'Bill', 'Italian' from dual union all select 'Bill', 'Chinese' from dual), s (name, lang, root_name) as ( select name, lang, connect_by_root name root_name from t connect by nocycle (prior lang = lang and prior name <> name) or (prior lang <> lang and prior name = name)) select distinct t1.names, t2.langs from (select root_name, listagg(name, ',') within group (order by name) names from (select distinct root_name, name from s) group by root_name) t1 join (select root_name, listagg(lang, ',') within group (order by lang) langs from (select distinct root_name, lang from s) group by root_name) t2 on t1.root_name = t2.root_name 

The idea is that a tree is built for each person (the connection condition is the same language, but a different name, or the same name, but a different language), for the tree we memorize the root, and then for each root we make a list_agg separate names, and separately languages. The trick is that the names and languages ​​are sorted, which means the lists will be the same for different roots if the sets are the same. This means that they are in the same group. I hope, clearly explained. Result:

 NAMES LANGS ------------------------------------------------------------ Anna,Max,Olga,Piter English,French,German,Japanese,Russian Bill,Ivan,Sam Chinese,Italian,Portuguese,Spanish 
  • This does not affect the general idea, but in this version everything will break if you have 2 people with the same name in the table. Still need to connect by ID. I think this is an important methodological moment - Viktorov
  • Oh, yes, of course. I simply combined the data in order not to bother with the joins and not complicate the initial data, which are wrapped in the first with . - Dmitriy