There is a table for reference

DB table - words

word | description --------+----------------------------------------------------------------------------- ΠΊΠΎΡ‚ | Π‘Π°ΠΌΠ΅Ρ† кошки ΠΊΠΎΡ‚ | Π‘Π°ΠΌΠ΅Ρ† кошки сСмСйства ΠΊΠΎΡˆΠ°Ρ‡ΡŒΠΈΡ… отряда Ρ…ΠΈΡ‰Π½Ρ‹Ρ… ΠΊΠΎΡ‚ | Π”ΠΎΠΌΠ°ΡˆΠ½Π΅Π΅ ΠΆΠΈΠ²ΠΎΡ‚Π½ΠΎΠ΅ являСтся ΠΎΠ΄Π½ΠΈΠΌ ΠΈΠ· ΠΏΠΎΠ΄Π²ΠΈΠ΄ΠΎΠ² Π΄ΠΈΠΊΠΎΠΉ кошки Π±Π΅Π³Π΅ΠΌΠΎΡ‚ | ΠšΡ€ΡƒΠΏΠ½ΠΎΠ΅ ΠΏΠ°Ρ€Π½ΠΎΠΊΠΎΠΏΡ‹Ρ‚Π½ΠΎΠ΅ ΠΌΠ»Π΅ΠΊΠΎΠΏΠΈΡ‚Π°ΡŽΡ‰Π΅Π΅ Π±Π΅Π³Π΅ΠΌΠΎΡ‚ | ΠšΡ€ΡƒΠΏΠ½ΠΎΠ΅ ΠΏΠ°Ρ€Π½ΠΎΠΊΠΎΠΏΡ‹Ρ‚Π½ΠΎΠ΅ ΠΌΠ»Π΅ΠΊΠΎΠΏΠΈΡ‚Π°ΡŽΡ‰Π΅Π΅ с массивным Ρ‚ΡƒΠ»ΠΎΠ²ΠΈΡ‰Π΅ΠΌ ΠΈ ΠΊΠΎΡ€ΠΎΡ‚ΠΊΠΈΠΌΠΈ Π½ΠΎΠ³Π°ΠΌΠΈ 

For each term, it is necessary to delete all partial duplication of definitions from the reference book, leaving only the most complete non-repeating

Expected result after the request

 word | description --------+----------------------------------------------------------------------------- ΠΊΠΎΡ‚ | Π‘Π°ΠΌΠ΅Ρ† кошки сСмСйства ΠΊΠΎΡˆΠ°Ρ‡ΡŒΠΈΡ… отряда Ρ…ΠΈΡ‰Π½Ρ‹Ρ… ΠΊΠΎΡ‚ | Π”ΠΎΠΌΠ°ΡˆΠ½Π΅Π΅ ΠΆΠΈΠ²ΠΎΡ‚Π½ΠΎΠ΅ являСтся ΠΎΠ΄Π½ΠΈΠΌ ΠΈΠ· ΠΏΠΎΠ΄Π²ΠΈΠ΄ΠΎΠ² Π΄ΠΈΠΊΠΎΠΉ кошки Π±Π΅Π³Π΅ΠΌΠΎΡ‚ | ΠšΡ€ΡƒΠΏΠ½ΠΎΠ΅ ΠΏΠ°Ρ€Π½ΠΎΠΊΠΎΠΏΡ‹Ρ‚Π½ΠΎΠ΅ ΠΌΠ»Π΅ΠΊΠΎΠΏΠΈΡ‚Π°ΡŽΡ‰Π΅Π΅ с массивным Ρ‚ΡƒΠ»ΠΎΠ²ΠΈΡ‰Π΅ΠΌ ΠΈ ΠΊΠΎΡ€ΠΎΡ‚ΠΊΠΈΠΌΠΈ Π½ΠΎΠ³Π°ΠΌΠΈ 

PS - please note

1) definitions can also be long, one thousand characters or more

2) the number of records in the table - several hundred thousand

I try this:

 select word, distinct(description) from words 

What am I doing wrong and how to write the right query?

  • yes, while we take without distinction by letter - hold me seven
  • Is there a unique key in the table? Because now that would not select records because they are equal to themselves, it is necessary to compare the description to inequality. But in this case full duplicates will not be found. And if you somehow think of how to choose more and full duplicates, it will be extremely difficult to remove them - Mike
  • @Mike There are no keys and we will not add, only two columns of the word and description - hold me seven
  • Okay. But can there still be situations of complete coincidence of the word and description? those. if the entry "cat = male cat" will meet more than once? - Mike
  • And by the way all the same, you need to get longer names or execute delete, which will remove the extra entries - Mike

2 answers 2

 select distinct A.* from words A left join words B on B.description like concat('%',A.description,'%') and A.word=B.word and A.description!=B.description where B.word is null 

distinct needed in case there are full doubles in the table. But it can slow down the work of the request, if there are certainly no duplicates, then it can be removed. The presence of a primary key in the table is also very welcome, then in the table join condition you can check for the inequality of this primary key, which is more reliable than checking for inequality of description and would help to find complete duplicates.

In addition, in this query, the entry of a string is checked even in the middle of another string, and not just at the beginning. If you only need to check the beginning of the line, remove the first % from concat()

left join tries to find for each record A such records B where there would be a substring entry ( like '%строка%' ). Those. he will find entries for which there is a longer description. The condition where B.word is null leaves in the selection only those records for which the left join did not find anything.

  • Super! Thank you very much! If it will not be difficult for you, please add your request with comments, I want to understand and understand how it works - hold me seven
  • @holdmeseven posted. I hope, clearly formulated - Mike
  • $ Mike will try tomorrow to deal with each line, with your permission, if I have any questions, I will write. Thanks again! - hold me seven
  • one
    @holdmeseven Yes, with every line, request. experiment (on cats :)), make it simple * and not just A.* , remove where see what happened - Mike
  • $ Mike All take note! I will train on cats!)) - hold me seven

Request

 SELECT t1.* FROM table t1, table t2 WHERE t2.description LIKE CONCAT(t1.description, '%') 

will give all the SHORT descriptions. It remains to remove them ...