There is such a status plate of elephants, that is, an elephant can have several statuses of its location.

status: id int auto_increment pk slon_id int status varchar date datetime 

When a software bug (or rather a flaw) appeared duplicate records of the form

 1 1 Розовый слон в Африке 18.11.2016... 2 1 Розовый слон в Африке 18.11.2016... 3 1 розовый слон уже в Зимбабве 19.11.2016... 4 2 Зеленый слон в Европе 15.11.2016... 

Interested in SQL solution. Is it possible to somehow remove duplicate records? Is it possible to remove duplicate records leaving one by one in order not to lose history? (That is, if there are several entries about Africa and the Pink Elephant - delete all but one).

  • duplicates on the status and identifier of the elephant, in this case, lines 1 and 2. - Jean-Claude

2 answers 2

Algorithm of actions:

  1. understand the criteria by which records should be considered identical
  2. write a query to search for duplicate data sets (you can not rows, only data sets)
  3. understand how to decide which of the entries should be left
  4. accordingly, correct the request so that it displays all duplicate records, except the one that should be left
  5. write the query delete where id in (subquery) .

The result is something like this, if we take the uniqfield field, which should be unique, leave a string with a minimum id :

 delete from tablename where id in ( select id from ( select id from tablename join ( select min(id) as firstdup, uniqfield from tablename group by uniqfield ) duplicates using(uniqfield) where id != firstdup ) subqueryhack ) 

For a combination of two fields so that it can be seen that changes

 delete from tablename where id in ( select id from ( select id from tablename join ( select min(id) as firstdup, uniqfield1, uniqfield2 from tablename group by uniqfield1, uniqfield2 ) duplicates using(uniqfield1, uniqfield2) where id != firstdup ) subqueryhack ) 

delete in mysql will not allow you to directly delete from the table that the subquery reads, but it costs, if necessary, another subquery.

If it does not matter which lines to leave, and which ones to remove and allow the version of the DBMS, then before mysql 5.7.4 you could just hang a unique index indicating ignore

 ALTER IGNORE TABLE mytable ADD UNIQUE INDEX myindex (A, B, C, D); 

That will remove all duplicates on these fields, except for a single line. In current versions, the ignore behavior is removed and causes an error. He didn’t understand why he was deleted, most likely due to unobvious behavior, which line will be deleted and the general rate of returning to the SQL standard.

  • Thanks, everything works on the test table, but I couldn’t put it into practice (as I understood later), since different elephants can be in the same place at the same time, and the name of the elephant is not in status, that is, the status is слон в Африке 18.11.2016 may be different. - Jean-Claude
  • So the id is the elephant? Consider the unique combination of fields. - Shallow
  • There is an elephant id, I have not yet mastered requests of this kind. To do it in and where conditions? - Jean-Claude
  • select, group and then join condition using. That's all the difference, I added an example of a query with a search for duplicates in a pair of fields - Small
  • super, everything works. Senks. - Jean-Claude

This option should work. TESTTEST needs to be replaced with your own table name. This request will leave one entry for each slon_id with a maximum date.

 DELETE FROM TESTTEST WHERE id NOT IN ( SELECT MAX(id) FROM (SELECT t.slon_id, MAX(t.[date]) AS [date] FROM TESTTEST t GROUP BY t.slon_id) t JOIN TESTTEST t2 ON t.slon_id = t2.slon_id AND t.[date] = t2.[date] GROUP BY t2.id ) 
  • pardonte, not plowing, unlike the answer of another person. - Jean-Claude
  • What error does it write? - progmb
  • there is no error, it simply finds no duplicate, but simply the maximum ID from the table. - Jean-Claude
  • Yes, I misunderstood a bit the problem. It deletes all entries for this elephant, except for entries with a maximum date and a maximum id. The maximum id is chosen to for one date there can be several entries for the same elephant. - progmb