The general procedure:
- work out the criteria by which records should be considered identical
- write a query that finds the duplicated sets of values (you do not need the rows themselves yet, only the value sets)
- decide which of the duplicate rows should be kept
- adjust the query accordingly so that it returns all duplicate rows except the one that should be kept
- write the final query in the form
delete where id in (subquery).
The result looks something like this, taking a field uniqfield that should be unique and keeping the row with the minimum id:
delete from tablename where id in ( select id from ( select id from tablename join ( select min(id) as firstdup, uniqfield from tablename group by uniqfield ) duplicates using(uniqfield) where id != firstdup ) subqueryhack )
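The intermediate steps behind this query can be tried out separately. Below is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example runs without a server). The sample rows are hypothetical; the table and column names follow the query above.

```python
import sqlite3

# Hypothetical demo table; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (id INTEGER PRIMARY KEY, uniqfield TEXT)")
conn.executemany(
    "INSERT INTO tablename (id, uniqfield) VALUES (?, ?)",
    [(1, "a"), (2, "b"), (3, "a"), (4, "c"), (5, "a"), (6, "b")],
)

# Find the duplicated value sets (not the individual rows).
duplicate_sets = conn.execute(
    """
    SELECT uniqfield, COUNT(*) AS copies, MIN(id) AS firstdup
    FROM tablename
    GROUP BY uniqfield
    HAVING COUNT(*) > 1
    ORDER BY uniqfield
    """
).fetchall()

# List every duplicate row except the one to keep (the minimum id).
rows_to_delete = conn.execute(
    """
    SELECT id
    FROM tablename
    JOIN (
        SELECT MIN(id) AS firstdup, uniqfield
        FROM tablename
        GROUP BY uniqfield
    ) duplicates USING (uniqfield)
    WHERE id != firstdup
    ORDER BY id
    """
).fetchall()

print(duplicate_sets)
print(rows_to_delete)
```

With these sample rows the second query returns ids 3, 5 and 6. Running it as a plain select before turning it into a delete is a cheap way to check that the right rows are going to be removed.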
For a combination of two fields, to show what changes:
delete from tablename where id in ( select id from ( select id from tablename join ( select min(id) as firstdup, uniqfield1, uniqfield2 from tablename group by uniqfield1, uniqfield2 ) duplicates using(uniqfield1, uniqfield2) where id != firstdup ) subqueryhack )
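As a quick check of the two-field variant, here is a hedged sketch that runs the same delete against a small hypothetical data set. It again uses Python's sqlite3 module instead of MySQL, so it only illustrates the logic of the query, not MySQL-specific behavior.

```python
import sqlite3

# Hypothetical demo data; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tablename ("
    "id INTEGER PRIMARY KEY, uniqfield1 TEXT, uniqfield2 TEXT)"
)
conn.executemany(
    "INSERT INTO tablename (id, uniqfield1, uniqfield2) VALUES (?, ?, ?)",
    [
        (1, "a", "x"),
        (2, "a", "y"),
        (3, "a", "x"),
        (4, "b", "x"),
        (5, "a", "y"),
        (6, "a", "x"),
    ],
)

# Same delete as above: keep the minimum id per (uniqfield1, uniqfield2) pair.
conn.execute(
    """
    DELETE FROM tablename WHERE id IN (
        SELECT id FROM (
            SELECT id
            FROM tablename
            JOIN (
                SELECT MIN(id) AS firstdup, uniqfield1, uniqfield2
                FROM tablename
                GROUP BY uniqfield1, uniqfield2
            ) duplicates USING (uniqfield1, uniqfield2)
            WHERE id != firstdup
        ) subqueryhack
    )
    """
)
conn.commit()

remaining = conn.execute(
    "SELECT id, uniqfield1, uniqfield2 FROM tablename ORDER BY id"
).fetchall()
print(remaining)
```

Only rows 1, 2 and 4 remain: one row for each distinct pair of values, each with the smallest id in its group.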
MySQL will not let a delete remove rows from a table that its subquery reads directly, but if needed this is solved at the cost of one more subquery (subqueryhack in the queries above).
If it does not matter which rows are kept and which are removed, and the DBMS version allows it, then before MySQL 5.7.4 you could simply add a unique index with the ignore keyword:
ALTER IGNORE TABLE mytable ADD UNIQUE INDEX myindex (A, B, C, D);
That removes all duplicates on these fields except for a single row. In current versions the ignore behavior has been removed, and the statement causes an error. I have not looked into why it was removed; most likely because it was not obvious which row would be deleted, and because of the general course of moving back toward the SQL standard.