Remove duplicates from DB

Question

The database got duplicate records. It is necessary to clear the first copy (the key field is smaller than that of the subsequent copy). Example:

uidl id date spam 533 590 2011-08-09 00:00:02 0 535 AAA 2011-08-09 00:11:12 0 536 590 2011-08-09 00:00:02 1 540 AAA 2011-08-09 00:11:12 1

The duplicate criteria are the same id and date .

Now I use this query:

 CREATE TABLE tmp SELECT * FROM `testing` WHERE 1 GROUP BY date,id

It creates a table of unique (recent) entries. But this is not very convenient, I would just like to delete each first entry from the pair.

Depending on the DBMS, there are several ways to remove duplicates: How to remove duplicate rows from a table?
judging by the backticks - topicaster was looking for a solution under mysql

PashaPash ♦ 42.4k 10 golden marks 64 silver marks 130 bronze marks · Answer 1 · 2011-08-17T07:58:39

uidl id date spam

 delete from `testing` USING `testing`, `testing` as `tmp` WHERE (`testing`.`uidl` <`tmp`.`uidl` ) AND (`testing`.`date`=`tmp`.`date`) AND (`testing`.`id` = `tmp`.`id` )

EDIT: added check for a unique field. So, for reliability.

Answer 2 · 2011-08-17T08:02:25

 delete from 'table_name' where 'id' in ( SELECT DATA , count( * ) FROM 'table_name' GROUP BY 'uidl' HAVING count( * ) >1 LIMIT 0 , 30);

Kenpachi

1,171 7 silver marks 24 bronze marks

+1 LIMIT only needs to be cleaned up, which mayadmin leaves ... - knes
> LIMIT only needs to be cleaned up, which mayadmin leaves ... Is this a field size or something? Honestly, I did not understand why count (*)> 1 LIMIT 0, 30 - moden
one
count (*)> 1 Reports that more than one of the fields found are grouped. LIMIT start, quantity - says that it is necessary to delete everything from all selected, starting from the "beginning" and until the "quantity" is typed. The size of the field has nothing to do with it ... - knes
in this form does not work - grouping by uidl does not make sense at all, since the field is unique. date will not be selected here, since it is not in the group by - renegator

|

Nicolas Chabanovsky ♦ 38.1k 54 gold marks 222 silver marks 438 bronze marks · Answer 3 · 2011-08-17T09:10:54

 delete from testing where uidl in ( select t.uidl from testing t join ( select id, date, max(uidl) as md from testing group by id, date) m on m.id = t.id and m.date = t.date and m.md > t.uidl)

PashaPash ♦ 42.4k 10 golden marks 64 silver marks 130 bronze marks · Answer 4 · 2011-08-23T21:19:50

 WITH Doubles AS ( select uidl, id, date, spam , row_number() over(partition by id, date order by uidl desc) num from Testing ) Delete Doubles Where num > 1

big-town big-town one · Answer 5 · 2016-09-28T19:58:50

Not exactly the topic, but I think it will also be useful. Removes all duplicates and leaves a unique entry with the earliest date.

 DO $$ DECLARE varid numeric; vardate timestamp; BEGIN FOR varid,vardate IN select id,min(date) from tmp group by id having count(*)>1 LOOP RAISE NOTICE '% % ',varid,vardate; DELETE FROM tmp WHERE id=varid AND date<>vardate; END LOOP; END $$;

Remove duplicates from DB

5 answers 5

More articles: