Remove duplicates from a large MySQL database

Question

Hello! I have a MySQL table with 60 million rows and the fields ID, keyword (VARCHAR(200)) and cat. I need to keep only the rows with unique values of the "keyword" field and remove the duplicates. I found the following solution in the answers:

DELETE FROM keywords USING keywords, keywords t1 WHERE keywords.id > t1.id AND keywords.keyword = t1.keyword;

I ran it from the mysql console. It has been hanging for several hours and I don't know how much longer it will hang. What can you advise?

How many rows do you expect to be left after the query?
If there are few duplicates, an index on the keyword field should help.
If there are a lot of duplicates and only a small share of the rows should remain, it would probably be more efficient to create another table, insert the unique values into it, then drop the original and rename the new one.
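A minimal sketch of that approach, assuming the columns are named id, keyword and cat as in the question. The names keywords_new and keywords_old are made up for illustration. Like the original DELETE, it keeps the row with the lowest id for each keyword.

```sql
-- Empty copy of the table with the same columns and indexes.
CREATE TABLE keywords_new LIKE keywords;

-- Copy one row per keyword: the one with the smallest id.
INSERT INTO keywords_new (id, keyword, cat)
SELECT k.id, k.keyword, k.cat
FROM keywords AS k
JOIN (
    SELECT MIN(id) AS id
    FROM keywords
    GROUP BY keyword
) AS firsts ON firsts.id = k.id;

-- Swap the tables in one statement, then drop the old data
-- only after checking the result.
RENAME TABLE keywords TO keywords_old, keywords_new TO keywords;
-- DROP TABLE keywords_old;
```

The DROP is left commented out on purpose, so the original data stays available until the row counts in the new table have been checked.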
I assume there are no more than 10-15 million unique entries. The keyword field is indexed.
And how do I insert the unique values from the existing table into the new one?
"it has been hanging for several hours and I don't know how much longer it will hang.." Oh, for a long time. Just think how many pairs have to be checked, even if the indexes are used. You will not outwait that process.
It is much faster to copy the data into a new table where the index on this field is unique, so that duplicates are ignored.
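One way this could look, as a sketch only: the table name keywords_unique and the index name uq_keyword are made up, and the column list is taken from the question. INSERT IGNORE skips every row that would violate the unique index instead of aborting the statement.

```sql
-- Empty copy of the table, plus a unique index on keyword.
CREATE TABLE keywords_unique LIKE keywords;
ALTER TABLE keywords_unique ADD UNIQUE INDEX uq_keyword (keyword);

-- Rows whose keyword is already present are silently skipped.
-- ORDER BY id is meant to make the lowest id the one that survives.
INSERT IGNORE INTO keywords_unique (id, keyword, cat)
SELECT id, keyword, cat
FROM keywords
ORDER BY id;
```

Note that IGNORE also turns some other errors into warnings, so it may be worth looking at SHOW WARNINGS on a small sample first. If it does not matter which of the duplicate rows is kept, the ORDER BY can be dropped.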
If you need to remove 80% of the table, do not hesitate - kill the session and do as @Mike advises.
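Killing the running DELETE can be done from a second connection, roughly like this. The id 12345 is a placeholder; the real value comes from the Id column of the process list.

```sql
-- Find the connection that is running the long DELETE.
SHOW FULL PROCESSLIST;

-- Terminate it, replacing 12345 with the Id shown above.
KILL 12345;
```

If the table is InnoDB, the rows deleted so far have to be rolled back, so the session may not disappear immediately.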
