Hello! I have a MySQL table with 60 million rows and the fields ID, keyword (varchar 200), and cat. I need to keep only the rows with unique values of the "keyword" field and delete the duplicates. I found the following solution among the answers:

DELETE FROM keywords USING keywords, keywords t1 WHERE keywords.id > t1.id AND keywords.keyword = t1.keyword;

I ran it from the mysql console. It has been hanging for several hours and I don't know how much longer it will take. What would you advise?

  • How many rows do you expect to remain after the query? If there are few duplicates, an index on the keyword field should help. If there are many duplicates and few rows should remain, it would probably be more efficient to create another table, insert the unique values into it, then drop the original and rename the new one. - Mike
  • I assume there are no more than 10-15 million unique entries. The keyword field is indexed. How do I insert the unique values from the existing table into the new one? - thetur
  • Select with DISTINCT? - teran
  • "It has been hanging for several hours and I don't know how much longer it will take." Yes, it will take a long time: think how many pairs have to be checked, even if the indexes are used. You may never see it finish. It is much faster to copy the data into a new table that has a unique index on this field, ignoring duplicates. - Akina
  • If you need to remove 80% of the table, don't hesitate: kill the session and do as @Mike advises. - 0xdb
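
The copy-into-a-new-table approach suggested in the comments could be sketched roughly as follows. This is only a sketch under assumptions: the names keywords_unique, keywords_old and uq_keyword are hypothetical, the columns are assumed to be exactly id, keyword and cat, and it has not been tried on a table of this size. It also assumes the column's character set and row format allow a full-length index on a varchar(200) column.

```sql
-- Hypothetical table name: an empty copy with the same column definitions.
-- Note: LIKE also copies the existing indexes, so the old non-unique
-- index on keyword becomes redundant in the new table and can be dropped.
CREATE TABLE keywords_unique LIKE keywords;

-- The unique index makes MySQL reject a second row with the same keyword.
-- Comparison follows the column's collation, as it does for "=" in the
-- original DELETE.
ALTER TABLE keywords_unique ADD UNIQUE INDEX uq_keyword (keyword);

-- INSERT IGNORE skips rows that would violate the unique index instead of
-- aborting. ORDER BY id is meant to keep the row with the smallest id for
-- each keyword, which is what the original DELETE would have kept.
INSERT IGNORE INTO keywords_unique (id, keyword, cat)
SELECT id, keyword, cat
FROM keywords
ORDER BY id;

-- Swap the two tables in a single statement.
RENAME TABLE keywords TO keywords_old,
             keywords_unique TO keywords;

-- Only after checking the result:
-- DROP TABLE keywords_old;
```

With roughly 10-15 million unique keywords expected out of 60 million rows, this does a single pass over the source table instead of joining the table to itself. Comparing SELECT COUNT(*) on the new table with SELECT COUNT(DISTINCT keyword) on the old one before dropping anything is a cheap sanity check.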
