Remove duplicates from table

Question

There is a table

How do I remove the duplicate rows with an SQL query?

That is nice, I have looked at it. The problem is that the table has no primary key. - gudfar
What does the primary key have to do with it? There are many options. Does really none of them suit you? - Viktorov

Viktorov · Accepted Answer · 2016-07-06T12:06:12

1) Via a temporary table

    CREATE TEMPORARY TABLE tmp_tab AS SELECT DISTINCT * FROM your_table;
    DELETE FROM your_table;
    INSERT INTO your_table SELECT * FROM tmp_tab;
    DROP TABLE tmp_tab;
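
A minimal sketch of option 1, run from Python against an in-memory SQLite database so it can be tried without a MySQL server. The table and column names come from this answer; the sample rows and the helper name dedupe_via_temp_table are made up for illustration, and running the steps in one transaction is my own assumption about how you would want to run them safely.

```python
import sqlite3


def dedupe_via_temp_table(conn, table):
    """Rebuild `table` from its distinct rows using a temporary copy.

    `table` is interpolated into the SQL text, so pass only trusted names.
    """
    with conn:  # commits on success, rolls back if any step fails
        conn.execute(
            f"CREATE TEMPORARY TABLE tmp_tab AS SELECT DISTINCT * FROM {table}"
        )
        conn.execute(f"DELETE FROM {table}")
        conn.execute(f"INSERT INTO {table} SELECT * FROM tmp_tab")
        conn.execute("DROP TABLE tmp_tab")


# Hypothetical sample data: three copies of one row, plus two unique rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE your_table (id_category INT, id_product INT, position INT)"
)
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?)",
    [(1, 10, 1), (1, 10, 1), (2, 20, 1), (1, 10, 1), (2, 20, 2)],
)

dedupe_via_temp_table(conn, "your_table")
rows = conn.execute("SELECT * FROM your_table ORDER BY 1, 2, 3").fetchall()
print(rows)
```

Note that SELECT DISTINCT * only collapses rows that match in every column, which is what the question asks for.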

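2) Adding a unique index. I have not tried it myself, but reportedly it works: a unique index is added with IGNORE, and the duplicates are dropped along the way. This applies to MySQL only, and only to older versions, since the IGNORE clause of ALTER TABLE was removed in MySQL 5.7.4.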

 ALTER IGNORE TABLE your_table ADD UNIQUE INDEX(id_category, id_product, position);


Re-create the whole table just to remove duplicates? Hmm... - pegoopik
@pegoopik as far as I know, this is the standard approach in MySQL. If you have a better solution, please share it. - Viktorov
You can create a column with unique values, use it to remove the duplicates, and then drop the column. That is also quite resource-intensive. You could also look into accumulating state in a user variable. - pegoopik
@pegoopik, so we first add a column, then fill it with a non-trivial query (we have no analytic functions), then delete the extra rows, then drop the column. If that is better than re-creating the table, it is not by much. The second option I proposed is one of the fastest for MySQL, but it creates an index over all the fields, which is not good. Still, it is clearly faster than the option you propose. To sum up: you criticize my solution, but you have not offered anything better. - Viktorov
>> we fill it with a non-trivial query (we have no analytic functions) << Really? Why non-trivial? It is three lines when formatted (see my answer). - pegoopik


Answer 2 · 2016-07-06T16:27:56

PostgreSQL

You can solve the problem with a single query using CTEs (where T is the source table):

    with td as (delete from T returning *),
         tt as (select row_number() over(partition by id_category,id_product,position
                                         order by id_category,id_product,position) num, *
                from td)
    insert into T
    select id_category,id_product,position from tt where num=1;

Do I understand correctly that in this case all the records are deleted, and then the unique ones are inserted back?
By the way, the MS SQL-style solution will not work in PostgreSQL.

Answer 3 · 2016-07-07T07:23:02

MySQL:

You can create a column with unique values, use it to remove the duplicates, and then drop the column (or, better, keep it and put a unique index on it). This is quite resource-intensive, but it is an option.

To create the column and fill it with natural numbers, you can run:

    ALTER TABLE Test ADD Id INT;
    UPDATE Test
    SET Id = @I := @I + 1
    /* you can set any sort order you like here; I sort by A, B */
    ORDER BY A, B, (SELECT @I := 0);

As a result, an Id column appears in the Test table, filled with a numeric sequence ordered by columns A, B.
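
Once every row has a unique number, the duplicates can be removed by keeping the smallest number in each group. Below is a rough sketch of that step, run from Python against in-memory SQLite rather than MySQL. Assumptions on my part: SQLite has no user variables, so the built-in rowid stands in for the @I counter (the numbers therefore follow insertion order, not A, B); the sample rows, the helper name dedupe_with_id_column and the index name ux_test_id are made up.

```python
import sqlite3


def dedupe_with_id_column(conn):
    """Number the rows of Test, then keep the lowest Id of each (A, B) group."""
    with conn:  # one transaction for all steps
        conn.execute("ALTER TABLE Test ADD COLUMN Id INTEGER")
        # Stand-in for the @I := @I + 1 counter: copy SQLite's rowid.
        conn.execute("UPDATE Test SET Id = rowid")
        # Every row whose Id is not the minimum of its group is a duplicate.
        conn.execute(
            "DELETE FROM Test "
            "WHERE Id NOT IN (SELECT MIN(Id) FROM Test GROUP BY A, B)"
        )
        # Keep the column and index it instead of dropping it.
        conn.execute("CREATE UNIQUE INDEX ux_test_id ON Test (Id)")


# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Test (A TEXT, B TEXT)")
conn.executemany(
    "INSERT INTO Test VALUES (?, ?)",
    [("AAA", "BBB"), ("AAA", "BBB"), ("BBB", "BBB"), ("AAA", "AAA"),
     ("BBB", "BBB"), ("AAA", "AAA"), ("AAA", "BBB")],
)

dedupe_with_id_column(conn)
result = conn.execute("SELECT A, B, Id FROM Test ORDER BY Id").fetchall()
print(result)
```

In MySQL itself the DELETE would need a different shape, because MySQL does not allow a subquery on the table being deleted from; a self-join on A, B with a comparison of the Id values is the usual workaround.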

UPD: There is also a somewhat extravagant way :) First mark the rows for deletion, then delete them, again by accumulating state in user variables.

For clarity, I will show all the scripts in runnable form.

Create a table and fill it:

    CREATE TABLE TEST_DUPLICATE(
        A VARCHAR(20),
        B VARCHAR(20)
    );
    INSERT TEST_DUPLICATE SELECT 'AAA', 'BBB';
    INSERT TEST_DUPLICATE SELECT 'AAA', 'BBB';
    INSERT TEST_DUPLICATE SELECT 'BBB', 'BBB';
    INSERT TEST_DUPLICATE SELECT 'AAA', 'AAA';
    INSERT TEST_DUPLICATE SELECT 'BBB', 'BBB';
    INSERT TEST_DUPLICATE SELECT 'AAA', 'AAA';
    INSERT TEST_DUPLICATE SELECT 'AAA', 'BBB';
    SELECT * FROM TEST_DUPLICATE;

Here are its contents:

    AAA  BBB
    AAA  BBB
    BBB  BBB
    AAA  AAA
    BBB  BBB
    AAA  AAA
    AAA  BBB

Now mark the duplicates by writing the string DUPLICATED into field B:

    UPDATE TEST_DUPLICATE
    SET B = CONCAT(
        CASE WHEN A=@A AND B=@B THEN 'DUPLICATED' ELSE B END,
        /* a dummy term, here only to update the values of @A and @B */
        CASE WHEN CONCAT((@A:=A),(@B:=B)) >= '' THEN '' END)
    ORDER BY A, B, (SELECT @A:=''), (SELECT @B:='');
    SELECT * FROM TEST_DUPLICATE;

Now the table contents:

    AAA  BBB
    AAA  DUPLICATED
    BBB  BBB
    AAA  AAA
    BBB  DUPLICATED
    AAA  DUPLICATED
    AAA  DUPLICATED

Remove the marked rows:

    DELETE FROM TEST_DUPLICATE WHERE B = 'DUPLICATED';
    SELECT * FROM TEST_DUPLICATE;

We get what we wanted:

    AAA  BBB
    BBB  BBB
    AAA  AAA

This solution is open to some criticism, but I have kept it simple for the sake of illustration. If desired, the idea can be developed further and used.

Addition: the same can be done in other DBMSs by replacing the variable accumulation with the analytic functions ROW_NUMBER or LEAD. There it will look "nicer".
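
For illustration, here is roughly what the ROW_NUMBER variant could look like. This is only a sketch, run from Python against in-memory SQLite (window functions need SQLite 3.25 or newer); the use of SQLite's built-in rowid to identify the rows to delete is my assumption and is specific to SQLite, so other DBMSs would need their own row identifier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TEST_DUPLICATE (A VARCHAR(20), B VARCHAR(20))")
conn.executemany(
    "INSERT INTO TEST_DUPLICATE VALUES (?, ?)",
    [("AAA", "BBB"), ("AAA", "BBB"), ("BBB", "BBB"), ("AAA", "AAA"),
     ("BBB", "BBB"), ("AAA", "AAA"), ("AAA", "BBB")],
)

with conn:  # commit the delete as one transaction
    # Number the rows inside each (A, B) group; anything past the
    # first row of a group is a duplicate and gets deleted.
    conn.execute(
        """
        DELETE FROM TEST_DUPLICATE
        WHERE rowid IN (
            SELECT rid
            FROM (
                SELECT rowid AS rid,
                       ROW_NUMBER() OVER (PARTITION BY A, B ORDER BY rowid) AS num
                FROM TEST_DUPLICATE
            )
            WHERE num > 1
        )
        """
    )

remaining = conn.execute(
    "SELECT A, B FROM TEST_DUPLICATE ORDER BY rowid"
).fetchall()
print(remaining)
```

Unlike the DUPLICATED marker, this does not overwrite any data, so a real value in B can never be mistaken for a mark.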
