Remove unique values from list

Question

There was a similar topic, but the answer there is not clear to me. Why doesn't this code remove all values from the list?

    data = [1, 2, 3, 4, 5, 6]
    for i in data:
        if data.count(i) == 1:
            data.remove(i)
    print(data)

Prints [2,4,6]

Related questions: "What is the difference between two for loops: when deleting items during a list traversal" and "How to find all duplicate items in the list and the number of repetitions?"

awesoon · Accepted Answer · 2015-07-05T04:55:16

The most important rule: never change the size of a list while iterating over it.

Let's see how resizing a list affects the logic of a loop:

    In [3]: l = list(range(6))

    In [4]: for x in l:
       ...:     print(x)
       ...:     l.remove(x)
       ...:
    0
    2
    4

Let's look at it through the prism of wonderful ASCII drawings (the drawings use a slightly longer list, 0 through 6):

    +---+---+---+---+---+---+---+
    | 0 | 1 | 2 | 3 | 4 | 5 | 6 |  <- l
    +---+---+---+---+---+---+---+
      ^
      x

Print x and remove it from the list:

    # print(0)
    +---+---+---+---+---+---+
    | 1 | 2 | 3 | 4 | 5 | 6 |  <- l
    +---+---+---+---+---+---+
      ^
      x

Let's move on to the next element, as the great Guido van Rossum has bequeathed to us:

    +---+---+---+---+---+---+
    | 1 | 2 | 3 | 4 | 5 | 6 |  <- l
    +---+---+---+---+---+---+
          ^
          x

To drive the point home, repeat the steps: print, remove, and move on to the next iteration of the loop:

    # print(2)
    +---+---+---+---+---+
    | 1 | 3 | 4 | 5 | 6 |  <- l
    +---+---+---+---+---+
          ^
          x (before the move)

    +---+---+---+---+---+
    | 1 | 3 | 4 | 5 | 6 |  <- l
    +---+---+---+---+---+
              ^
              x (after the move)

Clearly, changing the size of a list while iterating over it gives birth to yet another small evil, one that can lead (and does lead) to errors.
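The walk-through above can be reproduced with an explicit index. This is only a sketch of the behaviour, not CPython's actual implementation; the function name `iterate_and_remove` is made up for illustration.

```python
def iterate_and_remove(items):
    """Emulate `for x in items: items.remove(x)` with an explicit index."""
    visited = []
    index = 0  # the loop remembers a position, not a particular element
    while index < len(items):
        x = items[index]
        visited.append(x)
        items.remove(x)  # everything to the right shifts one slot left
        index += 1       # so this step jumps over the element that just shifted in
    return visited


l = list(range(6))
print(iterate_and_remove(l))  # [0, 2, 4]
print(l)                      # [1, 3, 5]
```

Every removal pulls the next element into the slot that was just visited, which is why every other element is skipped.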

The shortest working equivalent of this loop was presented by @andreymal:

 data = [x for x in data if data.count(x) > 1]

But the solutions presented share one drawback: they have quadratic complexity, because data.count(x) scans the whole list for every element.

    In [9]: data = list(range(10000))

    In [10]: %timeit [x for x in data if data.count(x) > 1]
    1 loops, best of 3: 1.66 s per loop

The Python standard library provides the Counter class (in the collections module), which counts the number of occurrences of each element. With it, the running time becomes linear (strictly speaking, amortized linear):

    In [11]: from collections import Counter

    In [12]: def f(xs):
       ....:     counter = Counter(xs)
       ....:     return [x for x in xs if counter[x] > 1]
       ....:

    In [13]: %timeit f(range(10000))
    100 loops, best of 3: 2.33 ms per loop
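Outside IPython, the same comparison can be sketched with the standard timeit module. The function names `drop_unique_quadratic` and `drop_unique_linear` are made up for this example, and the timings depend on the machine, so no numbers are shown here.

```python
import timeit
from collections import Counter


def drop_unique_quadratic(xs):
    # xs.count(x) rescans the whole list for every element
    return [x for x in xs if xs.count(x) > 1]


def drop_unique_linear(xs):
    # one pass to count, one pass to filter (elements must be hashable)
    counts = Counter(xs)
    return [x for x in xs if counts[x] > 1]


# Both variants should agree on the result.
sample = [1, 2, 2, 3, 4, 4, 4, 5]
assert drop_unique_quadratic(sample) == drop_unique_linear(sample) == [2, 2, 4, 4, 4]

# Doubling n should roughly quadruple the first timing and double the second.
for n in (500, 1000, 2000):
    data = list(range(n))
    slow = timeit.timeit(lambda: drop_unique_quadratic(data), number=3)
    fast = timeit.timeit(lambda: drop_unique_linear(data), number=3)
    print(n, round(slow, 4), round(fast, 4))
```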

andreymal · Answer 2 · 2015-07-04T19:44:12

As already explained in another answer, the list changes but the iterator's index does not, so every removal effectively adds an extra shift to the next element.

When I'm too lazy to be clever and the list just needs to be cleaned up, I iterate over a copy of it:

    for i in tuple(data):
        if data.count(i) == 1:
            data.remove(i)

(a tuple rather than a list, because it is said to be faster)
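A small sketch of the copy-based approach on a list that actually contains duplicates. The helper name `remove_unique_in_place` and the sample data are assumptions made for illustration.

```python
def remove_unique_in_place(data):
    """Remove values that occur exactly once, mutating `data`."""
    for value in tuple(data):        # iterate over a frozen snapshot
        if data.count(value) == 1:   # counts are taken on the live list
            data.remove(value)


data = [1, 2, 2, 3, 4, 4, 5]
remove_unique_in_place(data)
print(data)  # [2, 2, 4, 4]

question_data = [1, 2, 3, 4, 5, 6]
remove_unique_in_place(question_data)
print(question_data)  # []
```

Because the loop walks the snapshot, removals from the live list cannot make it skip elements.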

When I'm not too lazy to think, I can collect the items to delete in a separate list:

    rm = []
    for i in data:
        if data.count(i) == 1:
            rm.append(i)  # only remember the item, do not touch data yet
    for x in rm:
        data.remove(x)    # safe: data is no longer being iterated

When I remember that list comprehensions exist, I write a one-liner:

 data = [x for x in data if data.count(x) > 1]

The fourth option I know is given in another answer.

If it is not known in advance that data is small, a linear algorithm is better (Counter comes from collections):

    non_uniq = [item for item, count in Counter(data).items() if count > 1]
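Note that this Counter-based expression is not a drop-in replacement for the list comprehension shown earlier in this answer: it yields each repeated value once, while the comprehension keeps every occurrence. A minimal sketch of the difference, using sample data invented for the example:

```python
from collections import Counter

data = [1, 2, 2, 3, 4, 4, 4, 5]
counts = Counter(data)

# Each repeated value once (the expression from the comment above).
non_uniq = [item for item, count in counts.items() if count > 1]

# The original list with the one-off values filtered out.
filtered = [x for x in data if counts[x] > 1]

print(sorted(non_uniq))  # [2, 4]
print(filtered)          # [2, 2, 4, 4, 4]
```

Which of the two is wanted depends on whether the goal is "the values that repeat" or "the list without its unique values".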

Answer 3 · 2015-07-04T19:14:26

If you add a print statement:

    data = [1, 2, 3, 4, 5, 6]
    for i in data:
        print(i)
        if data.count(i) == 1:
            data.remove(i)
    print(data)

the loop visits only

    1
    3
    5

Apparently in Python, when you delete an element, the index stays where it was, so you skip every other element. That is, after a deletion you have to check the item at the same index again. The easiest way around this is to walk the index in descending order.

Like this, everything gets deleted (my first Python code :), it can surely be made prettier):

    data = [1, 2, 3, 4, 5, 6]
    i = len(data) - 1
    while i >= 0:
        if data.count(data[i]) == 1:
            data.remove(data[i])
        i = i - 1
    print(data)
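The same descending-index idea can be written with range and del. This is a sketch on sample data invented for the example, not the answerer's own code.

```python
data = [1, 2, 2, 3, 4, 4, 5]

# Walk the indices from the last one down to 0. Deleting data[i] only
# shifts elements at higher indices, and those have already been visited.
for i in range(len(data) - 1, -1, -1):
    if data.count(data[i]) == 1:
        del data[i]  # delete by position instead of searching by value

print(data)  # [2, 2, 4, 4]
```

It is still quadratic because of data.count, but it never skips an element.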

Try to change (or extend?) the answer, starting from the premise that changing the size of a list while iterating over it is evil.
In that case, I can recommend the book Dive into Python and the PyCharm educational edition.
