Remove duplicates from a vector without sorting

Question

How can I remove duplicates from a vector without sorting it, so that the original order of the elements is preserved?

What are the optimization requirements in terms of memory and speed?

Vladimir Gamalyan 5,702 3 20 49 · Accepted Answer · 2016-10-02T08:53:09

My solution. It seems to me that it is simpler and clearer than @gbg's answer:

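 [unordered_]set<type> s;   // std::set<type> or std::unordered_set<type>
 for (const auto& z : a)
     if (s.insert(z).second)   // .second is true only if z was not in the set yet
         out.push_back(z);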

We simply check whether the element is already in the set. If it is not, it is not a duplicate and is appended to the result; otherwise it is a duplicate and is skipped. Exactly one element of each group of duplicates remains: the first occurrence.
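For completeness, here is a minimal self-contained sketch of the same idea wrapped in a function template. The name unique_keep_order is made up for illustration, and the sketch assumes the element type can be hashed with std::hash and compared with ==; for types without a hash, std::set could be swapped in at the cost of O(log n) per insert.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper (not a standard function): returns a copy of `in`
// that keeps only the first occurrence of every value, in original order.
// Assumption: T works with std::hash and operator==.
template <typename T>
std::vector<T> unique_keep_order(const std::vector<T>& in) {
    std::unordered_set<T> seen;
    std::vector<T> out;
    out.reserve(in.size());
    for (const auto& x : in) {
        // insert() returns pair<iterator, bool>; .second is true
        // only when x was not already present in the set.
        if (seen.insert(x).second) {
            out.push_back(x);
        }
    }
    return out;
}
```

Called on a vector holding 3, 1, 3, 2, 1, 5 it yields 3, 1, 2, 5. The extra memory is one set entry per distinct value, and the expected running time is linear in the size of the input.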

Pavel Mayorov 48.9k 5 51 110 · Answer 2 · 2016-10-02T06:31:40

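Create a new vector based on the old one, storing each element together with its original index.
Sort the intermediate vector by value (and by index among equal values, so that the first occurrence comes first).
Delete the non-unique elements, keeping the first of each group.
Sort the intermediate vector by index.
Copy the results back.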
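The steps above can be sketched roughly as follows. This is only an illustration: the function name unique_by_index_sort is hypothetical, the element type is assumed to be copyable and to support < and ==, and the generic lambdas need C++14 or later.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper illustrating the "sort with indices" steps.
// Assumption: T is copyable and supports operator< and operator==.
template <typename T>
void unique_by_index_sort(std::vector<T>& v) {
    // 1. Pair every element with its original index.
    std::vector<std::pair<T, std::size_t>> tmp;
    tmp.reserve(v.size());
    for (std::size_t i = 0; i < v.size(); ++i) {
        tmp.emplace_back(v[i], i);
    }

    // 2. Sort by value; std::pair's operator< breaks ties by index,
    //    so the earliest occurrence is first within each group.
    std::sort(tmp.begin(), tmp.end());

    // 3. Keep only the first pair of each run of equal values.
    tmp.erase(std::unique(tmp.begin(), tmp.end(),
                          [](const auto& a, const auto& b) {
                              return a.first == b.first;
                          }),
              tmp.end());

    // 4. Restore the original order by sorting on the stored index.
    std::sort(tmp.begin(), tmp.end(),
              [](const auto& a, const auto& b) {
                  return a.second < b.second;
              });

    // 5. Copy the surviving values back into the original vector.
    v.clear();
    for (auto& p : tmp) {
        v.push_back(std::move(p.first));
    }
}
```

This costs O(n log n) time and O(n) extra memory, and unlike the hash-set approach it only needs the elements to be comparable.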

If it were possible to mark several answers as correct, I would mark this one too.

Answer 3 · 2016-10-02T06:34:24

One option is to first load everything into a std::set:

 std::set<T> s(a.begin(), a.end());   // T is the element type of a

Then go through the array and throw out every element that is no longer in the set, removing each kept element from the set as you go. It is faster to do this by building a new array than by erasing from the old one.

 out.reserve(s.size());
 for (const auto& i : a)
 {
     const auto pos = s.find(i);
     if (pos != s.end()) // still in the set, so this is the first time we see it: move it to the output array
     {
         out.push_back(i);
         s.erase(pos); // this is the whole trick: we remove it from the set, so the next occurrence will not be copied
     }
 }

@pavel's version is much better (here s starts out empty):

 for (auto z : a) if (s.insert(z).second) out.push_back(z);

As I understood the question, if an element has a duplicate, only the duplicate should be thrown out, not both elements.
@MalovVladimir, that is fine: if one of the duplicates has to be kept, the algorithm does exactly that.
I asked the question hoping that a ready-made function exists.
All that remains is to mention the removal of the element from the set in the description of the algorithm.

αλεχολυτ 21.1k 9 39 92 · Answer 4 · 2016-10-02T09:35:12

The difficulty is that “deleting correctly” is an ambiguous notion. All the answers proposed so far optimize for time: they create an additional container, sort it (albeit implicitly), and then transfer the result back into the original vector.

But there is another option: do not use another container at all, and simply walk over the existing one, erasing every repeated element in place. The algorithmic complexity certainly increases (the nested loops make it quadratic in the worst case), but memory is saved.

Code example:

 void dropDup(std::vector<int>& v)
 {
     for (auto base = v.begin(); base != v.end(); ++base) {
         for (auto it = base + 1; it != v.end(); ) {
             if (*base == *it) {
                 it = v.erase(it);
             } else {
                 ++it;
             }
         }
     }
 }
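A possible variant of the same in-place idea, sketched here as an assumption rather than as part of the answer above: instead of calling erase for every duplicate (each call shifts the tail of the vector), compact the unique elements towards the front and erase the leftover tail once. The name drop_dup_in_place is hypothetical. The number of comparisons is still quadratic in the worst case, and no additional container is used.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Hypothetical variant: no extra container and a single erase at the end.
// Assumption: T is movable and supports operator==.
template <typename T>
void drop_dup_in_place(std::vector<T>& v) {
    // [v.begin(), kept_end) always holds the unique elements found so far.
    auto kept_end = v.begin();
    for (auto it = v.begin(); it != v.end(); ++it) {
        // Keep *it only if it does not already occur in the kept prefix.
        if (std::find(v.begin(), kept_end, *it) == kept_end) {
            if (kept_end != it) {
                *kept_end = std::move(*it);  // avoid self-move when nothing was skipped yet
            }
            ++kept_end;
        }
    }
    // Everything past kept_end is a duplicate or a moved-from slot.
    v.erase(kept_end, v.end());
}
```

As with dropDup, the first occurrence of each value is the one that survives, so the relative order of the remaining elements is unchanged.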

I agree, but to be honest, I asked this question hoping that there is some ready-made function out of the box, something like unique without sorting, with the optimization already included.
More complex algorithms are probably implemented in some specialized libraries.
