Consider a picture that makes it clear how to find the median of a stream of integers:

As we can see, to find the median it is convenient to split the stream into two parts. About half of the elements sit in sorted order in the left part, and the rest sit in sorted order in the right part. Together the two parts form one sorted array: the smaller elements are on the left, the larger ones on the right.

The median is calculated as follows: if the left and right parts hold the same number of elements, the median is the average of the maximum on the left and the minimum on the right (see pictures). If the left part holds one more element than the right (the right part is never the larger one), the median is the maximum on the left.
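As a minimal sketch of this rule, assume the two parts are available as already sorted vectors. The function name `median_of_halves` and the use of vectors instead of heaps are assumptions made only for illustration:

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: median of a sequence split into two sorted parts.
// Assumes every element of `left` is <= every element of `right`, and that
// left.size() is either right.size() or right.size() + 1.
double median_of_halves(const std::vector<int>& left,
                        const std::vector<int>& right) {
    assert(!left.empty());
    assert(left.size() == right.size() || left.size() == right.size() + 1);
    if (left.size() == right.size()) {
        // Even count: average of the maximum on the left
        // and the minimum on the right.
        return (static_cast<double>(left.back()) + right.front()) / 2.0;
    }
    // Odd count: the extra element on the left is the median.
    return left.back();
}
```

For the parts {1, 2} and {3, 4} this gives 2.5, and for {1, 2, 3} and {4, 5} it gives 3.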

The problem is conveniently solved with a max-heap and a min-heap. My solution:

    #include <functional>
    #include <iomanip>
    #include <iostream>
    #include <queue>
    #include <vector>

    using std::priority_queue;
    using std::vector;
    using std::greater;
    using std::less;

    // Balance the heap sizes. If the sizes differ by 2,
    // move an element from one heap to the other.
    void balance_heaps(priority_queue<int, vector<int>, greater<int>>& min,
                       priority_queue<int>& max)
    {
        if (max.size() - min.size() == 2) {
            min.push(max.top());
            max.pop();
        }
        if ((min.size() - max.size()) == 2) {
            max.push(min.top());
            min.pop();
        }
    }

    // An element read from the input stream is added to one of the two heaps,
    // and the heap sizes are then rebalanced.
    void push_new_value(priority_queue<int, vector<int>, greater<int>>& min,
                        priority_queue<int>& max, int value)
    {
        max.push(value);
        balance_heaps(min, max);
    }

    // Compute the median of all elements.
    // If the number of elements is even, the median is the arithmetic mean
    // of the two central elements of the sorted sequence.
    double find_median(priority_queue<int, vector<int>, greater<int>>& min,
                       priority_queue<int>& max)
    {
        if (min.size() == max.size())
            return (max.top() + min.top()) / 2.0;
        else
            return double(max.top());
    }

    int main()
    {
        std::priority_queue<int> max_heap;
        std::priority_queue<int, vector<int>, greater<int>> min_heap;

        int value;
        while (std::cin >> value) {
            push_new_value(min_heap, max_heap, value);
            std::cout << std::setprecision(2)
                      << find_median(min_heap, max_heap) << std::endl;
        }
    }

The algorithm works like this: new values are added only to the max-heap. It is acceptable for the max-heap to hold one more element than the min-heap. If the left part ends up with two more elements, the heaps must be rebalanced by moving the excess element into the smaller heap.

My algorithm seems to compute the median correctly when the number of elements is even or equal to one, but in all other cases it gives a wrong result. Where is my mistake? The algorithm is so transparent that I cannot see where the error is.

I tried changing the algorithm, but fixes like the one below turned out to be equivalent transformations of the same bad code. Nothing breaks, but the result is the same.

    // An element read from the input stream is added to one of the two heaps,
    // and the heap sizes are then rebalanced.
    void push_new_value(priority_queue<int, vector<int>, greater<int>>& min,
                        priority_queue<int>& max, int value)
    {
        if (max.empty())
            max.push(value);
        else {
            if (value >= max.top())
                max.push(value);
            else
                min.push(value);
        }
        balance_heaps(min, max);
    }

    // Compute the median of all elements.
    // If the number of elements is even, the median is the arithmetic mean
    // of the two central elements of the sorted sequence.
    double find_median(priority_queue<int, vector<int>, greater<int>>& min,
                       priority_queue<int>& max)
    {
        if (min.size() == max.size())
            return (max.top() + min.top()) / 2.0;
        else if (max.size() > min.size())
            return max.top();
        else
            return min.top();
    }

  • And in what case can (min.size() - max.size()) == 2 hold? - Grundy
  • When the min-heap has 2 more elements than the max-heap. Although it seems this condition will never be met: elements arrive in the max-heap, and as soon as it has 2 more, one is moved to the min-heap. The min-heap can never get ahead. What should I do then? - typemoon
  • In general, I think balancing is needed only when the max-heap has more elements than the min-heap. That is: push to max; if the sizes are the same, everything is fine; if not, move an element from max to min. - Grundy
  • Then it all fits the part of the description that says the left side may have 1 more element (and the right side never has more). - Grundy
  • Then balancing would happen even when the max-heap has just 1 more element. But that case is allowed: that one element is then the central one in the sorted sequence, that is, the median. - typemoon
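
The point raised in these comments can be checked with a short sketch. It mirrors the question's first version (every value goes to the max-heap, then the sizes are balanced) and counts how often the second branch would fire. The function name `count_min_overflows` is made up for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

// Hypothetical helper: counts how many times the "min-heap is larger by 2"
// branch fires when every value is pushed to the max-heap first.
std::size_t count_min_overflows(const std::vector<int>& values) {
    std::priority_queue<int> max_heap;
    std::priority_queue<int, std::vector<int>, std::greater<int>> min_heap;
    std::size_t overflows = 0;
    for (int v : values) {
        max_heap.push(v);
        if (max_heap.size() == min_heap.size() + 2) {
            min_heap.push(max_heap.top());
            max_heap.pop();
        }
        if (min_heap.size() == max_heap.size() + 2) {
            ++overflows;
            max_heap.push(min_heap.top());
            min_heap.pop();
        }
        // The min-heap never gets ahead of the max-heap in this scheme.
        assert(min_heap.size() <= max_heap.size());
    }
    return overflows;
}
```

The heap sizes go (1, 0), (1, 1), (2, 1), (2, 2) and so on, so the counter stays at zero regardless of the input values.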

1 answer

Look: suppose you add the numbers 1, 2, 3.

After 1 is added, it sits in max_heap . Then 2 is added to max_heap as well, but balancing moves it into min_heap . The median is 1.5, OK.

Now add 3: it goes to max_heap , breaking the condition that every element on the right is no smaller than every element on the left. As a result, you get 3 as the median of 1 2 3 ... In essence, you break the invariant of the algorithm .
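
The trace above can be reproduced with a small sketch that mirrors the push and balance logic of the question's first version. The function name `buggy_medians` is made up for illustration, and the dead "min-heap larger by 2" branch is left out:

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <vector>

// Hypothetical helper: returns the median reported after each insertion
// when every value is pushed to the max-heap first (the question's logic).
std::vector<double> buggy_medians(const std::vector<int>& values) {
    std::priority_queue<int> max_heap;
    std::priority_queue<int, std::vector<int>, std::greater<int>> min_heap;
    std::vector<double> medians;
    for (int v : values) {
        max_heap.push(v);  // always goes left, whatever its value
        if (max_heap.size() == min_heap.size() + 2) {
            min_heap.push(max_heap.top());
            max_heap.pop();
        }
        assert(!max_heap.empty());
        if (min_heap.size() == max_heap.size())
            medians.push_back((max_heap.top() + min_heap.top()) / 2.0);
        else
            medians.push_back(max_heap.top());
    }
    return medians;
}
```

For the input 1, 2, 3 it reports 1, 1.5, 3: after the third push, max_heap holds {3, 1} and min_heap holds {2}, so the left part is no longer bounded by the right part.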

I would probably make the two queues symmetric and check where each new element should go, left or right ...
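
One possible reading of this suggestion, as a sketch rather than a definitive fix: compare each new value with the top of the left heap to choose a side, then rebalance in both directions. The struct name `RunningMedian` and the member names `lower` and `upper` are assumptions made for illustration:

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <vector>

// Hypothetical sketch. Invariants kept after every push:
//   1. every element of `lower` is <= every element of `upper`;
//   2. lower.size() is upper.size() or upper.size() + 1.
struct RunningMedian {
    std::priority_queue<int> lower;  // max-heap, left part
    std::priority_queue<int, std::vector<int>, std::greater<int>> upper;  // min-heap, right part

    void push(int value) {
        // Choose the side by value, so invariant 1 is preserved.
        if (lower.empty() || value <= lower.top())
            lower.push(value);
        else
            upper.push(value);

        // Restore invariant 2 in whichever direction it was broken.
        if (lower.size() > upper.size() + 1) {
            upper.push(lower.top());
            lower.pop();
        } else if (upper.size() > lower.size()) {
            lower.push(upper.top());
            upper.pop();
        }
    }

    double median() const {
        assert(!lower.empty());
        if (lower.size() == upper.size())
            return (static_cast<double>(lower.top()) + upper.top()) / 2.0;
        return lower.top();
    }
};
```

With this version, pushing 1, 2, 3 gives the medians 1, 1.5, 2. The 3 goes to the right heap because it is greater than the top of the left one, and the rebalancing step then moves 2 back to the left.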

  • My corrected solution, which picks a suitable heap for each new number, does not work either: ideone.com/WKvtSb I adjusted the heap selection a little: ideone.com/papgI4 It seems to have gotten better. Now I will check the algorithm on the site that hosts this problem. - typemoon 1:59 pm
  • Hurray, it works! And how could this be wrapped in a class more professionally? For a start, I was thinking of letting it work with an arbitrary type. - typemoon
  • How about this? ideone.com/1N7KIp Ah, you have already found the error :) - Harry