Hello, there is a file that has been partially sorted before. partially means the file consists, say, of this order of numbers: 1 2 3 3 5 6 6 ... max 1 2 2 4 5 5 ... max and so on, i.e. sorted numbers up to a certain maximum, then sorted again (the number of numbers in these parts is the same, but the numbers themselves may differ) until the end of the file ... help please sort this file. I understand that I need to somehow merge these parts, but how ...

  • Just use a few pointers. We counted one by one, recorded the minimum, shifted the corresponding pointer, counted the new value. And so, until all the values ​​are recorded ... - Harry
  • @Harry, are you kidding me?)) There is an STL) - Majestio
  • @Majestio Well, personally, I took it as a learning task. Especially since we are talking about the file , so if the file is large, then sucking everything into memory can be very problematic. And after all - merge for two ranges - and how many are there in the file? It is said that a lot . - Harry
  • @Harry, aah ... well then just a strictly bubble. Because the file can be so large that the second one of the same length does not fit on the media) - Majestio

2 answers 2

In general, so.
If there is an opportunity to consider all this economy in memory and sort it is better to do it in this way. Faster and easier.
The way to work directly in the file is for training purposes only. Because it turns out to be inhibited, especially with a large number of series - since at each step you need to look for the minimum among all the series ... Globally, the running time of the algorithm is O(Nk) , where k is the number of series, so it should not exceed log N , especially considering the large constant due to the large number of disk operations ...

If this is a real job - take GNU sort and do not suffer :)

Well, if you want to suffer - then here is something that works, but written, as they say, on the knee ...

 #include <vector> #include <iostream> #include <iomanip> #include <fstream> #include <limits> #include <algorithm> using namespace std; void makeFile() { ofstream out("data"); for(int i = 0; i < 30; ++i) { int curval = rand()%100; for(int j = 0; j < 10000+rand()%100; ++j) { out << curval << "\n"; curval += rand()%100; } } } struct Getter { streamoff pos; int val; }; int main(int argc, const char * argv[]) { makeFile(); vector<Getter> ps; ifstream in("data"); int last = numeric_limits<int>::max(), val; while(in >> val) { if (val >= last) { last = val; continue; } last = val; Getter g{in.tellg(), val}; ps.push_back(g); } in.clear(); in.seekg(0); ofstream out("sort"); for(;ps.size();) { auto g = min_element(ps.begin(), ps.end(), [](const Getter&g1, const Getter&g2) { return g1.val < g2.val; }); out << g->val << "\n"; int x; if (in.seekg(g->pos) && (in >> x) && (x >= g->val)) { g->val = x; g->pos = in.tellg(); } else { in.clear(); ps.erase(g); } } } 
  • thanks, I will drive in - xperious
  • Well, I implemented what I said in the commentary - about maintaining a bundle of pointers to sorted pieces in one file ... - Harry
  • Thanks for the idea that you can create additional. structure with a position and value ... very beautiful and useful idea imho - xperious

Option:

 #include <iostream> #include <algorithm> int main () { std::vector<int> A = {1,2,2,3,3,5,6,6,12}; std::vector<int> B = {1,2,2,4,5,5,7}; std::vector<int> R; R.reserve(A.size()+B.size()); std::merge(A.begin(), A.end(), B.begin(), B.end(), std::back_inserter(R)); for(const auto &i:R) std::cout << i << " "; return 0; } 

Conclusion:

 1 1 2 2 2 2 3 3 4 5 5 5 6 6 7 12 

On ideone .

  • thanks, only here you first need to go around the whole file and form these same vectors A, B and so on. farther into the output vector ... this, it turns out, double the use of memory compared to the file size ... is it possible as it is more efficient? - xperious
  • besides, if the vectors are 50, then how? I do not drive in even - xperious
  • Use the "bubble". And if such "vectors" 5000000? What are the overhead costs for indexing? - Majestio
  • did not quite understand: in which place to use the bubble? - xperious
  • Well, if the file is large, there are few resources, there are many groups in the file - sort the entire file by a bubble, and then. - Majestio