Yamsort yet another stable merge

Some time ago we discussed the issue of Sorting in Java here and concluded that Java uses the merge sorting algorithm because it is primarily stable (stable) sorting. There I expressed the idea of ​​trying to implement merge sorting (despite the warning on the wiki about the high complexity of the algorithm) without allocating additional memory of O (n).

How can this be implemented?

  • one
    "5-10% of the size of the array being sorted" is O (n). mergesort can be done on site - andrybak
  • one
    @Spectre, unfortunately failed to state briefly. If there are many who want to (or HashCode will advise it that way), you can radically reduce it by leaving a link to code.google.com. All materials are there. - @Dex, but now I'll try. I will publish in UPD - @Gorets, something hashCode is more pleasant to me, more specifically here or what? About not a place - to a certain extent I agree, the HashCode does not position itself as a place for discussion. But, I decided to keep on what happened as a result of discussing one of my questions here. - avp
  • 2
    Well, it began ... It seems soon it’s time to join michael.richter.name/blogs/… - avp
  • one
    Something a lot of "closers" appeared. Don't you think that it may still be interesting for someone other than you (and to some extent me)? / (you won't find a similar sorting method in the literature) - avp
  • 2
    @avp It seems to me that everything is fine in your question, except for only one, it is decorated a little bit by the SO rule (from this and all the excitement). The only thing I would like to ask you is - please transfer the entire body of the question in response, with the exception of "Yamsort yet another stable merge sort with small auxiliary memory". - Nicolas Chabanovsky

1 answer 1

Dear community! I apologize in advance for the volume of the topic under discussion ( in fact, this text was written in 2012 as part of the question, but in connection with the new rules of the site, on the recommendation of the administration, was altered in response ).

For example, like this. C code here

Here I propose to discuss the implementation of the C language (compiled in C ++ too). I called this project yamsort (yet another merge sort) and posted it on code.google.com called yamsort ( here's the direct link ). In Downloads are .tar.gz, and in Source, respectively, the files in text form.

Description in Readme.txt file, implementation: in yamsort.c function in qsort () style, in yamsort_tmpl.h in C style ( idea from here ). Using the C template function in the files int_yamsort.c and data_yamsort.c.

Example

// data_yamsort.c struct data { int key; double d; }; #define SORT_NAME data #define SORT_TYPE struct data #define SORT_CMP(x,y) ((x).key < (y).key ? -1 : ((x).key == (y).key ? 0 : 1)) /* make data_yamsort(struct data a[], size_t n) function */ #include "yamsort_tmpl.h" // препроцессор создаст функцию data_yamsort(struct data a[], size_t n) 

Algorithm Description

The algorithm provides stable (stable) sorting when using a small amount (5-10% of the size of the sorted array) of dynamically allocated memory. The running time is proportional to N * log (N), as is usual in merge sorting algorithms, which mainly use a significant amount of auxiliary memory in the merge phase.

We use the "standard" recursive merge sorting algorithm with array division in half at each step. When a segment of an array becomes small (say, shorter than 31), we sort it by inserts.

Ordered arrays are arranged sequentially in memory before merging. The idea of ​​the merge algorithm implemented here is to place elements of one sequence in a “queue”, which, if possible, is placed in place of elements of another sequence already inserted into the result.

We merge the left and right parts in place of the left part. The elements of the left side are sequentially moved to the queue, which consists of dynamically allocated blocks of size sqrt (N). If possible, the block is placed on a free space in the right part, otherwise it is allocated with the malloc () function. Once allocated, the malloc () blocks, when released from the queue, are put on the free list and reused. Released queue blocks placed in an array are added to their free list.

The blocks of the queue in place of the elements of the right side are allocated sequentially. If the next element to be added to the sorted sequence should go to the first of the queue blocks located on the right side, then this block moves to the free space on the right side of the array or (there is no empty space) to one of the malloc blocks.

More on queue manipulation.

Queue block - header and further data array

 // Queue block header typedef struct { int fdata, ffree; // индексы первого занятого и свободного элемента data[] struct qblock *next, *prev; } QBLOCK_HDR; // Queue block with data array typedef struct qblock { QBLOCK_HDR h; SORT_TYPE data[1]; } Q_BLOCK; // Блок управления очередью и свободными блоками typedef struct { Q_BLOCK *head, *tail, *flist, // список свободных блоков, полученных malloc() *aflist; // двусвязный список свободных блоков в правой части массива void *ffree, // начало свободного места в правой части *falloc, // первый блок очереди, размещенный в правой части *afirst, // начало всего массива *alast; // его конец int bsize, // количество элементов, размещаемых в блоке очереди totbsz; // размер блока очереди в байтах } DATA_QUEUE; 

When a new item is added to the queue: While there is space in the last block of the queue, we place it in it (in data []). Otherwise, we get a new block and place it in the first data [] element. A new block is obtained as follows:

  1. From aflist;

  2. We place this block in the free part of the data array; This is the place between the queue blocks already placed in the array and the elements of the right side, not yet moved to the sorted area. Adjust ffree.

  3. From flist; This is a list of previously allocated malloc () blocks that currently do not contain queue data.

  4. Call malloc ();

When fetching an item from a queue to put it in a sorted sequence, we check to see if the queue block is free. If an empty block is the only one in the queue, then it remains in it and can be refilled. Otherwise, if the block was allocated malloc (), then we put it in the flist list. We mark the block from the array area as free and place it in the aflist.

When entering an item into a sorted sequence that has already reached the right side of the array, we check if this place is not started with any queue block or block from the aflist list. If this is an aflist list block, then it is simply excluded from the list. If it is a queue block, it is copied to the free space. First of all, we try to copy it into the free zone on the right-hand side of the array, similar to paragraph 2) in the algorithm for placing an item in the queue. Otherwise, we copy this block into a block from the flist list, and if flist is empty, we call malloc ().

In .tar.gz there are programs of several sorts and a program to call them for measurements (./measure). Readme.txt in more detail about this.


 Оценки сортировок: N - размер массива time aux memory yamsort O(N*log(N)) ? O(sqrt(N)) stable tim_sort O(N*log(N)) O(N/3) ? stable mmsort O(N*log(N)) O(N/2) stable qsort O(N*log(N)) O(N) stable (unstable for huge N) quick_sort O(N*log(N)) O(1) unstable aamsort O(N*log(N)) ? O(1) unstable symmsort O(N*(log(N))^2) O(1) stable yamsort Отношение размера динамически выделенной памяти к объему массива int [] K - количество malloc() блоков размером sqrt(N) для случайных int ключей array N array байт queue K queue (bytes)/array 1000 4000 4 0.188 5000 20000 6 0.103 10000 40000 7 0.081 50000 200000 13 0.062 100000 400000 17 0.056 250000 1000000 28 0.058 500000 2000000 38 0.055 1000000 4000000 55 0.056 2500000 10000000 84 0.054 5000000 20000000 121 0.054 10000000 40000000 169 0.054 25000000 100000000 264 0.053 Некоторые результаты измерений для массивов int [] и struct data { int key; double d; } [] VmHWMi максимальная резидентная память в KBytes для int []. timei время сортировки int [] в миллисекундах. Costi "стоимость" сортировки int [] в GBytes*sec (гигобайтосекунды). VmHWMd максимальная резидентная память в KBytes для struct data []. timed время сортировки struct data [] в миллисекундах. Costd "стоимость" сортировки struct data [] в GBytes*sec. DATA TABLES table1 [64-bits] 1000 (4000/16000 bytes max_key_value 2147483646) VmHWMi timei Costi VmHWMd timed Costd system qsort 724 0.143 0.104 748 0.104 0.078 libc _quicksort 676 0.081 0.055 684 0.092 0.063 template yamsort 688 0.057 0.039 700 0.062 0.043 template Swenson tim_sort 688 0.057 0.039 712 0.067 0.048 template mmsort 684 0.040 0.027 700 0.046 0.032 template symmsort 672 0.098 0.066 688 0.102 0.070 template Swenson quick_so 672 0.040 0.027 684 0.044 0.030 template aamsort 680 0.052 0.035 688 0.059 0.041 table2 [64-bits] 5000 (20000/80000 bytes max_key_value 2147483646) VmHWMi timei Costi VmHWMd timed Costd system qsort 752 0.420 0.316 876 0.638 0.559 libc _quicksort 692 0.488 0.338 748 0.561 0.420 template yamsort 700 0.360 0.252 768 0.407 0.313 template Swenson tim_sort 712 0.328 0.234 800 0.380 0.304 template mmsort 704 0.247 0.174 800 0.294 0.235 template symmsort 688 0.768 0.528 748 0.819 0.613 template Swenson quick_so 688 0.244 0.168 744 0.260 0.193 template aamsort 696 0.297 0.207 748 0.346 0.259 table3 [64-bits] 10000 (40000/160000 bytes max_key_value 2147483646) VmHWMi timei Costi VmHWMd timed Costd system qsort 792 0.900 0.713 1032 1.400 1.445 libc _quicksort 708 1.050 0.743 828 1.120 0.927 template yamsort 720 0.810 0.583 856 0.920 0.788 template Swenson tim_sort 736 0.710 0.523 908 0.830 0.754 template mmsort 732 0.530 0.388 916 0.660 0.605 template symmsort 704 1.720 1.211 832 1.870 1.556 template Swenson quick_so 704 0.530 0.373 828 0.570 0.472 template aamsort 712 0.660 0.470 828 0.750 0.621 table5 [64-bits] 100000 (400000/1600000 bytes max_key_value 2147483646) VmHWMi timei Costi VmHWMd timed Costd system qsort 1496 11.400 17.054 3840 21.900 84.096 libc _quicksort 1060 12.800 13.568 2236 13.500 30.186 template yamsort 1096 10.400 11.398 2348 11.800 27.706 template Swenson tim_sort 1236 9.000 11.124 2868 11.000 31.548 template mmsort 1264 7.100 8.974 3028 8.500 25.738 template symmsort 1060 22.700 24.062 2236 24.500 54.782 template Swenson quick_so 1060 6.500 6.890 2228 7.000 15.596 template aamsort 1068 7.500 8.010 2240 8.800 19.712 table7 [64-bits] 1000000 (4000000/16000000 bytes max_key_value 2147483646) VmHWMi timei Costi VmHWMd timed Costd system qsort 8532 132.400 1129.637 31968 211.800 6770.822 libc _quicksort 4572 149.200 682.142 16296 156.400 2548.694 template yamsort 4744 130.700 620.041 17132 144.500 2475.574 template Swenson tim_sort 6124 108.200 662.617 22316 131.400 2932.322 template mmsort 6540 86.600 566.364 24124 111.000 2677.764 template symmsort 4576 279.300 1278.077 16300 319.000 5199.700 template Swenson quick_so 4576 77.200 353.267 16296 85.500 1393.308 template aamsort 4584 88.400 405.226 16300 108.700 1771.810 table23 [64-bits] 10000000 (40000000/160000000 bytes max_key_value 2147483646) VmHWMi timei Costi VmHWMd timed Costd system qsort 78804 1550.000 122146.200 313184 2606.000 816157.504 libc _quicksort 39684 1707.000 67740.588 156880 1778.000 278932.640 template yamsort 41776 1499.000 62622.224 165200 1771.000 292569.200 template Swenson tim_sort 54792 1206.000 66079.152 216912 1552.000 336647.424 template mmsort 59200 1010.000 59792.000 234984 1320.000 310178.880 template symmsort 39688 4272.000 169547.136 156884 4556.000 714763.504 template Swenson quick_so 39692 893.000 35444.956 156884 1002.000 157197.768 template aamsort 39696 1031.000 40926.576 156884 1355.000 212577.820 Несколько странные результаты для 32-bit table1 [32-bits] 1000 (4000/12000 bytes max_key_value 2147483646) VmHWMi timei Costi VmHWMd timed Costd system qsort 620 0.117 0.073 636 0.136 0.086 template yamsort 556 0.062 0.034 556 0.084 0.047 template Swenson tim_sort 552 0.108 0.060 552 0.106 0.059 template mmsort 556 0.034 0.019 552 0.056 0.031 table2 [32-bits] 5000 (20000/60000 bytes max_key_value 2147483646) VmHWMi timei Costi VmHWMd timed Costd system qsort 640 0.679 0.435 672 0.907 0.610 template yamsort 552 0.451 0.249 552 0.514 0.284 template Swenson tim_sort 556 0.581 0.323 556 0.651 0.362 template mmsort 556 0.252 0.140 556 0.373 0.207 table5 [32-bits] 100000 (400000/1200000 bytes max_key_value 2147483646) VmHWMi timei Costi VmHWMd timed Costd system qsort 1272 19.800 25.186 2844 35.700 101.531 template yamsort 820 12.300 10.086 1872 16.800 31.450 template Swenson tim_sort 1120 17.200 19.264 2144 20.400 43.738 template mmsort 1080 8.400 9.072 2268 18.200 41.278 table7 [32-bits] 1000000 (4000000/12000000 bytes max_key_value 2147483646) VmHWMi timei Costi VmHWMd timed Costd system qsort 8220 254.400 2091.168 23952 413.100 9894.571 template yamsort 4700 169.200 795.240 12964 222.900 2889.676 template Swenson tim_sort 5876 215.100 1263.928 16732 283.400 4741.849 template mmsort 6360 115.800 736.488 18096 205.500 3718.728 table23 [32-bits] 10000000 (40000000/120000000 bytes max_key_value 2147483646) VmHWMi timei Costi VmHWMd timed Costd system qsort 78488 2952.000 231696.576 234752 4758.000 1116950.016 yamsort 41696 4899.000 204268.704 123840 5085.000 629726.400 timsort 54660 5305.000 289971.300 162772 6281.000 1022370.932 mmsort 58896 3501.000 206194.896 176112 4576.000 805888.512 symmsort 39624 8536.000 338230.464 117768 11543.000 1359396.024 libc _quicksort 39624 2604.000 103180.896 117508 2714.000 318916.712 template yamsort 41696 1978.000 82474.688 123840 2367.000 293129.280 template Swenson tim_sort 54656 1948.000 106469.888 162780 3110.000 506245.800 template mmsort 59164 1323.000 78273.972 176112 2307.000 406290.384 template symmsort 39624 5242.000 207709.008 117772 8721.000 1027089.612 template Swenson quick_so 39624 1518.000 60149.232 117768 1664.000 195965.952 template aamsort 39624 1131.000 44814.744 117768 2694.000 317266.992 

My brief summary: yamsort is of practical interest for sorting very large arrays in conditions of a lack of physical memory in the system.

The code is free. Maybe someone will be able to improve (change) it and get a practically important result.

Once again, I apologize for the volume, but I hope that this topic will be interesting to many. If someone can reasonably edit this disgrace, reducing it several times (but leaving it informative), I will only be grateful to him.

And of course, several questions arose in the course of development.

First question :

How to explain that on 32-bit yamsort is faster than timsort, and on 64-bit the other way around?

Personally, I will not attach my mind. I drove everything on Linux in virtuals. Running on Linux without a virtual machine, on a 64-bit machine of modules made on 32 and 64-bit virtual machines showed the same thing. Those. the numbers are different (the CPU is different there) and the nature of the data has not changed.

Second :

I was told that the algorithm might be interesting in the field of the embedded system. What do you think about that ?

UPD 1

Compare long long [] template C timsort with yamsort (as requested by @Dex)

  1000 5000 10000 30000 100000 3000000 1000000 5000000 10000000 ll_tim_sort 0.062 0.334 0.713 2.508 8.983 28.76 110.37 592.6 1241 kB 704 744 800 1016 1792 3980 11528 54800 108840 Cost 0.043 0.248 0.571 2.548 16.09 114.5 1272 32474 135136 ll_yamsort 0.058 0.365 0.811 2.666 10.375 35.45 130.35 745.7 1560 kB 692 732 768 940 1524 3156 8884 41792 82936 Cost 0.040 0.267 0.623 2.506 15.78 111.9 1158 31164 129422 

You may notice that the yamsort / timsort runtime ratio grows from 0.93 to 1.25. This seems to indicate that the yamsort time is slightly worse than N * log N. Probably, the copying of the queue blocks placed in the array memory is having an effect. However, by the cost criterion, the память \* время (this would be if we paid for each megabyte of RSS occupied for a second) yamsort is more economical for long arrays.