I understand that we add an additional element to the array (barrier), and no longer have to think about the boundaries of the array. Only one thing is not clear: the number we are looking for, we ourselves enter and replace the penultimate element - how can this be called a search? We already know where he is. And you can write a complete version of a workable program with the implementation of this barrier search?

2 answers 2

The essence of such a search is that we get rid of one condition - the checking of boundaries. If we take a linear search in an array of numbers, then we get a performance boost. Compare two options:

//1 size_t find(int const *arr, size_t size, int value) { for (size_t i = 0; i < size; ++i) {//Первое условие в цикле if (arr[i] == value) {//Второе для проверки самого значения return i; } } return std::numeric_limits<size_t>::max(); } //2 size_t find(int *arr, size_t size, int value) { if (size != 0) { int last = arr[size - 1];//Сохраним прежний элемент массива arr[size - 1] = value;//Гарантируем, что value есть в массиве //Есть гарантия того, что элемент есть в массиве, значит индекс можно не проверять size_t i = 0; for (i = 0; arr[i] != value; ++i) {//Одно условие в цикле } arr[size - 1] = last;//Восстанавливаем последний элемент if (i != (size - 1) || value == last) {//Не уткнулись в барьер или последний элемент был искомым return i; } } return std::numeric_limits<size_t>::max(); } 

Once I made measurements of a similar implementation. Because we got rid of one condition, then the loop runs about twice as fast. At the beginning and at the end we have additional actions, but they are performed once, and not at each iteration of the cycle. Also, the implementation is made in such a way that no additional element is required, but at the same time it is necessary to refuse constancy. I hope that I understood the essence of the question correctly.

  • typo: const int * , not int const * ... ... the second is a constant pointer ... - Fat-Zer
  • 2
    @ Fat-Zer You are mistaken, the entries int const * and const int * equivalent and these will be pointers to constants. And the constant pointer is int * const . Special attention to the position of const and "asterisks". - Croessmah
  • Yes, I apologize ... my wrong ... see, the new year is still not over ... - Fat-Zer
  • @ Fat-Zer new year has just begun! - αλεχολυτ
  • I would check if the last one was sought, right away :) - Harry

I measured the performance of the @ Croessmah code on the hardware. As a result:

  • For arrays of the order of L1, the speed increase is about 97% ± 5%
  • For arrays of order L2
    • uint64_t gives ~ 85% growth
    • uint32_t still gives ~ 95% increase
  • For arrays that fit in L3:
    • uint64_t - 70%
    • uint32_t - 91 ~ 97%
  • For arrays far superior to L3:
    • uint64_t - barrier search gives only 10% increase
    • uint32_t - gives a 50% increase.

Code / I apologize for the mixture of pluses and C /
Compiled by x86_64-pc-linux-gnu-g++-6.4.0 -O3 -fno-inline -march=native iron: Core i5-3570K):

 #include <typeinfo> #include <limits> #include <stdlib.h> #include <stdio.h> #include <stdint.h> #include <time.h> #include <assert.h> #define NUM_TRIES 128 #define SECTION_SZ 8 #define MIN_COUNTABLE_TIME ( 50 * 1000.0 / CLOCKS_PER_SEC) template <typename T> ssize_t find(T const *arr, size_t size, T value) { for (size_t i = 0; i < size; ++i) {//Первое условие в цикле if (arr[i] == value) {//Второе для проверки самого значения return i; } } return -1; } template <typename T> ssize_t barier(T *arr, size_t size, T value) { if (size != 0) { T last = arr[size - 1];//Сохраним прежний элемент массива arr[size - 1] = value;//Гарантируем, что value есть в массиве //Есть гарантия того, что элемент есть в массиве, значит индекс можно не проверять size_t i = 0; for (i = 0; arr[i] != value; ++i) {//Одно условие в цикле } arr[size - 1] = last;//Восстанавливаем последний элемент if (i != (size - 1) || value == last) {//Не уткнулись в барьер или последний элемент был искомым return i; } } return -1; } template <typename T> void mesure_time (size_t num) { T *arr = static_cast<T*> (malloc (num * sizeof(T))); assert (arr); if (std::numeric_limits<T>::max() > num) { for (size_t i=0; i<num; i++) { arr[i] = i; } } else { for (size_t i=0; i<num; i++) { arr[i] = i*(std::numeric_limits<T>::max() - 1) / num; } } printf("number of elements = %ld\n", num); printf("sizeof(T) = %zd\n", sizeof(T)); if (num*sizeof(T) < 16L*1024L*1024L) { printf("size of array = %zdk\n", sizeof(T) * num / 1024); } else { printf("size of array = %zdM\n", sizeof(T) * num / 1024 / 1024); } printf (" N | find(ms) | barier(ms) | diff | surplus\n"); double total_avg_find = 0; double total_avg_barier = 0; int valid = 0; T search_key = std::numeric_limits<T>::max(); for (size_t i=num; i>32; i/=2, search_key = arr[i]) { double avg_find; double avg_barier; clock_t start_time = clock(); ssize_t found_find = find (arr, num, search_key); double wt_find = (clock() - start_time) * (1000. / CLOCKS_PER_SEC); start_time = clock(); ssize_t found_barier = barier(arr, num, search_key); double wt_barier = (clock() - start_time) * (1000. / CLOCKS_PER_SEC); assert (found_find == found_barier); if ( wt_find > MIN_COUNTABLE_TIME && wt_barier > MIN_COUNTABLE_TIME ) { // get a per-iteration count for (int j = 1; j<NUM_TRIES; j++) { start_time = clock(); found_barier = barier(arr, num, search_key); wt_barier += (clock() - start_time) * (1000. / CLOCKS_PER_SEC); start_time = clock(); found_find = find(arr, num, search_key); wt_find += (clock() - start_time) * (1000. / CLOCKS_PER_SEC); assert (found_find == found_barier); } } else { wt_find = 0; wt_barier = 0; ssize_t found_find[NUM_TRIES]; ssize_t found_barier[NUM_TRIES]; bool to_small_failure = false; for (int j = 1; j<NUM_TRIES; j++) { // get overall count start_time = clock(); for (int k = 0; k<NUM_TRIES; k++) { found_find[k] = find(arr, num, search_key); } wt_find += (clock() - start_time) * (1000. / CLOCKS_PER_SEC); start_time = clock(); for (int k = 0; k<NUM_TRIES; k++) { found_barier[k] = barier(arr, num, search_key); } wt_barier += (clock() - start_time) * (1000. / CLOCKS_PER_SEC); for (int k = 0; k<NUM_TRIES; k++) { assert(found_find[k] == found_barier[k]); } if ( wt_find < MIN_COUNTABLE_TIME || wt_barier < MIN_COUNTABLE_TIME ) { to_small_failure = true; break; } } if (to_small_failure) { // invalid; some results, are too small to mesure continue; } wt_find /= NUM_TRIES; wt_barier /= NUM_TRIES; } valid ++; avg_find = wt_find / NUM_TRIES; avg_barier = wt_barier / NUM_TRIES; double diff = avg_find - avg_barier; double surplus = diff * 100 / avg_barier; printf ("%10zd | %8.3lgms | %8.3lgms | %10.3lgms | %+7.2lf%%\n", found_find, avg_find, avg_barier, diff, surplus); total_avg_find += avg_find; total_avg_barier += avg_barier; } total_avg_find /= valid; total_avg_barier /= valid; double diff = total_avg_find - total_avg_barier; double surplus = diff * 100 / total_avg_barier; printf ("\n"); printf ("Avg time: %8.3lgms | %8.3lgms | %10.3lgms | %+7.2lf%%\n", total_avg_find, total_avg_barier, diff, surplus); free (arr); } int main() { // L1-fit array (32K-4K) mesure_time<uint8_t> ( 1024L*28 ); printf ("--------------------------------------------------------------\n"); mesure_time<uint32_t> ( 1024L*28 / 4 ); printf ("--------------------------------------------------------------\n"); mesure_time<uint64_t> ( 1024L*28 / 8 ); printf ("==============================================================\n"); // L2-fit array (256K-8K) mesure_time<uint32_t> ( 1024L*248 / 4 ); printf ("--------------------------------------------------------------\n"); mesure_time<uint64_t> ( 1024L*248 / 8 ); printf ("==============================================================\n"); // L2-excess array 1M mesure_time<uint32_t> ( 1024L*1024L / 4 ); printf ("--------------------------------------------------------------\n"); mesure_time<uint64_t> ( 1024L*1024L / 8 ); printf ("==============================================================\n"); // L3-fit array 6M-32K mesure_time<uint32_t> ( (1024L*1024L*6 - 1024L*32) / 4); printf ("==============================================================\n"); mesure_time<uint64_t> ( (1024L*1024L*6 - 1024L*32) / 8); printf ("==============================================================\n"); // L3-excess array (16M) mesure_time<uint8_t> ( 1024L*1024L*16 ); printf ("--------------------------------------------------------------\n"); mesure_time<uint32_t> ( 1024L*1024L*16 / 4); printf ("--------------------------------------------------------------\n"); mesure_time<uint64_t> ( 1024L*1024L*16 / 8); printf ("==============================================================\n"); // huge array (512M) mesure_time<uint32_t> ( 1024L*1024L*512 / 4); printf ("--------------------------------------------------------------\n"); mesure_time<uint64_t> ( 1024L*1024L*512 / 8); printf ("==============================================================\n"); return 0; } 


Results:

 number of elements = 28672 sizeof(T) = 1 size of array = 28k N | find(ms) | barier(ms) | diff | surplus -1 | 0.0153ms | 0.00764ms | 0.00771ms | +101.00% 14336 | 0.00757ms | 0.00379ms | 0.00377ms | +99.47% 7112 | 0.00375ms | 0.0019ms | 0.00185ms | +97.52% 3500 | 0.00186ms | 0.000948ms | 0.000909ms | +95.91% 1694 | 0.000901ms | 0.000464ms | 0.000438ms | +94.37% Avg time: 0.00588ms | 0.00295ms | 0.00294ms | +99.62% -------------------------------------------------------------- number of elements = 7168 sizeof(T) = 4 size of array = 28k N | find(ms) | barier(ms) | diff | surplus -1 | 0.00378ms | 0.0019ms | 0.00188ms | +98.55% 3584 | 0.0019ms | 0.000964ms | 0.000934ms | +96.90% 1792 | 0.000952ms | 0.000491ms | 0.000461ms | +93.85% Avg time: 0.00221ms | 0.00112ms | 0.00109ms | +97.39% -------------------------------------------------------------- number of elements = 3584 sizeof(T) = 8 size of array = 28k N | find(ms) | barier(ms) | diff | surplus -1 | 0.0019ms | 0.000966ms | 0.000935ms | +96.72% 1792 | 0.000956ms | 0.00049ms | 0.000466ms | +95.24% Avg time: 0.00143ms | 0.000728ms | 0.000701ms | +96.22% ============================================================== number of elements = 63488 sizeof(T) = 4 size of array = 248k N | find(ms) | barier(ms) | diff | surplus -1 | 0.0334ms | 0.0168ms | 0.0166ms | +98.47% 31744 | 0.0168ms | 0.00843ms | 0.00837ms | +99.36% 15872 | 0.00839ms | 0.00421ms | 0.00417ms | +99.05% 7936 | 0.0042ms | 0.00212ms | 0.00207ms | +97.66% 3968 | 0.0021ms | 0.00106ms | 0.00104ms | +97.36% 1984 | 0.00105ms | 0.000542ms | 0.000509ms | +94.02% Avg time: 0.011ms | 0.00554ms | 0.00546ms | +98.61% -------------------------------------------------------------- number of elements = 31744 sizeof(T) = 8 size of array = 248k N | find(ms) | barier(ms) | diff | surplus -1 | 0.0168ms | 0.00948ms | 0.00734ms | +77.50% 15872 | 0.00845ms | 0.00459ms | 0.00386ms | +84.07% 7936 | 0.00419ms | 0.00227ms | 0.00192ms | +84.65% 3968 | 0.00213ms | 0.00108ms | 0.00104ms | +96.40% 1984 | 0.00106ms | 0.000541ms | 0.000516ms | +95.34% Avg time: 0.00653ms | 0.00359ms | 0.00294ms | +81.76% ============================================================== number of elements = 262144 sizeof(T) = 4 size of array = 1024k N | find(ms) | barier(ms) | diff | surplus -1 | 0.141ms | 0.0714ms | 0.0694ms | +97.16% 131072 | 0.0692ms | 0.0352ms | 0.034ms | +96.72% 65536 | 0.0347ms | 0.0174ms | 0.0173ms | +99.14% 32768 | 0.0173ms | 0.00869ms | 0.00857ms | +98.56% 16384 | 0.00863ms | 0.00433ms | 0.0043ms | +99.24% 8192 | 0.00432ms | 0.00219ms | 0.00213ms | +97.00% 4096 | 0.00216ms | 0.0011ms | 0.00107ms | +97.57% 2048 | 0.00108ms | 0.000556ms | 0.000529ms | +95.03% Avg time: 0.0348ms | 0.0176ms | 0.0172ms | +97.44% -------------------------------------------------------------- number of elements = 131072 sizeof(T) = 8 size of array = 1024k N | find(ms) | barier(ms) | diff | surplus -1 | 0.0692ms | 0.0405ms | 0.0286ms | +70.56% 65536 | 0.0346ms | 0.0203ms | 0.0143ms | +70.65% 32768 | 0.0173ms | 0.00976ms | 0.00753ms | +77.19% 16384 | 0.00865ms | 0.00469ms | 0.00396ms | +84.49% 8192 | 0.00437ms | 0.00235ms | 0.00202ms | +85.73% 4096 | 0.00218ms | 0.00111ms | 0.00107ms | +96.06% 2048 | 0.00108ms | 0.000558ms | 0.000527ms | +94.55% Avg time: 0.0196ms | 0.0113ms | 0.00829ms | +73.20% ============================================================== number of elements = 1564672 sizeof(T) = 4 size of array = 6112k N | find(ms) | barier(ms) | diff | surplus -1 | 0.855ms | 0.506ms | 0.348ms | +68.82% 782336 | 0.418ms | 0.219ms | 0.199ms | +91.23% 391168 | 0.21ms | 0.107ms | 0.103ms | +96.36% 195584 | 0.105ms | 0.0534ms | 0.0515ms | +96.50% 97792 | 0.0516ms | 0.0263ms | 0.0254ms | +96.50% 48896 | 0.0259ms | 0.013ms | 0.0129ms | +99.62% 24448 | 0.0129ms | 0.00645ms | 0.00642ms | +99.53% 12224 | 0.0065ms | 0.00326ms | 0.00324ms | +99.25% 6112 | 0.00323ms | 0.00163ms | 0.0016ms | +98.46% 3056 | 0.00163ms | 0.000827ms | 0.000802ms | +96.96% 1528 | 0.000825ms | 0.000429ms | 0.000396ms | +92.40% Avg time: 0.154ms | 0.0852ms | 0.0685ms | +80.37% ============================================================== number of elements = 782336 sizeof(T) = 8 size of array = 6112k N | find(ms) | barier(ms) | diff | surplus -1 | 0.456ms | 0.339ms | 0.118ms | +34.71% 391168 | 0.212ms | 0.132ms | 0.0806ms | +61.26% 195584 | 0.105ms | 0.0621ms | 0.0433ms | +69.73% 97792 | 0.0518ms | 0.0303ms | 0.0215ms | +70.92% 48896 | 0.0259ms | 0.0151ms | 0.0108ms | +71.66% 24448 | 0.013ms | 0.00728ms | 0.0057ms | +78.25% 12224 | 0.0065ms | 0.00353ms | 0.00297ms | +84.06% 6112 | 0.00327ms | 0.00177ms | 0.0015ms | +84.76% 3056 | 0.00163ms | 0.000828ms | 0.000801ms | +96.71% 1528 | 0.000813ms | 0.000422ms | 0.000391ms | +92.61% Avg time: 0.0877ms | 0.0591ms | 0.0285ms | +48.20% ============================================================== number of elements = 16777216 sizeof(T) = 1 size of array = 16M N | find(ms) | barier(ms) | diff | surplus -1 | 9.17ms | 4.91ms | 4.26ms | +86.67% 8388608 | 4.52ms | 2.42ms | 2.1ms | +87.04% 4161278 | 2.23ms | 1.15ms | 1.09ms | +94.93% 2047613 | 1.13ms | 0.571ms | 0.561ms | +98.29% 990781 | 0.545ms | 0.273ms | 0.271ms | +99.25% 462365 | 0.249ms | 0.126ms | 0.123ms | +98.20% 198157 | 0.107ms | 0.0537ms | 0.053ms | +98.62% 66053 | 0.036ms | 0.0181ms | 0.0179ms | +98.98% Avg time: 2.25ms | 1.19ms | 1.06ms | +89.06% -------------------------------------------------------------- number of elements = 4194304 sizeof(T) = 4 size of array = 16M N | find(ms) | barier(ms) | diff | surplus -1 | 2.32ms | 1.52ms | 0.802ms | +52.91% 2097152 | 1.15ms | 0.717ms | 0.436ms | +60.85% 1048576 | 0.581ms | 0.323ms | 0.258ms | +80.11% 524288 | 0.285ms | 0.147ms | 0.138ms | +94.08% 262144 | 0.144ms | 0.0732ms | 0.0703ms | +96.06% 131072 | 0.0708ms | 0.036ms | 0.0348ms | +96.56% 65536 | 0.0352ms | 0.0177ms | 0.0175ms | +98.37% 32768 | 0.0181ms | 0.00907ms | 0.00899ms | +99.15% 16384 | 0.00899ms | 0.00453ms | 0.00446ms | +98.59% 8192 | 0.0045ms | 0.00228ms | 0.00222ms | +97.50% 4096 | 0.00228ms | 0.00115ms | 0.00113ms | +98.49% 2048 | 0.00114ms | 0.000582ms | 0.000558ms | +95.79% Avg time: 0.385ms | 0.237ms | 0.148ms | +62.34% -------------------------------------------------------------- number of elements = 2097152 sizeof(T) = 8 size of array = 16M N | find(ms) | barier(ms) | diff | surplus -1 | 1.28ms | 1.15ms | 0.133ms | +11.62% 1048576 | 0.619ms | 0.497ms | 0.122ms | +24.52% 524288 | 0.29ms | 0.19ms | 0.0999ms | +52.51% 262144 | 0.141ms | 0.0836ms | 0.0572ms | +68.35% 131072 | 0.0712ms | 0.0417ms | 0.0295ms | +70.80% 65536 | 0.0348ms | 0.0204ms | 0.0144ms | +70.36% 32768 | 0.0173ms | 0.00983ms | 0.0075ms | +76.28% 16384 | 0.00877ms | 0.00477ms | 0.004ms | +83.77% 8192 | 0.0044ms | 0.00238ms | 0.00201ms | +84.40% 4096 | 0.0022ms | 0.00112ms | 0.00107ms | +95.35% 2048 | 0.00115ms | 0.00059ms | 0.000555ms | +94.06% Avg time: 0.224ms | 0.182ms | 0.0428ms | +23.59% ============================================================== number of elements = 134217728 sizeof(T) = 4 size of array = 512M N | find(ms) | barier(ms) | diff | surplus -1 | 72.9ms | 48.1ms | 24.8ms | +51.61% 67108864 | 36.6ms | 24.2ms | 12.4ms | +51.43% 33554432 | 18.2ms | 12.1ms | 6.14ms | +50.88% 16777216 | 9.06ms | 5.98ms | 3.08ms | +51.45% 8388608 | 4.53ms | 2.98ms | 1.55ms | +52.18% 4194304 | 2.27ms | 1.46ms | 0.809ms | +55.45% 2097152 | 1.14ms | 0.704ms | 0.436ms | +61.95% 1048576 | 0.562ms | 0.305ms | 0.257ms | +84.43% 524288 | 0.278ms | 0.142ms | 0.136ms | +95.28% 262144 | 0.139ms | 0.0707ms | 0.0688ms | +97.28% 131072 | 0.0704ms | 0.0359ms | 0.0345ms | +96.25% 65536 | 0.035ms | 0.0176ms | 0.0173ms | +98.45% 32768 | 0.018ms | 0.00904ms | 0.00896ms | +99.17% 16384 | 0.00883ms | 0.00444ms | 0.00438ms | +98.64% 8192 | 0.00432ms | 0.00219ms | 0.00214ms | +97.83% 4096 | 0.00226ms | 0.00114ms | 0.00112ms | +97.77% 2048 | 0.00113ms | 0.00058ms | 0.00055ms | +94.82% Avg time: 8.58ms | 5.65ms | 2.93ms | +51.85% -------------------------------------------------------------- number of elements = 67108864 sizeof(T) = 8 size of array = 512M N | find(ms) | barier(ms) | diff | surplus -1 | 39.6ms | 35.1ms | 4.48ms | +12.76% 33554432 | 20ms | 17.9ms | 2.08ms | +11.64% 16777216 | 9.93ms | 8.89ms | 1.04ms | +11.66% 8388608 | 4.98ms | 4.47ms | 0.51ms | +11.40% 4194304 | 2.51ms | 2.27ms | 0.235ms | +10.35% 2097152 | 1.24ms | 1.14ms | 0.105ms | +9.24% 1048576 | 0.612ms | 0.488ms | 0.124ms | +25.41% 524288 | 0.296ms | 0.202ms | 0.094ms | +46.55% 262144 | 0.146ms | 0.0872ms | 0.0587ms | +67.31% 131072 | 0.0703ms | 0.0412ms | 0.0292ms | +70.92% 65536 | 0.0355ms | 0.0207ms | 0.0147ms | +71.01% 32768 | 0.0176ms | 0.00964ms | 0.008ms | +83.06% 16384 | 0.00903ms | 0.00486ms | 0.00416ms | +85.62% 8192 | 0.0045ms | 0.00244ms | 0.00206ms | +84.45% 4096 | 0.00226ms | 0.00115ms | 0.00111ms | +96.03% 2048 | 0.00114ms | 0.000594ms | 0.00055ms | +92.70% Avg time: 4.96ms | 4.42ms | 0.549ms | +12.44% ============================================================== 
  • N - index of the item found
  • find(ms) - the average time for NUM_TRIES passes in milliseconds to find it using find()
  • barier(ms) - similarly, with the help of barrier search.
  • diff is the difference of the previous two.
  • surplus - percentage increase in productivity, the ratio of diff to barier.
  • Serious approach. And did you really have, that after continue on to_small_failure = true some of the following measurements (with the same array) fell into the result? - avp
  • @avp, no. It is possible to replace with break in the final version. Initially, the cycle was in direct order, with an increase in i but firstly it seemed that the results were smeared on the cache size boundary, and secondly, the reverse conclusion seems more visual ... therefore, I unrolled the cycle, but continue remained ... generally speaking, tests here are very redundant. The last two tests actually allow to draw all the conclusions. - Fat-Zer 6:51 pm