There is an algorithm for quickly squaring arbitrary numbers in the base number system b (not necessarily b = 2 or b = 2^k ). This algorithm is further used in the exponentiation by the method of successive squaring and, ideally, should work even with 2048-bit numbers. I cannot implement it, I do not know how to store these numbers, how to extract numbers from them, how to multiply such numbers with the help of shifts, how to calculate pairs and triples (u, v) and (c, u, v).

The use of the transfer hints at ease of implementation in assembler, but it is completely incomprehensible how to organize the data in this algorithm, and how to implement paragraph 2.3.

Attempt to implement:

 #include <stdio.h> // Вход: массив цифр числа x по основанию b // Выход: массив y длины 2 * xlen, содержащий цифры x^2 // b - основание системы счисления, xlen - длина массива x void fast_square(char x[], char y[], int b, int xlen) { int i, j, c, u, v; int uv, cuv; for(i = 0; i <= 2*xlen - 1; i++) y[i] = 0; for(i = 0; i <= n - 1; i++) { uv = y[2 * i] + x[i] * x[i]; y[2 * i] = uv % d; // uv % b - последняя цифра числа (u, v) u = uv / b; // Из числа uv удалили цифру v и сохранили u for(j = i + 1; j <= xlen - 1; j++) { // Здесь (c*b + u) равно числу (c, u) в СС с осн. b cuv = y[i + j] + 2 * x[i] * x[j] + (c*b + u); y[i + j] = cuv % b; // Последняя цифра v числа cuv } } } int main(){ } 

Algorithm: link Another link to rsdn, where they tried to write an implementation, but did not write

enter image description here

  • No need to provide text with screenshots - Abyx
  • Since you are trying to achieve better performance, I can advise using unsigned int instead of char - the unit of calculation is a machine word, not just one byte, and problems can arise when using the sign type. For storage use an array. For arithmetic operations, imagine how this machine calculates everything, you have to implement it through "multiplication (and not only) in a column", follow the hyphenation yourself and others. - Yuriy Orlov

1 answer 1

UPD 10/26/2017

Algorithm output

It is known that:

  • The square of the sum of several numbers is the sum of their squares, plus the double sum of all their pair-wise works.
  • The bit square (0 | 1) is equal to this bit

    Let n = b 0 2 0 + b 1 2 1 + b 2 2 2 + ... + b i-1 2 i-1 ,
    then n 2 = (b 0 2 0 + b 1 2 1 + b 2 2 2 + ... + b i-1 2 i-1 ) 2 , where i is the digit capacity of the number n .

    n 2 = b 0 2 0 + b 1 2 2 + b 2 2 4 + ... + b i-1 2 2i-2 + b 0 4 1 (b 1 2 0 + b 2 2 1 + b 3 2 2 + ... + b i-1 2 i-2 ) + b 1 4 2 (b 2 2 0 + b 3 2 1 + b 4 2 2 + ... + b i 1 1 i-3 ) + b 2 4 3 (b 3 2 0 + b 4 2 1 + b 5 2 2 + ... + b i-1 2 i-4 ) + ... + b i-2 4 i-1 b i-1 ,

    n 2 = b 0 ((1 << 0) | (n >> 1) << 2) + b 1 ((1 << 1) | (n >> 2) << 4) + b 2 ((1 << 2) | (n >> 3) << 6) + ...
    + b i-2 ((1 << (i-2) | (n >> (i-1)) << (2i-2)) + b i-1 ((1 << (i-1) | (n >> i) << (2i).

    Let j k = ((2n) >> k) , then b k = ((j k + 1 & 1)> 0) = ((j k & 2> 0) ,

    n 2 = (j 0 & 2? (1 << 0) | (j 2 << 2): 0) + (j 1 & 2? (1 << 1) | (j 3 << 4): 0) + (j 2 & 1? (1 << 2) | (j 4 << 6): 0) + ... + (j i-2 & 2? (1 << (i-2)) | (j i << (2i-2): 0) + (j i-1 & 2? (1 << (i-1) | (j i + 1 << (2i): 0) .

    However, j k + 2 << 2 = (j k >> 2) << 2 = j k & 0b111 ... 100 = j k & (-4) (reset the zero and first bits), therefore:

    n 2 = (j 0 & 2? (j 0 & -4 | 1) << 0: 0) + (j 1 & 2? (j 1 & -4 | 1) << 2: 0) + (j 2 & 1? (J 2 & -4 | 1) << 4: 0) + ... + (j i-2 & 2? (J i-2 & -4 | 1) << (2i-4): 0) + (j i-1 & 2? (J i-1 & -4 | 1) << (2i -2): 0) .

    Software implementation for integer format numbers

    In the program (in PHP), the pyramidal squaring algorithm is implemented as a function pow2() and tested for integer numbers. The pow2($i) function pow2($i) 2.5 times in time to the standard pow($i, 2) procedure using hardware multiplication, which can be considered a good result. In the 64-bit version of PHP7, the loss is greater (~ 10 times).

    Program text:

     define("N", 99); // Количество точек в тестовом массиве. define("PR", 1); // PR=0 - тест производительности. // Вывод одномерного массива function print_1d($text, $v){ $eps = 1e-7; print "$text"."["; $cnt = -1; foreach($v as $key => $item){ if(++$cnt) print ",&emsp;"; $flag = true; if($cnt && !($cnt%5)) print"<br>"; print "&quot;$key&quot;=>"; if(abs((int)$item - $item) > $eps){ printf("%.3f", $item); }else{ print $item; } } print "], "; } // Пирамидальное возведение в квадрат function pow2($n){ if($n < 0) $n = -$n; $res = 0; for($j = $n << 1, $sh = 0; $j > 1; $j >>= 1, $sh += 2){ $res += ($j & 2) ? ($j & -4 | 1) << $sh : 0; } return $res; } $test = []; for($i = 0; $i < N; $i++){ $test[] = mt_rand(-9999, 9999); } $time_start = microtime(true); $pow2_ = []; for($i = 0; $i < N; $i++){ $pow2_[$test[$i]] = pow2($test[$i]); } $time1 = microtime(true) - $time_start; $time_start = microtime(true); $pow2__ = []; for($i = 0; $i < N; $i++){ $pow2__[$test[$i]] = pow($test[$i], 2); } $time2 = microtime(true) - $time_start; printf("Возведение в квадрат для N = %d случайных точек", (int)N); if(PR > 0){ print_1d("<br>*** Пирамидальный алгоритм ***<br>Результаты:<br>", $pow2_); print_1d("<br>*** Библиотечный алгоритм ***<br>Результаты:<br>", $pow2__); }else{ print("<br>*** Пирамидальный алгоритм ***<br>Время выполнения: $time1"); print("<br>*** Библиотечный алгоритм ***<br>Время выполнения: $time2"); } 

    Check on a random sample:

     Squaring for N = 99 random points
     *** Pyramidal algorithm ***
     Results:
     ["8154" => 66487716, "993" => 986049, "-3669" => 13461561, "7855" => 61701025, "-7597" => 57714409, 
     "1568" => 2458624, "-9029" => 81522841, "-6646" => 44169316, "-4934" => 24344356, "-3954" => 15634116, 
     "-5303" => 28121809, "70" => 4900, "1661" => 2758921, "-3628" => 13162384, "5734" => 32878756, 
     "4802" => 23059204, "-1886" => 3556996, "4269" => 18224361, "9638" => 92891044, "-7879" => 62078641, 
     "9218" => 84971524, "-2221" => 4932841, "-7153" => 51165409, "5861" => 34351321, "-3656" => 13366336, 
     "3058" => 9351364, "-8879" => 78836641, "593" => 351649, "329" => 108241, "-4801" => 23049601, 
     "-8637" => 74597769, "4480" => 20070400, "9918" => 98366724, "8584" => 73685056, "-8525" => 72675625, 
     "1920" => 3686400, "5019" => 25190361, "8697" => 75637809, "-5594" => 31292836, "-8198" => 67207204, 
     "-7119" => 50680161, "-6288" => 39538944, "131" => 17161, "320" => 102400, "-5410" => 29268100, 
     "9147" => 83667609, "-7411" => 54922921, "-2211" => 4888521, "3990" => 15920100, "-2077" => 4313929, 
     "-6588" => 43401744, "-9280" => 86118400, "7357" => 54125449, "-1909" => 3644281, "6109" => 37319881, 
     "911" => 829921, "8898" => 79174404, "-4262" => 18164644, "6537" => 42732369, "-3567" => 12723489, 
     "1345" => 1809025, "-9229" => 85174441, "6654" => 44275716, "-9949" => 98982601, "9057" => 82029249, 
     "-8760" => 76737600, "779" => 606841, "-9905" => 98109025, "45" => 2025, "-8160" => 66585600, 
     "-6796" => 46185616, "-5270" => 27772900, "8019" => 64304361, "-1922" => 3694084, "1542" => 2377764, 
     "-2489" => 6195121, "-4835" => 23377225, "4752" => 22581504, "-9621" => 92563641, "782" => 611524, 
     "6617" => 43784689, "4506" => 20304036, "-7679" => 58967041, "-1476" => 2178576, "-6194" => 38365636, 
     "-4385" => 19228225, "7394" => 54671236, "1323" => 1750329, "870" => 756900, "-6723" => 45198729, 
     "4162" => 17322244, "-5436" => 29550096, "-7761" => 60233121, "-596" => 355216, "-7463" => 55696369, 
     "356" => 126736, "429" => 184041, "-1011" => 1022121, "9811" => 96255721], 
     *** Library algorithm ***
     Results:
     ["8154" => 66487716, "993" => 986049, "-3669" => 13461561, "7855" => 61701025, "-7597" => 57714409, 
     "1568" => 2458624, "-9029" => 81522841, "-6646" => 44169316, "-4934" => 24344356, "-3954" => 15634116, 
     "-5303" => 28121809, "70" => 4900, "1661" => 2758921, "-3628" => 13162384, "5734" => 32878756, 
     "4802" => 23059204, "-1886" => 3556996, "4269" => 18224361, "9638" => 92891044, "-7879" => 62078641, 
     "9218" => 84971524, "-2221" => 4932841, "-7153" => 51165409, "5861" => 34351321, "-3656" => 13366336, 
     "3058" => 9351364, "-8879" => 78836641, "593" => 351649, "329" => 108241, "-4801" => 23049601, 
     "-8637" => 74597769, "4480" => 20070400, "9918" => 98366724, "8584" => 73685056, "-8525" => 72675625, 
     "1920" => 3686400, "5019" => 25190361, "8697" => 75637809, "-5594" => 31292836, "-8198" => 67207204, 
     "-7119" => 50680161, "-6288" => 39538944, "131" => 17161, "320" => 102400, "-5410" => 29268100, 
     "9147" => 83667609, "-7411" => 54922921, "-2211" => 4888521, "3990" => 15920100, "-2077" => 4313929, 
     "-6588" => 43401744, "-9280" => 86118400, "7357" => 54125449, "-1909" => 3644281, "6109" => 37319881, 
     "911" => 829921, "8898" => 79174404, "-4262" => 18164644, "6537" => 42732369, "-3567" => 12723489, 
     "1345" => 1809025, "-9229" => 85174441, "6654" => 44275716, "-9949" => 98982601, "9057" => 82029249, 
     "-8760" => 76737600, "779" => 606841, "-9905" => 98109025, "45" => 2025, "-8160" => 66585600, 
     "-6796" => 46185616, "-5270" => 27772900, "8019" => 64304361, "-1922" => 3694084, "1542" => 2377764, 
     "-2489" => 6195121, "-4835" => 23377225, "4752" => 22581504, "-9621" => 92563641, "782" => 611524, 
     "6617" => 43784689, "4506" => 20304036, "-7679" => 58967041, "-1476" => 2178576, "-6194" => 38365636, 
     "-4385" => 19228225, "7394" => 54671236, "1323" => 1750329, "870" => 756900, "-6723" => 45198729, 
     "4162" => 17322244, "-5436" => 29550096, "-7761" => 60233121, "-596" => 355216, "-7463" => 55696369, 
     "356" => 126736, "429" => 184041, "-1011" => 1022121, "9811" => 96255721],
    

    Performance Testing (PHP5):

     Squaring for N = 100,000 random points
     *** Pyramidal algorithm ***
     Execution time: 1.29107379913
     *** Library algorithm ***
     Execution time: 0.49302816391 

    Recommendations for the algorithm of large numbers

    Working with large numbers is limited to the following small fragment:

      for($j = 2*$n, $sh = 0; $j > 1; $j >>= 1, $sh += 2){ if($j & 2){ $res += ($j ^ 3 | 1) << $sh; } } 

    Recommendations for large numbers:

    1. The number format is binary. When using an array of words, problems will arise with the sign bits and the storage of the actual length of the number. More flexible is the character format with encoding and decoding through the functions chr() and ord() , since it has access to each byte.
    2. When comparing $j > 1 , the length of the numbers in bits is sufficient. When testing the $j & $2 bit and forming the zero and first bits, it suffices to use only the lower part of the number.
    3. The shift and sum operations for $n must be implemented in full.