UPD
One of the best prognostic methods is autoregressive. In any case, he will catch periodic bursts.
Signal model
The method is based on an autoregression model (AP) of order k
for the sample i = 0, 1, ..., n-1
:
x i + f 0 x i-1 + f 1 x i-2 + f 2 x i-3 + ... + f k-1 x ik = 0, where i = k, k + 1, ..., n-1.
The order of the model should roughly correspond to the complexity of the signal.
Toeplitz symmetry
According to the least squares method, the residual should be minimized:
EPS = <(x i + f 0 x i-1 + f 1 x i-2 + f 2 x i-3 + ... + f k x ik ) 2 > i = k..n-1 ,
for which, the partial derivatives of the residual with respect to the autoregression coefficients f<sub>s</sub>
for s=0..k-1
should be equated to zero.
This leads to the following system of special equations:
a 0 f 0 + a 1 f 1 + a 2 f 2 + ... + a k-2 f k-2 + a k-1 f k-1 = -a 1 ,
a 1 f 0 + a 0 f 1 + a 1 f 2 + ... + a k-3 f k-2 + a k-2 f k-1 = -a 2 ,
a 2 f 0 + a 1 f 1 + a 0 f 2 + ... + a k-4 f k-2 + a k-3 f k-1 = -a 3 ,
...
a k-2 f 0 + a k-3 f 1 + a k-4 f 2 + ... + a 0 f k-2 + a 1 f k-1 = -a k-1 ,
a k-1 f 0 + a k-2 f 1 + a k-3 f 2 + ... + a 1 f k-2 + a 0 f k-1 = -a k ,
where a is the autocorrelation function (ACF).
The first feature is the view of the matrix on the left side, when the coefficients on the main diagonal and all coefficients on the diagonals equidistant from the main one coincide. Matrices with symmetry of this type are called Toeplitz matrices, and Levinson's algorithm is applicable to them.
The second feature is that the column of free members is formed from the elements of the same matrix a, and the Levinson-Darbin algorithm is applicable to such a system.
The third feature is that actually the Toeplitz matrix is only approximate. For example, for a second-order model, the exact form of the equations is:
<x i-1 2 > f 0 + <x i-1 x i-2 > f 1 = <x i x i-1 >,
<x i-1 x i-2 > f 0 + <x i-2 2 > f 1 = <x i x i-2 >,
where the index i in each sum of the form <u i > runs through the values from 2 to the maximum.
It is easy to notice that:
- The elements on the main diagonal correspond to the values of the zero element of the autocorrelation function, the samples for which are calculated are shifted by one element.
- The elements on the secondary diagonal and the first element in the column of free terms correspond to the values of the first element of the autocorrelation function, and the elements on the diagonal are equal, and the sample for the free term is shifted relative to them.
This can lead to unexpected effects for sequences with a strong increasing (decreasing) trend.
So, for a sample of the first 10 members of the Fibonacci sequence
x i = {1, 1, 2, 3, 5, 8, 13, 21, 34, 55}
it turns out a system of equations of the form:
1869 f 0 + 1155 f 1 = - 3024,
1155 f 0 + 714 f 1 = - 1869
with the correct solution, f 0 = -1, f 1 = -1, but there is no expected Toeplitz symmetry of the problem.
In this case, the use of ACF gives the system
4893 f 0 + 3024 f 1 = - 3024,
3024 f 0 + 4893 f 1 = - 1869,
use of which in this case is illegal. The main method of dealing with these effects is sample centering. But the main thing is to check the possibility of a serious simplification of the model and the use of the Darbin algorithm.
Durbin's algorithm
The essence of Durbin's idea is the search for a solution for a matrix of dimension n + 1 in the form
(f ' 0 , f' 1 , ..., f ' n-1 , f' n ) = (f 0 , f 1 , ..., f n-1 , 0) + beta (f n-1 , f n-2 , ..., f 0 , 1).
Indeed, the substitution of this solution into a matrix of dimension n + 1 , taking into account a system of dimension n, gives:
-a 1 - beta a n + a n beta = -a 1 ,
-a 2 - beta a n-1 + a n-1 beta = -a 2 ,
...
-a n-1 - beta a 1 + a 1 beta = -a n ,
a n f 0 + a n-1 f 1 + a n-2 f 2 + ... + a 2 f n-2 + a 1 f n-1 + beta (a n f n-1 + a n-1 f n-2 + a n-2 f n-3 + ... + a 2 f 1 + a 1 f 0 + a 0 ) = -a n + 1 .
The first n equations of the system are satisfied for any value of beta, the last for
beta = - (a n + 1 + a n f 0 + a n-1 f 1 + a n-2 f 2 + ... + a 2 f n-2 + a 1 f n-1 )
/ (a n f n-1 + a n-1 f n-2 + a n-2 f n-3 + ... + a 2 f 1 + a 1 f 0 + a 0 ).
The durbin()
function returns an array of autoregressive vectors for the entire range of n.
When n = 1, f = (-a 1 / a 0 ), the subsequent coefficient vectors are calculated recurrently (first beta , then f ).
Software implementation
The PHP program has the following functions:
- Centering the
center()
array. - The scalar product of vectors with a shift of the second vector
scalar_prod()
. - Output array with text comment
print_array()
. - Derivation of a system of second-order linear equations and its solutions
print_s()
. - Calculation of the autocorrelation function (ACF)
acf()
. - Comparison of exact and second order Toeplitz systems and their solutions for a given sample of
compare_s()
. - Durbin's algorithm for solving a system of special equations
durbin()
. - Testing the Darbin algorithm
test_durbin()
.
Text of the program
function center(&$arr){ $len = count($arr); $aver = array_sum($arr) / $len; foreach($arr as &$item){ $item -= $aver; } } function scalar_prod($a, $b, $shift = 0, &$c = null){ $scal = 0; if(is_null($c)) $cc = []; else $cc = &$c; foreach($a as $key => $item){ $cc[] = $item * $b[$key+$shift]; $scal += end($cc); } return $scal; } function print_array($arr, $str, $n = 11){ print $str."["; foreach($arr as $key => $item){ if(!(($key+1) % $n)) print "<br>"; printf ("\"%03d\" => %.3f, ", $key, $item); } print "]"; } function print_s($a, $b, $str){ print("<br><br>$str"); printf("<br> %.3f f0 + %.3f f1 = %.3f", $a[0][0], $a[0][1], $b[0]); printf("<br> %.3f f0 + %.3f f1 = %.3f", $a[1][0], $a[1][1], $b[1]); $det = $a[0][0]*$a[1][1] - $a[1][0]*$a[0][1]; $det0 = $b[0]*$a[1][1] - $b[1]*$a[0][1]; $det1 = $a[0][0]*$b[1] - $a[1][0]*$b[0]; printf("<br>Решение: f = [%f, %f]", (float)$det0 / $det, (float)$det1/$det); } function acf($flow, $k, $center = -1, $len = null){ if(is_null($len)){ $len = count($flow); } $slice = array_slice($flow, $k, $len-$k); for($lag = 0; $lag <= $k; $lag++){ $result[$lag] = scalar_prod($slice, $flow, $k-$lag); } if($center != -1){ $denom = 1.0/$result[0]; foreach($result as &$res){ $res *= $denom; } } return $result; } function compare_s($test){ $m = count($test); $acf2 = acf($test, 0, -1, $m-2); $acf1 = acf($test, 1, -1, $m-1); $acf = acf($test, 2); $a_exact = [ [$acf1[0],$acf1[1]], [$acf1[1],$acf2[0]] ]; $a = [ [$acf[0],$acf[1]], [$acf[1],$acf[0]] ]; $b = [-$acf[1], -$acf[2]]; print_s($a_exact, $b, "Контроль симметрии матрицы<br><br>Точная система (порядок 2):"); print_s($a, $b, "Тёплицева система (порядок 2):"); } function durbin($acf, $n){ $ff = []; $f = [-$acf[1]/$acf[0]]; print_array($f, "<br>f = "); $ff[] = $f; for($r = 1; $r < $n; $r++){ $acr = array_reverse(array_slice($acf, 0, $r+1)); $fr = array_reverse($f); $fr[] = 1; $f[] = 0; $beta = - ($acf[$r+1] + scalar_prod($f, $acr))/scalar_prod($fr, $acr); print("<br>beta[$r] = $beta "); $f = array_map(function($a,$b) use($beta){ return $a+$beta*$b; },$f,$fr); $ff[] = $f; print_array($f, "  f = "); } return $ff; } function test_durbin($arr, $n, $center=0){ $len = count($arr); if($center){ center($arr); } $eps_arr = 0; foreach($arr as $item){ $eps_arr += $item*$item; } printf("<br><br>Решение по Дарбину (порядок АР = %d, длина выборки = $len, СКО выборки = %f):", $n, sqrt($eps_arr/($len-1))); $a = acf($arr, $n); print_array($a, "<br><br>АКФ: "); $ff = durbin($a,$n); $c = array_reverse(end($ff)); $eps = 0; $brr = []; for($j=$n; $j<$len; $j++){ $brr[$j] = $arr[$j]+scalar_prod($c, $arr, $j-$n); $eps += pow($brr[$j],2); } $k = count($f)-1; printf("<br>СКО остатка = %f", sqrt($eps/($len-$n))); } $m = 10; $test = [1.0, 1,0]; for ($i = 2; $i < $m; $i++){ $test[$i] = $test[$i-1] + $test[$i-2]; } print("*** Фибоначчи - 10: ***"); compare_s($test); test_durbin($test, 2); print("<br><br>*** Фибоначчи-10 с центрированием: ***"); test_durbin($test, 2, 1); $m=2000; $dpi = 2*M_PI; for($j=0; $j<$m; $j++) $arr[$j] = sin($j);// - $dpi*floor($j/$dpi)); print("<br><br>*** Авторегрессия по Дарбину (синусная выборка): ***"); compare_s($arr); test_durbin($arr, 1); test_durbin($arr, 2); test_durbin($arr, 3); test_durbin($arr, 4); test_durbin($arr, 5); test_durbin($arr, 6);
Results:
*** Fibonacci 10: ***
Matrix symmetry control
Exact system (order 2):
1869.000 f0 + 1155.000 f1 = -3024.000
1155.000 f0 + 714.000 f1 = -1869.000
Solution: f = [-1.000000, -1.000000]
Töplitz system (order 2):
4893.000 f0 + 3024.000 f1 = -3024.000
3024.000 f0 + 4893.000 f1 = -1869.000
Solution: f = [-0.618007, -0.000030]
Durbin solution (order of AP = 2, sample length = 10, MSE of sample = 23.321426):
ACF: ["000" => 4893.000, "001" => 3024.000, "002" => 1869.000,]
f = ["000" => -0.618,]
beta [1] = -2.98035943134E-5 f = ["000" => -0.618, "001" => -0.000,]
RMS residual = 15.285024
*** Fibonacci-10 with centering: ***
Durbin solution (order of AP = 2, sample length = 10, RMS sample = 17.795443):
ACF: ["000" => 2496.320, "001" => 1399.520, "002" => 716.420,]
f = ["000" => -0.561,]
beta [1] = 0.0398418808262 f = ["000" => -0.583, "001" => 0.040,]
RMS residual = 12.412102
*** Durbin autoregression (sinus selection): ***
Matrix symmetry control
Exact system (order 2):
999.018 f0 + 539.793 f1 = -539.750
539.793 f0 + 999.016 f1 = 415.712
Solution: f = [-1.080605, 1.000000]
Töplitz system (order 2):
998.969 f0 + 539.750 f1 = -539.750
539.750 f0 + 998.969 f1 = 415.712
Solution: f = [-1.080619, 1.000008]
Durbin solution (order of AP = 1, sample length = 2000, RMS sample = 0.707169):
ACF: ["000" => 999.677, "001" => 539.750,]
f = ["000" => -0.540,]
RMS residual = 0.595153
Durbin solution (order of AP = 2, sample length = 2000, MSE of the sample = 0.707169):
ACF: ["000" => 998.969, "001" => 539.750, "002" => -415.712,]
f = ["000" => -0.540,]
beta [1] = 1.00000781118 f = ["000" => -1.081, "001" => 1.000,]
RMS residual = 0.000009
Durbin solution (order of AP = 3, sample length = 2000, RMS sample = 0.707169):
ACF: ["000" => 998.142, "001" => 538.985, "002" => -415.712, "003" => -988.206,]
f = ["000" => -0.540,]
beta [1] = 0.999521483275 f = ["000" => -1.080, "001" => 1.000,]
beta [2] = 0.925724185475 f = ["000" => -0.154, "001" => 0.000, "002" => 0.926,]
RMS residual = 0.000488
Durbin solution (order of AP = 4, sample length = 2000, RMS sample = 0.707169):
ACF: ["000" => 998.122, "001" => 538.857, "002" => -415.831, "003" => -988.206, "004" => -652.029,]
f = ["000" => -0.540,]
beta [1] = 0.999342177677 f = ["000" => -1.079, "001" => 0.999,]
beta [2] = 0.925843078378 f = ["000" => -0.154, "001" => 0.000, "002" => 0.926,]
beta [3] = 6.00603640925 f = ["000" => 5.406, "001" => 0.000, "002" => 0.000, "003" => 6.006,]
RMS residual = 0.004358
Durbin solution (order of AP = 5, sample length = 2000, RMS sample = 0.707169):
ACF: ["000" => 997.550, "001" => 538.964, "002" => -415.143, "003" => -987.569, "004" => -652.029, "005" => 282.984,]
f = ["000" => -0.540,]
beta [1] = 0.999977661062 f = ["000" => -1.081, "001" => 1.000,]
beta [2] = 0.925422594935 f = ["000" => -0.155, "001" => 0.000, "002" => 0.925,]
beta [3] = 5.96426000454 f = ["000" => 5.364, "001" => 0.000, "002" => 0.000, "003" => 5.964,]
beta [4] = -1.11184318911 f = ["000" => -1.267, "001" => -0.000, "002" => 0.000, "003" => 0.000, "004" => -1.112,]
RMS residual = 0.000027
Durbin solution (order of AP = 6, sample length = 2000, RMS sample = 0.707169):
ACF: ["000" => 996.630, "001" => 538.238, "002" => -415.008, "003" => -986.697, "004" => -651.222, "005" => 282.984, "006 "=> 957.015,]
f = ["000" => -0.540,]
beta [1] = 0.999627453765 f = ["000" => -1.080, "001" => 1.000,]
beta [2] = 0.925654012051 f = ["000" => -0.155, "001" => 0.000, "002" => 0.926,]
beta [3] = 5.98719502289 f = ["000" => 5.387, "001" => 0.000, "002" => 0.000, "003" => 5.987,]
beta [4] = -1.11131942365 f = ["000" => -1.266, "001" => -0.000, "002" => -0.000, "003" => 0.000, "004" => -1.111,]
beta [5] = -0.877666481891 f = ["000" => -0.291, "001" => -0.000, "002" => -0.000, "003" => 0.000, "004" => -0.000, " 005 "=> -0.878,]
RMS residual = 0.000360
The program showed high efficiency on a limited (sinus) sample with a large amount of input data.