How can I restore missing matrix values in Python, given that the row vectors are linearly dependent? In this example, the 2nd vector is 2 times the 1st, and the 3rd is 10 times the 2nd (but in practice this dependence is not known in advance).

    import numpy as np

    nan = np.nan
    data = np.array([[1, nan, 5, 6, nan, 20],
                     [2, nan, nan, nan, 4, nan],
                     [20, nan, 100, 120, 40, nan]])

Expected result: the 2nd column is filled in as well, possibly based on the distribution of values within each vector (the values shown here are an intuitive guess):

    [[  1.    7     5.    6.    2.   20.]
     [  2.   14    10.   12.    4.   40.]
     [ 20.  140   100.  120.   40.  400.]]

The matrix may be larger, and the linear relationship between the vectors may not be as clearly expressed as in this example, therefore a universal solution is required.

This requires:

  1. find dependency coefficients
  2. fill the gaps
  3. interpolate remaining gaps

How can this be done?

  • Unfortunately, I did not understand how you filled the 2nd column - Yernar

2 answers

You can try this:

First, create a Pandas DataFrame for convenience:

    import pandas as pd  # pip install pandas

    In [24]: df = pd.DataFrame(data)

This gives the following DataFrame:

    In [25]: df
    Out[25]:
          0   1      2      3     4     5
    0   1.0 NaN    5.0    6.0   NaN  20.0
    1   2.0 NaN    NaN    NaN   4.0   NaN
    2  20.0 NaN  100.0  120.0  40.0   NaN

Now apply the known linear dependency rules:

    In [30]: df.loc[1] = df.loc[1].fillna(df.loc[0] * 2)  # assign back; inplace=True on df.loc[1] may hit a copy

    In [31]: df.loc[2] = df.loc[2].fillna(df.loc[1] * 10)

    In [32]: df.loc[0] = df.loc[0].fillna(df.loc[1] / 2)

The result:

    In [33]: df
    Out[33]:
          0   1      2      3     4      5
    0   1.0 NaN    5.0    6.0   2.0   20.0
    1   2.0 NaN   10.0   12.0   4.0   40.0
    2  20.0 NaN  100.0  120.0  40.0  400.0

Finally, interpolate the DataFrame along the rows to fill in the remaining unknowns:

    In [36]: df.interpolate(axis=1, inplace=True)

    In [37]: df
    Out[37]:
          0     1      2      3     4      5
    0   1.0   3.0    5.0    6.0   2.0   20.0
    1   2.0   6.0   10.0   12.0   4.0   40.0
    2  20.0  60.0  100.0  120.0  40.0  400.0

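The factors 2 and 10 above are typed in by hand. When they are not known, one possible way to estimate them is to compare each row with a reference row on the columns where both are observed. The sketch below is only an illustration, under the assumption that the rows are (approximately) proportional to each other; the function name `fill_proportional_rows` and its variables are hypothetical, not part of NumPy or pandas. It also assumes every row shares at least one non-zero column with the reference row.

```python
import numpy as np


def fill_proportional_rows(data):
    """Fill NaNs assuming data[i, j] is roughly scale[i] * base[j]."""
    data = np.asarray(data, dtype=float)
    known = ~np.isnan(data)

    # Reference row: the one with the most known values.
    ref = known.sum(axis=1).argmax()

    # scale[i]: median ratio of row i to the reference row on shared columns.
    scale = np.full(data.shape[0], np.nan)
    for i in range(data.shape[0]):
        common = known[i] & known[ref] & (data[ref] != 0)
        if common.any():
            scale[i] = np.median(data[i, common] / data[ref, common])

    # Express every known value in units of the reference row,
    # then take the per-column median as the base value.
    normalized = data / scale[:, None]
    has_data = ~np.isnan(normalized).all(axis=0)
    base = np.full(data.shape[1], np.nan)
    base[has_data] = np.nanmedian(normalized[:, has_data], axis=0)

    # Keep known values, fill the gaps with scale[i] * base[j].
    estimate = scale[:, None] * base[None, :]
    return np.where(known, data, estimate), scale


nan = np.nan
data = np.array([[1, nan, 5, 6, nan, 20],
                 [2, nan, nan, nan, 4, nan],
                 [20, nan, 100, 120, 40, nan]])

filled, scale = fill_proportional_rows(data)
print(scale)   # estimated row factors relative to row 0
print(filled)  # column 1 is still NaN: there is nothing to estimate it from
```

The median is used so that a few noisy ratios do not dominate the estimate. Columns with no observed values stay NaN, so the row-wise interpolation step is still needed for them.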
  • And what if there is a lot of data, and the dependence is not as strict as in the example? What if we only know that the rows depend on each other, but not in what proportion? There are methods like stackoverflow.com/questions/32228374/ ... Granted, I gave that example myself, but what I meant is that we only know a dependency exists, not what it is - Ste_kd 8:39 pm
  • Is your question how to interpolate a 2D matrix, or how to find the dependency coefficients? - MaxU
  • How to interpolate a linearly dependent matrix when the true dependence between its row vectors is unknown - Ste_kd 8:50 pm
  • @Ste_kd, IMO, it makes sense to split the task into several steps: 1) find the dependency coefficients, given that dependencies are known to exist; 2) use that information to fill in the gaps more accurately; 3) use interpolation to fill in the remaining gaps - MaxU
  • I have added that to the question, but I still have no idea how to do it. If you don't help, no one will, as always :) - Ste_kd

    import numpy as np

    nan = np.nan
    p1 = 2   # row 1 = p1 * row 0
    p2 = 10  # row 2 = p2 * row 1
    data = np.array([[1, nan, 5, 6, nan, 10],
                     [2, nan, nan, nan, 4, nan],
                     [20, nan, 100, 120, 40, nan]])

    # propagate known values of row 0 to rows 1 and 2
    for idx, i in enumerate(data[0]):
        if not np.isnan(i):
            data[1][idx] = i * p1
            data[2][idx] = i * p1 * p2

    # propagate known values of row 1 to rows 0 and 2
    for idx, i in enumerate(data[1]):
        if not np.isnan(i):
            data[0][idx] = i / p1
            data[2][idx] = i * p2

    # propagate known values of row 2 to rows 1 and 0
    for idx, i in enumerate(data[2]):
        if not np.isnan(i):
            data[1][idx] = i / p2
            data[0][idx] = i / p1 / p2

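The three loops above can also be written without loops once the multipliers are collected into one scale per row. A minimal sketch, assuming the same known factors `p1 = 2` and `p2 = 10`; the names `scales`, `base` and `filled` are made up for illustration:

```python
import warnings

import numpy as np

nan = np.nan
p1, p2 = 2, 10
data = np.array([[1, nan, 5, 6, nan, 10],
                 [2, nan, nan, nan, 4, nan],
                 [20, nan, 100, 120, 40, nan]])

# Row i is assumed to equal scales[i] * base, with row 0 as the unit.
scales = np.array([1.0, p1, p1 * p2])

with warnings.catch_warnings():
    # An all-NaN column makes nanmean warn and return NaN,
    # which is the wanted outcome here, so the warning is silenced.
    warnings.simplefilter("ignore", category=RuntimeWarning)
    base = np.nanmean(data / scales[:, None], axis=0)

# Keep known values and fill the rest from the scaled base row.
filled = np.where(np.isnan(data), scales[:, None] * base, data)
print(filled)
```

Unlike the loops, this version never overwrites values that are already known; for data that follows the factors exactly, both give the same matrix. It scales to any number of rows by extending `scales`.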
  • But what about the 2nd column? It should probably be derived from the distribution of the vectors themselves. I think there is a better solution to this problem. - Ste_kd
  • @Ste_kd, the 2nd column cannot be restored, since everything in it is NaN - Yernar
  • It can be, if you use interpolation algorithms. Your solution is not quite suitable: it only works for this specific example, and not for all columns of the matrix. - Ste_kd