I have a dataset of 374 rows × 31 columns. The first column is the date; the remaining columns are the stock prices of 30 companies. I need to apply principal component analysis (PCA). For this, I wrote the following code:

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

Location1 = r'C:\Users\...\close_prices.csv'
df = pd.read_csv(Location1)

# Keep only the 30 price columns (drop(..., 1) with a positional axis no longer works in pandas 2.x)
X = df.drop(columns='date')

pca = PCA(n_components=10)
pca.fit(X)
print(pca.explained_variance_ratio_)
# The first component explains most of the variation in the features (the prices of the 30 companies)

# Now apply the transformation to the original data
X1 = pca.transform(X)
X1
# Out[7]:
# array([[-50.90240358, -17.63167724,  -7.7360209 , ...,   3.55657041,  -5.82197358,  -1.72604005],
#        [-52.84690919, -19.14690749,  -7.27254551, ...,   3.43259929,  -5.63318106,  -2.0122316 ],
#        ...

X1.shape  # (374, 10)

# I need to take the first component and compute the Pearson correlation coefficient
# with the Dow Jones index, which has shape (374, 1), so I take a (374, 1) slice
X11 = X1[:, [0]]
X11.shape  # (374, 1)
```
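A minimal sketch of the correlation step, assuming the Dow Jones index is available as a 1-D array of 374 values aligned by date with `close_prices.csv`. The name `dji` and the synthetic data below are hypothetical stand-ins, since the index data is not shown. One pitfall worth checking: `np.corrcoef` treats each *row* of a 2-D input as a variable by default, so passing the `(374, 1)` column `X11` directly gives 374 one-observation variables and a matrix full of `nan`. Taking the component as a 1-D array avoids this.

```python
import warnings

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Hypothetical stand-in data: 374 days x 30 price-like series driven by a common trend
trend = np.cumsum(rng.normal(size=374))
X = 100 + trend[:, None] * rng.uniform(0.5, 1.5, size=30) + rng.normal(size=(374, 30))
# Hypothetical Dow Jones series that follows the same trend
dji = 20000 + 50 * trend + rng.normal(scale=5, size=374)

X1 = PCA(n_components=10).fit_transform(X)

pc1 = X1[:, 0]  # 1-D, shape (374,), instead of X1[:, [0]] with shape (374, 1)
r = np.corrcoef(pc1, dji)[0, 1]
print(round(r, 3))

# With (374, 1) inputs, corrcoef treats every row as a separate variable with a single
# observation; the variances are undefined, so the result is all nan (warnings silenced here)
with np.errstate(divide='ignore', invalid='ignore'), warnings.catch_warnings():
    warnings.simplefilter('ignore')
    bad = np.corrcoef(X1[:, [0]], dji[:, None])
print(bad.shape, np.isnan(bad).all())
```

`scipy.stats.pearsonr(pc1, dji)` is an alternative that also returns a p-value; it likewise expects 1-D inputs.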

But I cannot calculate the coefficient because X1 contains negative numbers. As a result, when I take the square root and divide, I get a matrix containing nan.
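For reference, the Pearson coefficient between the first-component scores $z_t$ and the index values $d_t$, $t = 1, \dots, n$ with $n = 374$ (symbols introduced here for illustration), is built from deviations about the means. The square roots are taken of sums of squares, which are never negative, so negative scores by themselves cannot produce `nan`:

$$
r = \frac{\sum_{t=1}^{n} (z_t - \bar{z})(d_t - \bar{d})}
         {\sqrt{\sum_{t=1}^{n} (z_t - \bar{z})^2}\;\sqrt{\sum_{t=1}^{n} (d_t - \bar{d})^2}}
$$

A `nan` most likely points to a different problem, such as taking the root of the raw values instead of the squared deviations, or a zero denominator (a series treated as having a single observation or a constant value).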

Why does applying the fitted model to X produce a matrix with negative values?
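The negative values are expected: scikit-learn's `PCA` subtracts the column means (`pca.mean_`) before projecting, so with the default `whiten=False`, `transform` returns `(X - pca.mean_) @ pca.components_.T`. The scores of each component therefore average to zero over the fitted data and must take both signs. A short check on hypothetical random data standing in for the price table:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
# Hypothetical positive "prices": 374 days x 30 companies, all values between 50 and 150
X = rng.uniform(50, 150, size=(374, 30))

pca = PCA(n_components=10).fit(X)
X1 = pca.transform(X)

# transform() centers the data with the fitted means, then projects onto the components
manual = (X - pca.mean_) @ pca.components_.T
print(np.allclose(X1, manual))               # True

# Every component's scores average to (numerically) zero, so negatives are unavoidable
print(np.abs(X1.mean(axis=0)).max() < 1e-8)  # True
print((X1 < 0).any(axis=0).all())            # True
```

So the input prices being positive says nothing about the sign of the scores; they are coordinates relative to the mean price vector.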

    1 answer

    What prevents you from multiplying the result by -1? PCA identifies directions in the feature space, and the orientation (sign) of the eigenvectors defining these directions does not play any special role.
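A small sketch of why the sign does not matter, on hypothetical synthetic data (`index` stands in for the Dow Jones series): flipping a component changes only the sign of its correlation with another series, not the magnitude, and leaves its variance, and hence the explained variance, unchanged. Orienting PC1 so that it correlates positively with the index is just one possible convention, assumed here for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
# Hypothetical data: 30 price-like series sharing a common trend, plus an index series
trend = np.cumsum(rng.normal(size=374))
X = 100 + trend[:, None] + rng.normal(size=(374, 30))
index = 1000 + 10 * trend + rng.normal(size=374)

pc1 = PCA(n_components=1).fit_transform(X)[:, 0]

r = np.corrcoef(pc1, index)[0, 1]
r_flipped = np.corrcoef(-pc1, index)[0, 1]
print(np.isclose(r_flipped, -r))               # True: only the sign changes
print(np.isclose(np.var(-pc1), np.var(pc1)))   # True: the variance is unchanged

# Optional convention: orient PC1 so that it correlates positively with the index
if r < 0:
    pc1 = -pc1
print(np.corrcoef(pc1, index)[0, 1] > 0)       # True
```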

    • True, but there are positive values too, so negative values would remain after flipping the sign. I have solved the problem, though without fully understanding how. @q-dad link, if you're interested; the answer to the problem is there. - user21