I have a dataset of 374 rows x 31 columns. The first column is the date; the remaining columns are the stock prices of 30 companies. I need to apply principal component analysis (PCA). For this, I wrote the following code:
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

Location1 = r'C:\Users\...\close_prices.csv'
df = pd.read_csv(Location1)

# drop the date column, keep the 30 price columns
X = df.drop(columns='date')

pca = PCA(n_components=10)
pca.fit(X)
print(pca.explained_variance_ratio_)
# the first component explains the largest share of the variance in the features (prices of the 30 companies)

# now apply the transformation to the original data
X1 = pca.transform(X)
X1
Out[7]:
array([[-50.90240358, -17.63167724, -7.7360209 , ..., 3.55657041, -5.82197358, -1.72604005],
       [-52.84690919, -19.14690749, -7.27254551, ..., 3.43259929, -5.63318106, -2.0122316 ],

X1.shape  # (374, 10)

# I need to take the first component and compute the Pearson correlation coefficient
# with the Dow Jones index, which has shape (374, 1), so I take a (374, 1) slice
X11 = X1[:, [0]]
X11.shape  # (374, 1)

But I cannot calculate the coefficient because X1 contains negative numbers: when I take the square root and divide, I get a matrix of NaN values.
Why does applying the fitted model to X produce a matrix with negative values?
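For reference, here is a minimal sketch of the calculation I am attempting, run on synthetic data because the real file cannot be shared. Everything about the data is an assumption: `prices` is a random stand-in for the 374 x 30 price matrix and `index` is a hypothetical index series (simply the row mean), not the actual Dow Jones values. As far as I understand, scikit-learn's `PCA` subtracts each column's mean before projecting, which would explain scores of both signs, and `np.corrcoef` computes the Pearson coefficient from deviations about the mean, so negative inputs should not by themselves produce NaN.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Synthetic stand-in for the 374 x 30 price matrix (NOT the real data):
# a shared random-walk "market" factor plus company-specific noise.
market = np.cumsum(rng.normal(size=374))
loadings = rng.uniform(0.5, 1.5, size=30)
prices = 100.0 + np.outer(market, loadings) + rng.normal(size=(374, 30))

# Hypothetical index series: here simply the mean price across companies.
index = prices.mean(axis=1)

pca = PCA(n_components=10)
scores = pca.fit_transform(prices)  # shape (374, 10)

# First principal component as a 1-D array of length 374.
first = scores[:, 0]

# The scores are computed from mean-centered data, so their mean is ~0
# and they take both negative and positive values.
print(first.mean(), first.min(), first.max())

# Pearson correlation between the first component and the index.
# np.corrcoef returns a 2 x 2 matrix; the off-diagonal entry is r.
r = np.corrcoef(first, index)[0, 1]
print(r)
```

The sign of a principal component is arbitrary, so only the magnitude of `r` is meaningful in this sketch. Note that `np.corrcoef` expects 1-D series (or variables in rows), which is why the component is taken as `scores[:, 0]` rather than as a `(374, 1)` column.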