I have a dataset of 374 rows x 31 columns. The first column is the date, and the remaining columns are the stock prices of 30 companies. I need to apply principal component analysis (PCA). For this, I wrote the following code:
    import numpy as np
    import pandas as pd
    from sklearn.decomposition import PCA

    Location1 = r'C:\Users\...\close_prices.csv'
    df = pd.read_csv(Location1)

    X = df.drop('date', axis=1)
    pca = PCA(n_components=10)
    pca.fit(X)
    print(pca.explained_variance_ratio_)
    # the first component explains the most variation in the features (prices of the 30 companies)

    # now I apply the transformation to the original data
    X1 = pca.transform(X)
    X1.shape  # (374, 10)

    # I need to take the first component => I take (374, 1)
    X11 = X1[:, 0]
    X11.shape  # (374,)
The error occurs when I try to calculate the Pearson correlation coefficient:
    df2 = pd.read_csv('djia_index.csv')
    X2 = df2.drop('date', axis=1)
    X2.shape  # (374, 1)

    from numpy import corrcoef
    corr1 = corrcoef(X2, X11)

    ValueError: all the input array dimensions except for the concatenation axis must match exactly
Why don't the dimensions match, and how can I fix it?
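A likely explanation, sketched with synthetic arrays rather than the real CSV files: `corrcoef` treats each row as a variable by default, so an input of shape (374, 1) and an input of shape (374,) cannot be stacked together. The names `x2` and `x11` and the random data below are assumptions that only mimic the shapes of `X2` and `X11`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins with the same shapes as in the question (assumed data).
x2 = rng.normal(size=(374, 1))                          # like X2: shape (374, 1)
x11 = 2.0 * x2[:, 0] + rng.normal(scale=0.1, size=374)  # like X11: shape (374,)

# By default corrcoef reads each ROW as a variable, so x2 is seen as
# 374 variables with 1 observation each, while x11 is 1 variable with
# 374 observations. Stacking (374, 1) on top of (1, 374) fails.
try:
    np.corrcoef(x2, x11)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

# Flattening the single column to shape (374,) makes both inputs 1-D.
corr = np.corrcoef(x2.ravel(), x11)  # 2 x 2 correlation matrix
r = corr[0, 1]                       # Pearson coefficient between the two series
print(mismatch_raised, corr.shape, r)
```

With the real data, the equivalent change would presumably be `corrcoef(X2.values.ravel(), X11)` or `corrcoef(X2.iloc[:, 0], X11)`; the off-diagonal element `[0, 1]` of the result is the Pearson coefficient.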