I have the following code. It computes correlations between various stocks from Yahoo Finance. The only thing I can't figure out is how to compute correlations for a larger set of instruments (it works with 6, but fails with 10). How do I make it work? For example, I'd like the stock tickers to be read from a file instead of being written out in the code as ('SAN.MC', 'GAM.MC', 'BBVA.MC' ...).

    import pandas as pd
    from pandas_datareader import data
    import datetime
    import numpy as np
    from bokeh.plotting import figure, show
    from bokeh.palettes import Spectral6
    from bokeh.io import output_notebook

    output_notebook()

    start = datetime.datetime(2016, 1, 1)
    end = datetime.datetime(2016, 12, 31)
    symbols = ('SAN.MC', 'GAM.MC', 'BBVA.MC', 'GAS.MC', 'ENG.MC', 'REP.MC')

    prices_df = pd.DataFrame()
    for symbol in symbols:
        df = data.DataReader(symbol, 'yahoo', start, end)
        prices_df.loc[:, symbol] = df['Adj Close']
    prices_df.tail()

    numlines = len(prices_df.columns)
    mypalette = Spectral6[0:numlines]

    # plot
    p = figure(width=1000, height=600, x_axis_type="datetime")
    color_ix = 0
    for symbol in symbols:
        p.line(prices_df.index.values, prices_df[symbol].values, legend=symbol,
               line_color=mypalette[color_ix], line_width=2)
        color_ix += 1
    show(p)

    # correlation matrix
    corr_df = prices_df.corr(method='pearson')
    corr_df
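One likely reason the code works for 6 tickers but fails for 10 (an inference from the code, not something stated in the thread): `Spectral6` holds only six colors, so `Spectral6[0:numlines]` is still six entries long when `numlines` is 10, and `mypalette[color_ix]` raises `IndexError` on the seventh symbol. The `corr` call itself has no such limit. Below is a minimal sketch of one workaround, cycling through the palette; the color values and the extra four tickers are placeholders, not taken from the original.

```python
from itertools import cycle

# Six placeholder colors standing in for a 6-color palette such as Spectral6
PALETTE6 = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd", "#8c564b"]


def assign_colors(symbols, palette):
    """Pair each symbol with a color, starting over at the first color
    when the palette runs out (instead of indexing past its end)."""
    return dict(zip(symbols, cycle(palette)))


# Ten tickers: the first six are from the question, the rest are placeholders
symbols = ["SAN.MC", "GAM.MC", "BBVA.MC", "GAS.MC", "ENG.MC", "REP.MC",
           "ITX.MC", "TEF.MC", "IBE.MC", "AMS.MC"]

colors = assign_colors(symbols, PALETTE6)
print(colors["ITX.MC"])  # '#1f77b4', the same color as SAN.MC
```

In the plotting loop this would become `line_color=colors[symbol]`. If every line should get its own distinct color instead of repeating, Bokeh also provides palette-generating functions such as `bokeh.palettes.viridis(n)`.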

  • I don't understand the downvote either... I'll try to sketch an answer now. - MaxU

1 answer

Here is the answer to your main question:

Reading many instruments at once, with the list of tickers taken from a file:

    import pandas as pd
    import pandas_datareader.data as wb

    # read the file with the list of instruments into a list
    symbols_fn = r'C:\Temp\.data\631336\symbols.txt'
    with open(symbols_fn) as f:
        stocklist = f.read().splitlines()

    start = '2016-01-01'
    end = '2016-12-31'

    # read all "instruments" with a single call (older pandas_datareader
    # versions return a pandas.Panel, newer ones a DataFrame with
    # MultiIndex columns; x['Adj Close'] works for both)
    x = wb.DataReader(stocklist, 'yahoo', start, end)

    # 2D slice (DataFrame) by 'Adj Close'; rebinding x also frees the memory
    # held by the other fields
    x = x['Adj Close']
    print(x)

Result:

    In [5]: x
    Out[5]:
                      AAPL         ADI         ADM         AIV        AMAT        ANTM         AON         APA         APC         APH        ARNC
    Date
    2016-01-04  102.612183   52.665440   34.491400   38.079797   18.117888  136.506876   89.304015   43.361861   48.950834   50.392948   21.447743
    2016-01-05  100.040792   52.278481   34.876350   39.049798   18.137507  138.497457   89.845493   42.483497   48.343058   49.828294   20.475857
    2016-01-06   98.083025   50.053454   34.029462   38.608015   17.391996  135.114436   88.486878   37.603701   43.610380   48.649455   19.018028
    2016-01-07   93.943473   48.776478   33.221070   38.387124   16.881909  132.800261   87.197185   35.671303   39.963728   46.896053   18.267028
    2016-01-08   94.440222   48.350822   33.095960   37.657219   16.469917  129.564352   85.671207   35.768901   40.412088   46.192713   17.825261
    2016-01-11   95.969420   49.502031   32.874615   37.897319   16.362014  125.867552   85.070668   34.217126   37.612334   45.836088   17.670643
    ...
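A small caveat on `f.read().splitlines()`: an empty line in `symbols.txt` (for example, an extra blank line at the end) becomes an empty string in `stocklist`, which is then passed to `DataReader` as a ticker. A slightly more defensive reader is sketched below with a hypothetical helper `read_tickers`; it skips blank lines and strips stray whitespace. Treating lines that start with `#` as comments is an added convention of this sketch, not part of the original file format.

```python
import os
import tempfile


def read_tickers(path):
    """Hypothetical helper: return the tickers listed in a text file, one per line.

    Surrounding whitespace is stripped, and blank lines and lines starting
    with '#' are skipped, so they never reach DataReader as empty tickers.
    """
    tickers = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#"):
                tickers.append(line)
    return tickers


# Usage with a throwaway file (the answer's real file is symbols.txt)
tmp_dir = tempfile.mkdtemp()
symbols_fn = os.path.join(tmp_dir, "symbols.txt")
with open(symbols_fn, "w", encoding="utf-8") as f:
    f.write("SAN.MC\nGAM.MC\n\n# banks\nBBVA.MC  \n")

stocklist = read_tickers(symbols_fn)
print(stocklist)  # ['SAN.MC', 'GAM.MC', 'BBVA.MC']
```

The resulting `stocklist` can be passed to `wb.DataReader` exactly as in the answer; since the columns of `x` are whatever the file lists, the number of instruments is no longer fixed in the code.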

P.S. I recommend storing the data in HDF5: it is very fast, preserves data types (dtypes), lets you read data selectively (that is, filter while reading from disk, which is indispensable when working with datasets that don't fit entirely in memory), supports compression, is a widely used format, etc.

Example:

    # write
    x.to_hdf(r'C:\Temp\.data\631336\data_panel.h5', 'my_id', format='t',
             complib='blosc', complevel=5, data_columns=True)

    # read
    df = pd.read_hdf(r'C:\Temp\.data\631336\data_panel.h5', 'my_id')

Result:

    In [10]: df.shape
    Out[10]: (252, 11)

    In [11]: df.dtypes
    Out[11]:
    AAPL    float64
    ADI     float64
    ADM     float64
    AIV     float64
    AMAT    float64
    ANTM    float64
    AON     float64
    APA     float64
    APC     float64
    APH     float64
    ARNC    float64
    dtype: object
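The selective reading mentioned in the P.S. can be sketched as follows. This is a minimal example on a small made-up price table (not the Yahoo data above), and it assumes PyTables (the `tables` package, which `to_hdf` needs anyway) is installed. Queries with `where=` work only for data written with `format='table'` (`'t'` for short, as in the example above), and `data_columns=True` is what lets a condition refer to ordinary columns such as `AAPL` rather than only to the index.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Made-up stand-in for the 'Adj Close' table: 10 business days, 2 tickers
idx = pd.bdate_range("2016-01-04", periods=10)
prices = pd.DataFrame(
    {"AAPL": np.linspace(100, 110, 10), "ADI": np.linspace(50, 55, 10)},
    index=idx,
)

path = os.path.join(tempfile.mkdtemp(), "prices.h5")
# format='table' makes the data queryable on read; data_columns=True
# indexes every column so it can appear in a where condition
prices.to_hdf(path, key="my_id", format="table", data_columns=True)

# Only rows matching the condition are read from disk
subset = pd.read_hdf(path, "my_id", where="index >= '2016-01-11' & AAPL > 105")
print(subset)
```

Here the condition keeps the five business days from 2016-01-11 onward, because all of them also have `AAPL` above 105.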