Something like this:
import pandas as pd
import pandas_datareader.data as wb
from statsmodels.tsa.stattools import adfuller, coint
from itertools import combinations

# read up financial data ('Close') for all tickers
symbols_fn = r'D:\download\9017879_symbol.txt'
with open(symbols_fn) as f:
    stocklist = f.read().splitlines()

start = '2016-01-01'
end = '2016-12-31'
x = wb.DataReader(stocklist, 'yahoo', start, end).loc['Close']

# generate pairs (combinations of pairs for all tickers)
pairs = list(combinations(x.columns.tolist(), 2))
data = []

# calculate cointegrations for each pair of tickers
for a, b in pairs:
    data.append(coint(x[a], x[b]))

# build a Pandas.DataFrame based on the result of cointegration
df = pd.DataFrame(
    data,
    columns=['coint_t', 'pvalue', 'crit_value'],
    index=pd.MultiIndex.from_tuples(pairs, names=['a', 'b'])
)
Result:
In [14]: df
Out[14]:
             coint_t    pvalue                                        crit_value
a    b
AAPL ADI   -2.399547  0.325069  [-3.45656889661, -2.87307861944, -2.57291899534]
     ADM   -1.766708  0.645747  [-3.45656889661, -2.87307861944, -2.57291899534]
     ADP   -1.714724  0.670144  [-3.45656889661, -2.87307861944, -2.57291899534]
     ADSK  -2.441990  0.305234  [-3.45656889661, -2.87307861944, -2.57291899534]
     AIV   -1.658111  0.695751  [-3.45656889661, -2.87307861944, -2.57291899534]
     AIZ   -1.730617  0.662771  [-3.45656889661, -2.87307861944, -2.57291899534]
     AJG   -1.850511  0.604879  [-3.45656889661, -2.87307861944, -2.57291899534]
     AMAT  -2.350225  0.348777  [-3.45656889661, -2.87307861944, -2.57291899534]
     AN    -1.526342  0.750953  [-3.45656889661, -2.87307861944, -2.57291899534]
     ANTM  -1.211491  0.854866  [-3.45656889661, -2.87307861944, -2.57291899534]
...              ...       ...                                               ...
BHI  BK    -3.528199  0.029976  [-3.45656889661, -2.87307861944, -2.57291899534]
     BLL   -1.032398  0.896528  [-3.45656889661, -2.87307861944, -2.57291899534]
     T     -0.587248  0.957508  [-3.45656889661, -2.87307861944, -2.57291899534]
     TXN   -1.805689  0.626955  [-3.45656889661, -2.87307861944, -2.57291899534]
BK   BLL   -1.144756  0.871779  [-3.45656889661, -2.87307861944, -2.57291899534]
     T     -0.954469  0.911148  [-3.45656889661, -2.87307861944, -2.57291899534]
     TXN   -1.838958  0.610613  [-3.45656889661, -2.87307861944, -2.57291899534]
BLL  T     -1.960588  0.548994  [-3.45656889661, -2.87307861944, -2.57291899534]
     TXN   -2.924919  0.129148  [-3.45656889661, -2.87307861944, -2.57291899534]
T    TXN   -1.891437  0.584344  [-3.45656889661, -2.87307861944, -2.57291899534]

[465 rows x 3 columns]
UPDATE: results for three instruments, ['AAPL', 'AME', 'GOOG']:
In [11]: df
Out[11]:
             coint_t    pvalue                                        crit_value
a    b
AAPL AME   -1.310313  0.826602  [-3.45656889661, -2.87307861944, -2.57291899534]
     GOOG  -4.043216  0.006225  [-3.45656889661, -2.87307861944, -2.57291899534]
AME  GOOG  -3.734957  0.016513  [-3.45656889661, -2.87307861944, -2.57291899534]
Check:
In [13]: coint(x['AAPL'], x['AME'])
Out[13]:
(-1.3103131127615668,
 0.82660200322788824,
 array([-3.4565689 , -2.87307862, -2.572919  ]))

In [14]: coint(x['AAPL'], x['GOOG'])
Out[14]:
(-4.0432156294166424,
 0.0062253442468271384,
 array([-3.4565689 , -2.87307862, -2.572919  ]))

In [15]: coint(x['AME'], x['GOOG'])
Out[15]:
(-3.734956880459813,
 0.016513005631255184,
 array([-3.4565689 , -2.87307862, -2.572919  ]))
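If the goal is to shortlist candidate pairs, one way to read these numbers is to filter on the p-value. The sketch below uses only the three results printed above; the 5% threshold (`ALPHA`) is my own assumption, not something from the question, and the plain dict is just a stand-in for the DataFrame so the snippet runs without any data download.

```python
# (t-statistic, p-value) per pair, copied from the three-instrument output above
results = {
    ("AAPL", "AME"): (-1.310313, 0.826602),
    ("AAPL", "GOOG"): (-4.043216, 0.006225),
    ("AME", "GOOG"): (-3.734957, 0.016513),
}

ALPHA = 0.05  # assumed significance level, not part of the original answer

# coint's null hypothesis is "no cointegration", so a small p-value
# counts as evidence in favour of cointegration
candidates = sorted(
    (pair for pair, (_, pvalue) in results.items() if pvalue < ALPHA),
    key=lambda pair: results[pair][1],  # most significant pair first
)

print(candidates)
```

On the DataFrame itself the same selection should be `df[df['pvalue'] < 0.05].sort_values('pvalue')`.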
statsmodels.tsa.stattools.coint expects vectors (1D arrays) as input, so you can't feed it many instruments at once. You will probably have to write a loop and compute it pair by pair ... - MaxU
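The pair-by-pair loop can be sketched independently of statsmodels. Here `pairwise_apply` and `mean_abs_diff` are hypothetical names, and the toy series are made up; in real use you would pass `coint` as `func` and the price columns as `columns`.

```python
from itertools import combinations


def pairwise_apply(columns, func):
    """Apply a two-argument function to every unordered pair of named series."""
    return {
        (a, b): func(columns[a], columns[b])
        for a, b in combinations(columns, 2)
    }


def mean_abs_diff(u, v):
    """Stand-in for coint(): any function that takes two 1D sequences."""
    return sum(abs(p - q) for p, q in zip(u, v)) / len(u)


# Made-up toy series, only to show the shape of the result
columns = {
    "X": [1.0, 2.0, 3.0, 4.0],
    "Y": [2.0, 4.0, 6.0, 8.0],
    "Z": [4.0, 3.0, 2.0, 1.0],
}

out = pairwise_apply(columns, mean_abs_diff)
for pair, value in out.items():
    print(pair, value)

# n series give n * (n - 1) / 2 pairs
n_pairs_31 = len(list(combinations(range(31), 2)))
print(n_pairs_31)
```

The number of calls grows quadratically: 31 series already give 465 pairs, which matches the `[465 rows x 3 columns]` shown in the result above.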