Search for cointegration in Python

Question

Stock tickers are read from a txt file. I need to calculate the cointegration coefficient; for example, for AAPL and TXN it came out as 0.74. But when I load the tickers from a single file instead of using two separate series S1 and S2, the code stops working. How can I make it calculate the coefficient for all stocks listed in the txt file?

    import pandas as pd
    import pandas_datareader.data as wb
    import datetime
    import numpy as np
    import statsmodels
    from statsmodels.tsa.stattools import coint
    from datetime import date
    from pandas_datareader import data

    symbols_fn = r'C:\Temp\.data\631336\symbols.txt'
    with open(symbols_fn) as f:
        stocklist = f.read().splitlines()

    start = '2016-01-01'
    end = '2016-12-31'

    S1, S2 = wb.DataReader(stocklist, 'yahoo', start, end)
    result = coint(S1[['Close']], S2[['Close']])
    score = result[0]
    pvalue = result[1]
    score, pvalue, _ = coint(S1[['Close']], S2[['Close']])
    print(pvalue)

If I read the source of statsmodels.tsa.stattools.coint correctly, it expects vectors (1D arrays) as input, so passing many instruments to this function at once will not work.

Accepted Answer · 2017-02-23T20:20:02

Something like this:

    import pandas as pd
    import pandas_datareader.data as wb
    from statsmodels.tsa.stattools import adfuller, coint
    from itertools import combinations

    # read up financial data ('Close') for all tickers
    symbols_fn = r'D:\download\9017879_symbol.txt'
    with open(symbols_fn) as f:
        stocklist = f.read().splitlines()

    start = '2016-01-01'
    end = '2016-12-31'

    x = wb.DataReader(stocklist, 'yahoo', start, end).loc['Close']

    # generate pairs (combinations of pairs for all tickers)
    pairs = list(combinations(x.columns.tolist(), 2))

    data = []

    # calculate cointegrations for each pair of tickers
    for a,b in pairs:
        data.append(coint(x[a], x[b]))

    # build a Pandas.DataFrame based on the result of cointegration
    df = pd.DataFrame(
        data,
        columns=['coint_t','pvalue','crit_value'],
        index=pd.MultiIndex.from_tuples(pairs, names=['a', 'b'])
    )

Result:

    In [14]: df
    Out[14]:
                 coint_t    pvalue  crit_value
    a    b
    AAPL ADI   -2.399547  0.325069  [-3.45656889661, -2.87307861944, -2.57291899534]
         ADM   -1.766708  0.645747  [-3.45656889661, -2.87307861944, -2.57291899534]
         ADP   -1.714724  0.670144  [-3.45656889661, -2.87307861944, -2.57291899534]
         ADSK  -2.441990  0.305234  [-3.45656889661, -2.87307861944, -2.57291899534]
         AIV   -1.658111  0.695751  [-3.45656889661, -2.87307861944, -2.57291899534]
         AIZ   -1.730617  0.662771  [-3.45656889661, -2.87307861944, -2.57291899534]
         AJG   -1.850511  0.604879  [-3.45656889661, -2.87307861944, -2.57291899534]
         AMAT  -2.350225  0.348777  [-3.45656889661, -2.87307861944, -2.57291899534]
         AN    -1.526342  0.750953  [-3.45656889661, -2.87307861944, -2.57291899534]
         ANTM  -1.211491  0.854866  [-3.45656889661, -2.87307861944, -2.57291899534]
    ...              ...       ...  ...
    BHI  BK    -3.528199  0.029976  [-3.45656889661, -2.87307861944, -2.57291899534]
         BLL   -1.032398  0.896528  [-3.45656889661, -2.87307861944, -2.57291899534]
         T     -0.587248  0.957508  [-3.45656889661, -2.87307861944, -2.57291899534]
         TXN   -1.805689  0.626955  [-3.45656889661, -2.87307861944, -2.57291899534]
    BK   BLL   -1.144756  0.871779  [-3.45656889661, -2.87307861944, -2.57291899534]
         T     -0.954469  0.911148  [-3.45656889661, -2.87307861944, -2.57291899534]
         TXN   -1.838958  0.610613  [-3.45656889661, -2.87307861944, -2.57291899534]
    BLL  T     -1.960588  0.548994  [-3.45656889661, -2.87307861944, -2.57291899534]
         TXN   -2.924919  0.129148  [-3.45656889661, -2.87307861944, -2.57291899534]
    T    TXN   -1.891437  0.584344  [-3.45656889661, -2.87307861944, -2.57291899534]

    [465 rows x 3 columns]
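A quick sanity check on the size of that table: combinations(..., 2) yields n(n-1)/2 unordered pairs for n tickers, so 465 rows would correspond to 31 tickers. The ticker count is inferred from the row count, it is not stated in the answer. A minimal sketch using only the standard library:

```python
from itertools import combinations
from math import comb

# Assumption: 31 tickers, inferred from the 465 rows in the output above.
n_tickers = 31

# Placeholder labels stand in for the real ticker symbols.
pairs = list(combinations(range(n_tickers), 2))

# Each unordered pair appears exactly once: n * (n - 1) / 2 of them.
n_pairs = len(pairs)
assert n_pairs == comb(n_tickers, 2) == n_tickers * (n_tickers - 1) // 2
print(n_pairs)
```

The count grows quadratically, so a list of 1000 tickers would already mean 499500 calls to coint.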

UPDATE: results for three instruments, ['AAPL', 'AME', 'GOOG']:

    In [11]: df
    Out[11]:
                 coint_t    pvalue  crit_value
    a    b
    AAPL AME   -1.310313  0.826602  [-3.45656889661, -2.87307861944, -2.57291899534]
         GOOG  -4.043216  0.006225  [-3.45656889661, -2.87307861944, -2.57291899534]
    AME  GOOG  -3.734957  0.016513  [-3.45656889661, -2.87307861944, -2.57291899534]
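To pick out candidate pairs from a table like this, one option is to filter on the p-value. The null hypothesis of coint is that there is no cointegration, so a small p-value is evidence in favour of it. A sketch that rebuilds the three rows above by hand; the 0.05 threshold and the names ALPHA and candidates are my own choices, not part of the answer:

```python
import pandas as pd

# Values copied from the three-instrument result above.
df = pd.DataFrame(
    {
        "coint_t": [-1.310313, -4.043216, -3.734957],
        "pvalue": [0.826602, 0.006225, 0.016513],
    },
    index=pd.MultiIndex.from_tuples(
        [("AAPL", "AME"), ("AAPL", "GOOG"), ("AME", "GOOG")],
        names=["a", "b"],
    ),
)

ALPHA = 0.05  # assumed significance level, adjust as needed

# Keep pairs where the null of no cointegration is rejected,
# with the smallest p-value first.
candidates = df[df["pvalue"] < ALPHA].sort_values("pvalue")
print(candidates)
```

When hundreds of pairs are tested at once, a few p-values below the threshold are expected by chance alone, so the filtered list is better treated as a shortlist than as a final result.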

Check:

    In [13]: coint(x['AAPL'], x['AME'])
    Out[13]:
    (-1.3103131127615668,
     0.82660200322788824,
     array([-3.4565689 , -2.87307862, -2.572919  ]))

    In [14]: coint(x['AAPL'], x['GOOG'])
    Out[14]:
    (-4.0432156294166424,
     0.0062253442468271384,
     array([-3.4565689 , -2.87307862, -2.572919  ]))

    In [15]: coint(x['AME'], x['GOOG'])
    Out[15]:
    (-3.734956880459813,
     0.016513005631255184,
     array([-3.4565689 , -2.87307862, -2.572919  ]))
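As the raw output shows, each call returns a tuple whose third element is an array, which is why the crit_value column ends up holding a whole array per row. A possible variation is to flatten it into one column per level. The helper below is a hypothetical sketch (pairwise_test and the crit_* column names are mine); it takes the test function as an argument, so it runs with any callable that returns (statistic, pvalue, critical_values) the way coint does:

```python
from itertools import combinations


def pairwise_test(prices, test):
    """Apply a two-series test to every unordered pair of tickers.

    prices: any mapping from ticker to a 1D series, e.g. a dict of
            lists or a pandas DataFrame with one column per ticker.
    test:   callable returning (statistic, pvalue, critical_values),
            such as statsmodels.tsa.stattools.coint.
    """
    rows = []
    # list(prices) gives the keys of a dict or the columns of a DataFrame.
    for a, b in combinations(list(prices), 2):
        stat, pvalue, crit = test(prices[a], prices[b])
        rows.append({
            "a": a,
            "b": b,
            "coint_t": float(stat),
            "pvalue": float(pvalue),
            # Assumption: critical values are ordered 1%, 5%, 10%,
            # as documented for statsmodels' coint.
            "crit_1pct": float(crit[0]),
            "crit_5pct": float(crit[1]),
            "crit_10pct": float(crit[2]),
        })
    return rows
```

Assuming x and coint from the answer, pd.DataFrame(pairwise_test(x, coint)).set_index(['a', 'b']) should give the same pairs with plain numeric columns that are easier to filter or export.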

@nabafew, everything is calculated correctly for me, see UPDATE. - MaxU

MaxU · Answer 2 · 2017-02-24T16:34:00

Data:

    x = wb.DataReader(stocklist, 'yahoo', start='1/1/2016', end='1/12/2016').loc['Close']

    In [50]: x
    Out[50]:
                      AAPL        AME        GOOG
    Date
    2016-01-04  105.349998  52.650002  741.840027
    2016-01-05  102.709999  52.380001  742.580017
    2016-01-06  100.699997  51.189999  743.619995
    2016-01-07   96.449997  49.980000  726.390015
    2016-01-08   96.959999  49.070000  714.469971
    2016-01-11   98.529999  48.200001  716.030029
    2016-01-12   99.959999  48.470001  726.070007

Solution:

    In [51]: # generate pairs (combinations of pairs for all tickers)
        ...: pairs = list(combinations(x.columns.tolist(), 2))
        ...:
        ...: data = []
        ...:
        ...: # calculate cointegrations for each pair of tickers
        ...: for a,b in pairs:
        ...:     data.append(coint(x[a], x[b]))
        ...:
        ...: # build a Pandas.DataFrame based on the result of cointegration
        ...: df = pd.DataFrame(
        ...:     data,
        ...:     columns=['coint_t','pvalue','crit_value'],
        ...:     index=pd.MultiIndex.from_tuples(pairs, names=['a', 'b'])
        ...: )
        ...:

Result:

    In [52]: df
    Out[52]:
                 coint_t    pvalue  crit_value
    a    b
    AAPL AME   -1.303168  0.828778  [-4.93869023324, -3.47758285714, -2.84386795918]
         GOOG  -2.012823  0.521906  [-4.93869023324, -3.47758285714, -2.84386795918]
    AME  GOOG  -1.420313  0.790466  [-4.93869023324, -3.47758285714, -2.84386795918]

Compare:

    In [53]: coint(y1, x1)
    Out[53]:
    (-1.4203126278701015,
     0.79046585398440861,
     array([-4.93869023, -3.47758286, -2.84386796]))
