I have a source DataFrame:

    FIRM  x1  x2
       1   4   4
       1   2  34
       1   3   4
       1   4   4
       2   4   4
       2   4   4
       2   4   4
       2   2   4
       3   3   3
       3   2   3
       3   2   2
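To make the problem reproducible, here is a minimal sketch that builds the same frame in pandas. It assumes the DataFrame is named `df` and the columns are exactly `FIRM`, `x1` and `x2`, as in the listing above:

```python
import pandas as pd

# Sample data retyped from the listing above: 4 rows for firm 1, 4 for firm 2, 3 for firm 3
df = pd.DataFrame({
    "FIRM": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3],
    "x1":   [4, 2, 3, 4, 4, 4, 4, 2, 3, 2, 2],
    "x2":   [4, 34, 4, 4, 4, 4, 4, 4, 3, 3, 2],
})
print(df)
```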

For each FIRM, I need to compute the correlation between x1 and x2 and write the results into a new DataFrame.
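Assuming the Pearson coefficient is meant (it is what pandas' `corr` and scipy's `pearsonr` compute by default), the quantity wanted for each firm $f$ is shown below; the subscript $f$ and the index $i$ over that firm's rows are notation introduced here. If x1 or x2 is constant within a firm, one of the sums in the denominator is zero and $r_f$ is undefined (NaN).

```latex
r_f = \frac{\sum_{i \in f} (x_{1,i} - \bar{x}_{1,f})(x_{2,i} - \bar{x}_{2,f})}
           {\sqrt{\sum_{i \in f} (x_{1,i} - \bar{x}_{1,f})^2}\,\sqrt{\sum_{i \in f} (x_{2,i} - \bar{x}_{2,f})^2}}
```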

That is, the output should look something like this:

      FIRM  CORR
         1  …
         2  …
         3  …
  • Take each what? - Viktorov
  • Each unique value from the FIRM column - Max52

1 answer

If I understood correctly:

    In [116]: from scipy.stats import pearsonr

    In [117]: df.groupby('FIRM')[['x1', 'x2']].apply(lambda x: pearsonr(x['x1'], x['x2'])[0])
    ...\Anaconda3\envs\ml\lib\site-packages\scipy\stats\stats.py:3038: RuntimeWarning: invalid value encountered in double_scalars
      r = r_num / r_den
    Out[117]:
    FIRM
    1   -0.870388
    2         NaN
    3    0.500000
    dtype: float64

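The answer returns a Series indexed by FIRM, while the question asks for a new DataFrame with FIRM and CORR columns. A minimal sketch of that last step, swapping scipy's `pearsonr` for pandas' own `Series.corr` (same Pearson coefficient, no scipy import needed) and assuming the column names from the question:

```python
import pandas as pd

# Sample data retyped from the question
df = pd.DataFrame({
    "FIRM": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3],
    "x1":   [4, 2, 3, 4, 4, 4, 4, 2, 3, 2, 2],
    "x2":   [4, 34, 4, 4, 4, 4, 4, 4, 3, 3, 2],
})

# One Pearson coefficient per firm, as a Series indexed by FIRM.
# A constant column (x2 for firm 2) yields NaN; numpy may emit a RuntimeWarning.
per_firm = df.groupby("FIRM")[["x1", "x2"]].apply(lambda g: g["x1"].corr(g["x2"]))

# Turn the Series into the requested two-column DataFrame
corr_df = per_firm.reset_index(name="CORR")
print(corr_df)
```

The NaN for firm 2 is expected: x2 is 4 in every row of that firm, so the correlation is undefined, which is also what the RuntimeWarning in the session above is reporting.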
  • Yes, that's right. But how do I output just the single value, i.e. the correlation between x1 and x2, rather than the whole 2×2 table? I would then like to compute the average correlation. - Max52 5:24 pm
  • It throws an error: SyntaxError: unexpected character after line continuation character - Max52
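Python reports "unexpected character after line continuation character" when a backslash outside a string literal is followed by more text on the same line, so the error most likely comes from pasting part of the IPython session output (for example the `...\Anaconda3\...` warning path) together with the code; only the statements after the `In [...]:` prompts should be run. A minimal standalone sketch that also addresses the 2×2 comment: `groupby(...).corr()` stacks one 2×2 matrix per firm, so it keeps only the x1 row and x2 column of each, then averages. The sample data is retyped from the question and the names `per_firm`, `result` and `avg_corr` are illustrative:

```python
import pandas as pd

# Sample data retyped from the question
df = pd.DataFrame({
    "FIRM": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3],
    "x1":   [4, 2, 3, 4, 4, 4, 4, 2, 3, 2, 2],
    "x2":   [4, 34, 4, 4, 4, 4, 4, 4, 3, 3, 2],
})

# groupby(...).corr() returns a 2x2 correlation matrix per firm under a
# (FIRM, column name) MultiIndex; selecting the "x1" rows and the "x2" column
# leaves the single off-diagonal coefficient for each firm
per_firm = df.groupby("FIRM")[["x1", "x2"]].corr().xs("x1", level=1)["x2"]

result = per_firm.rename("CORR").reset_index()
print(result)

# Series.mean() skips NaN by default, so firm 2 (constant x2) is left out of the average
avg_corr = result["CORR"].mean()
print(avg_corr)
```

Whether a NaN firm should be dropped from the average or treated differently is a modelling choice; `mean(skipna=False)` would instead propagate the NaN.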