While solving a statistics problem, I have a CSV table with roughly the following data:

    expr  Therapy
    100   A
    98    B
    87    C
    97    D
    102   C
    96    B
    92    D
    88    A
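For reference, a minimal sketch of reading such a table into a DataFrame. It assumes the file is comma-separated with a header row `expr,Therapy`; the question does not show the real delimiter or file name, so an `io.StringIO` buffer stands in for the file here:

```python
import io

import pandas as pd

# Stand-in for the real CSV file; the comma delimiter is an assumption.
csv_text = """expr,Therapy
100,A
98,B
87,C
97,D
102,C
96,B
92,D
88,A
"""

# With a real file you would pass its path instead of the buffer.
df = pd.read_csv(io.StringIO(csv_text))
print(df.dtypes)
```

If the columns are separated by whitespace rather than commas, `pd.read_csv(..., sep=r'\s+')` handles that case.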

I load it into a DataFrame so that I can plot it with matplotlib. But first I need to calculate statistics for the four samples (A, B, C, D). Is it possible to reshape the original table in pandas to get this?

      A   B    C   D
    100  98   87  97
     88  96  102  92

Or is it easier to first read the CSV into a variable, build a dictionary, compute all the statistics from it (median, within-group sum of squares, etc.), and then load everything into a DataFrame?

    Try this:

        In [7]: res = (df.assign(idx=df.groupby('Therapy').cumcount())
           ...:          .pivot_table(index='idx', columns='Therapy', values='expr', aggfunc='sum'))

        In [8]: res
        Out[8]:
        Therapy    A   B    C   D
        idx
        0        100  98   87  97
        1         88  96  102  92

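    Step by step:

    To use the DataFrame.pivot_table() method, we need values that will serve as the row index of the result, so we create a new idx column on the fly. groupby('Therapy').cumcount() numbers the rows within each group: 0 for the first occurrence of each therapy, 1 for the second, and so on: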

        In [10]: df.assign(idx=df.groupby('Therapy').cumcount())
        Out[10]:
           expr Therapy  idx
        0   100       A    0
        1    98       B    0
        2    87       C    0
        3    97       D    0
        4   102       C    1
        5    96       B    1
        6    92       D    1
        7    88       A    1

    Now we can call pivot_table():

        In [11]: df.assign(idx=df.groupby('Therapy').cumcount()).pivot_table(index='idx', columns='Therapy', values='expr', aggfunc='sum')
        Out[11]:
        Therapy    A   B    C   D
        idx
        0        100  98   87  97
        1         88  96  102  92
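A self-contained sketch of the same reshaping, with the question's data built inline. Because every (idx, Therapy) pair occurs exactly once, plain DataFrame.pivot (which does no aggregation) gives the same values as pivot_table with aggfunc='sum'; using pivot here is an alternative, not part of the original answer:

```python
import pandas as pd

# Same data as in the question, built directly instead of read from CSV.
df = pd.DataFrame({
    'expr': [100, 98, 87, 97, 102, 96, 92, 88],
    'Therapy': ['A', 'B', 'C', 'D', 'C', 'B', 'D', 'A'],
})

# Number the rows within each therapy group: 0 for the first occurrence, 1 for the second, ...
df_idx = df.assign(idx=df.groupby('Therapy').cumcount())

# Each (idx, Therapy) pair is unique, so pivot() needs no aggregation function.
wide = df_idx.pivot(index='idx', columns='Therapy', values='expr')

# For comparison: the pivot_table call from the answer.
wide_pt = df_idx.pivot_table(index='idx', columns='Therapy', values='expr', aggfunc='sum')

print(wide)
```

pivot raises an error if a pair is duplicated, which makes it a useful check that the idx numbering is correct; pivot_table would silently sum duplicates instead.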
    • Could you add a bit more explanation of how this works? - Viktorov
    • @Viktorov, I've expanded the answer - MaxU
    • Thanks for the clarification :) - Viktorov
    • @MaxU, thanks for both options. Both work, but I don't fully understand them yet. Something to dig into! :) - Burtsev
    • @Burtsev, you're welcome! If you have questions, ask them on SO :) - MaxU

    To calculate statistics grouped by Therapy, you don't need to reshape the DataFrame at all:

        In [20]: df.groupby('Therapy')['expr'].agg(['median', 'sum', 'std'])
        Out[20]:
                 median  sum        std
        Therapy
        A          94.0  188   8.485281
        B          97.0  194   1.414214
        C          94.5  189  10.606602
        D          94.5  189   3.535534
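The question also mentions the within-group sum of squares, which is not one of pandas' built-in aggregation names. A minimal sketch, assuming the usual definition (sum of squared deviations from each group's mean) and using a lambda inside named aggregation; the output column names such as `ss_within` are illustrative choices, not anything from the original answers:

```python
import pandas as pd

df = pd.DataFrame({
    'expr': [100, 98, 87, 97, 102, 96, 92, 88],
    'Therapy': ['A', 'B', 'C', 'D', 'C', 'B', 'D', 'A'],
})

# Named aggregation: output column name -> (input column, aggregation).
stats = df.groupby('Therapy').agg(
    median=('expr', 'median'),
    mean=('expr', 'mean'),
    std=('expr', 'std'),  # sample standard deviation (ddof=1)
    # Within-group sum of squares: sum of squared deviations from the group mean.
    ss_within=('expr', lambda s: ((s - s.mean()) ** 2).sum()),
)

# Total within-group sum of squares across all therapies.
ss_within_total = stats['ss_within'].sum()
print(stats)
print(ss_within_total)
```

The result is an ordinary DataFrame indexed by Therapy, so it can go straight to plotting, e.g. `stats['median'].plot.bar()` draws a bar chart through matplotlib.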