I have a pandas DataFrame df with two columns: column "A" holds a variable and column "B" holds a class label.

How can I replace each value in column "A" with the mean of "A" for its class?

Computing the per-class means themselves is easy:

    df.groupby('B')['A'].mean()

But how do I then replace every value in column "A" with the mean of its class?

    Answer:

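    Use GroupBy.transform(). Unlike GroupBy.mean(), which returns one row per group, transform('mean') returns a Series aligned with the original index (one value per row), so it can be assigned straight back to a column.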
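    To make the difference in shape concrete, here is a minimal sketch that runs both calls on the same data. The fixed values are a hypothetical stand-in copied from the random example below so the output is reproducible; mean() gives one value per class label, while transform("mean") gives one value per row of df.

```python
import pandas as pd

# Hypothetical fixed data: the same values as the random example,
# written out so the result is reproducible.
df = pd.DataFrame({
    "A": [8, 8, 3, 1, 3, 5, 7, 1, 4, 2],
    "B": ["Y", "Z", "Z", "Y", "Y", "Z", "Z", "X", "X", "Y"],
})

# mean() collapses each group to a single value:
# a Series indexed by the class labels X, Y, Z.
per_class = df.groupby("B")["A"].mean()
print(per_class)

# transform("mean") broadcasts each group's mean back to every row,
# so the result has df's length and df's index.
per_row = df.groupby("B")["A"].transform("mean")
print(per_row)
```

    Because per_row shares df's index, it can be assigned to a column directly; per_class cannot, since it is indexed by class label rather than by row.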

    Example:

    Source DataFrame (with numpy imported as np and pandas as pd; the values are random, so yours will differ):

     In [36]: df = pd.DataFrame({'A':np.random.randint(0, 10, 10), 'B':np.random.choice(list('XYZ'), 10)})

     In [37]: df
     Out[37]:
        A  B
     0  8  Y
     1  8  Z
     2  3  Z
     3  1  Y
     4  3  Y
     5  5  Z
     6  7  Z
     7  1  X
     8  4  X
     9  2  Y

    Solution:

     In [39]: df['avg_A'] = df.groupby('B')['A'].transform('mean') 

    Result:

     In [40]: df
     Out[40]:
        A  B  avg_A
     0  8  Y   3.50
     1  8  Z   5.75
     2  3  Z   5.75
     3  1  Y   3.50
     4  3  Y   3.50
     5  5  Z   5.75
     6  7  Z   5.75
     7  1  X   2.50
     8  4  X   2.50
     9  2  Y   3.50

    If you need to replace the values in column A itself:

     df['A'] = df.groupby('B')['A'].transform('mean')
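    If you already have the per-class means from the question (df.groupby('B')['A'].mean()), another option, not part of the original answer, is to look each row's label up in that Series with Series.map. The sketch below uses the same hypothetical fixed data as the example, checks that the mapped result matches transform('mean'), and then overwrites column A. Note that A becomes float64, since class means are generally not integers.

```python
import pandas as pd

# Hypothetical fixed data (same values as the example).
df = pd.DataFrame({
    "A": [8, 8, 3, 1, 3, 5, 7, 1, 4, 2],
    "B": ["Y", "Z", "Z", "Y", "Y", "Z", "Z", "X", "X", "Y"],
})

# Per-class means as computed in the question: one value per label in B.
class_means = df.groupby("B")["A"].mean()

# Series.map replaces each label in B with its entry in class_means,
# producing a row-aligned Series.
mapped = df["B"].map(class_means)

# Same values as transform("mean"); only the Series names differ
# ("B" vs "A"), so names are excluded from the comparison.
transformed = df.groupby("B")["A"].transform("mean")
pd.testing.assert_series_equal(mapped, transformed, check_names=False)

# Overwrite column A; its dtype becomes float64.
df["A"] = mapped
print(df)
```

    The transform version needs only one expression; the map version is convenient when the means Series has already been computed or is reused elsewhere.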