[SOLVED] How to get the max value of each column for each group in a dataframe?

Issue

This Content is from Stack Overflow. Question asked by Dana

I have the following dataframe:

  X      Y      Z      A      B

1 7.9 0.2 0.6 0.3 0.75
2 8.9 0.2 0.1 0.8 0.60
3 8.9 0.2 0.9 0.0 0.45
4 4.3 0.3 2.3 0.3 0.35
5 7.8 0.3 2.0 0.6 0.36
6 8.9 0.3 1.3 0.4 0.34
7 1.2 3.4 4.0 3.0 3.50
8 7.8 9.0 0.0 9.0 9.50

The dataframe is grouped using the ‘Y’ column. In each group, I would like to get the average value for ‘X’ and max for ‘Z’, ‘A’, and ‘B’.

The result should look like this:

  X      Y      Z       A      B

0 8.56 0.2 0.9 0.8 0.75
1 7.00 0.3 2.3 0.6 0.36
2 1.20 3.4 4.0 3.0 3.50
3 7.80 9.0 0.0 9.0 9.50

I would appreciate any help!



Solution

Assuming you’re talking about a pandas dataframe you can use df.agg, specifying an aggregation function for each column.

import pandas as pd

df = pd.DataFrame({'X': [7.9, 8.9, 8.9, 4.3, 7.8, 8.9, 1.2, 7.8],
                   'Y': [0.2, 0.2, 0.2, 0.3, 0.3, 0.3, 3.4, 9.0],
                   'Z': [0.6, 0.1, 0.9, 2.3, 2.0, 1.3, 4.0, 0.0],
                   'A': [0.3, 0.8, 0.0, 0.3, 0.6, 0.4, 3.0, 9.0],
                   'B': [0.75, 0.6, 0.45, 0.35, 0.36, 0.34, 3.5, 9.5]}))

df2 = df.groupby("Y", as_index=False).agg({"X":"mean",
                                           "Z":"max", 
                                           "A":"max", 
                                           "B":"max"})

gives

>>> df2
     Y         X    Z    A     B
0  0.2  8.566667  0.9  0.8  0.75
1  0.3  7.000000  2.3  0.6  0.36
2  3.4  1.200000  4.0  3.0  3.50
3  9.0  7.800000  0.0  9.0  9.50


This Question was asked in StackOverflow by Dana and Answered by fsimonjetz It is licensed under the terms of CC BY-SA 2.5. - CC BY-SA 3.0. - CC BY-SA 4.0.

people found this article helpful. What about you?