Issue
This Content is from Stack Overflow. Question asked by Dana
I have the following dataframe:
X Y Z A B
1 7.9 0.2 0.6 0.3 0.75
2 8.9 0.2 0.1 0.8 0.60
3 8.9 0.2 0.9 0.0 0.45
4 4.3 0.3 2.3 0.3 0.35
5 7.8 0.3 2.0 0.6 0.36
6 8.9 0.3 1.3 0.4 0.34
7 1.2 3.4 4.0 3.0 3.50
8 7.8 9.0 0.0 9.0 9.50
The dataframe is grouped using the ‘Y’ column. In each group, I would like to get the average value for ‘X’ and max for ‘Z’, ‘A’, and ‘B’.
The result should look like this:
X Y Z A B
0 8.56 0.2 0.9 0.8 0.75
1 7.00 0.3 2.3 0.6 0.36
2 1.20 3.4 4.0 3.0 3.50
3 7.80 9.0 0.0 9.0 9.50
I would appreciate any help!
Solution
Assuming you’re talking about a pandas dataframe you can use df.agg
, specifying an aggregation function for each column.
import pandas as pd
df = pd.DataFrame({'X': [7.9, 8.9, 8.9, 4.3, 7.8, 8.9, 1.2, 7.8],
'Y': [0.2, 0.2, 0.2, 0.3, 0.3, 0.3, 3.4, 9.0],
'Z': [0.6, 0.1, 0.9, 2.3, 2.0, 1.3, 4.0, 0.0],
'A': [0.3, 0.8, 0.0, 0.3, 0.6, 0.4, 3.0, 9.0],
'B': [0.75, 0.6, 0.45, 0.35, 0.36, 0.34, 3.5, 9.5]}))
df2 = df.groupby("Y", as_index=False).agg({"X":"mean",
"Z":"max",
"A":"max",
"B":"max"})
gives
>>> df2
Y X Z A B
0 0.2 8.566667 0.9 0.8 0.75
1 0.3 7.000000 2.3 0.6 0.36
2 3.4 1.200000 4.0 3.0 3.50
3 9.0 7.800000 0.0 9.0 9.50
This Question was asked in StackOverflow by Dana and Answered by fsimonjetz It is licensed under the terms of CC BY-SA 2.5. - CC BY-SA 3.0. - CC BY-SA 4.0.