[SOLVED] Get count_values(normalize=True) result for each value, in each group in pandas GroupByDataframe

Issue

This Content is from Stack Overflow. Question asked by CutePoison

Say I have a dataframe

import pandas as pd
df = pd.DataFrame({"id":[1,1,1,1,2,2,2],
                  "car":["Volvo","Audi","Volvo","Volvo","VW","Audi","Volvo"]})

and I want to get the count-ratio for each car within each id aswell as the count i.e the resulting dataframe would be


          ratio    count
    car
id 

1   Audi   0.25     1
    Volvo  0.75     3

2   Audi   0.33     1
    Volvo  0.33     1
    VW     0.33     1

I have tried


def get_tag_ratio(x):
    """
    Calculate the ratio for each car
    """
    ratio = x.value_counts(normalize=True)
    return ratio

car_info = df.groupby(["id"])["car"].agg(
    ratio=get_tag_ratio, count="count")

but the result is

          ratio    count
 
id 

1   [0.75,0.25]          4

2   [0.33,0.33,0.33]     3

I have also tried

car_info = df.groupby(["id","car"])["car"].agg(
    ratio=get_tag_ratio, count="count")

which gives

          ratio    count
    
id   car

1   Audi   1.0      1
    Volvo  1.0      3

2   Audi   1.0      1
    Volvo  1.0      1
    VW     1.0      1

which is almost there – unless the value_counts now is applied within each car group and not each id group.

Just parsing "value_counts" doens’t work (also, I don’t know how to get “normalize=True” as an argument to value_counts here)

df.groupby("id")["car"].agg(["value_counts","count"]) #`ValueError: Must pass non-zero number of levels/codes`



Solution

Use Series.value_counts without parameter and with parameter normalize with DataFrame cosntructor:

def get_tag_ratio(x):
    """
    Calculate the ratio for each car
    """
    ratio = pd.DataFrame({'ratio': x.value_counts(normalize=True), 
                          'count': x.value_counts()})
    return ratio

car_info = df.groupby(["id"])["car"].apply(get_tag_ratio)
print (car_info)
             ratio  count
id                       
1  Volvo  0.750000      3
   Audi   0.250000      1
2  Audi   0.333333      1
   VW     0.333333      1
   Volvo  0.333333      1

Or join 2 MultiIndex Series:

g = df.groupby("id")["car"]
car_info = pd.concat([g.value_counts(normalize=True),
                      g.value_counts()], axis=1, keys=['ratio','count'])
print (car_info)
             ratio  count
id car                   
1  Volvo  0.750000      3
   Audi   0.250000      1
2  Audi   0.333333      1
   VW     0.333333      1
   Volvo  0.333333      1


This Question was asked in StackOverflow by CutePoison and Answered by jezrael It is licensed under the terms of CC BY-SA 2.5. - CC BY-SA 3.0. - CC BY-SA 4.0.

people found this article helpful. What about you?