Get count_values(normalize=True) result for each value, in each group in pandas GroupByDataframe

Issue

This Content is from Stack Overflow. Question asked by CutePoison

Say I have a dataframe

import pandas as pd
df = pd.DataFrame({"id":[1,1,1,1,2,2,2],
                  "car":["Volvo","Audi","Volvo","Volvo","VW","Audi","Volvo"]})

and I want to get the count-ratio for each car within each id aswell as the count i.e the resulting dataframe would be


          ratio    count
    car
id 

1   Audi   0.25     1
    Volvo  0.75     3

2   Audi   0.33     1
    Volvo  0.33     1
    VW     0.33     1

I have tried


def get_tag_ratio(x):
    """
    Calculate the ratio for each car
    """
    ratio = x.value_counts(normalize=True)
    return ratio

car_info = df.groupby(["id"])["car"].agg(
    ratio=get_tag_ratio, count="count")

but the result is

          ratio    count
 
id 

1   [0.75,0.25]          4

2   [0.33,0.33,0.33]     3

I have also tried

car_info = df.groupby(["id","car"])["car"].agg(
    ratio=get_tag_ratio, count="count")

which gives

          ratio    count
    
id   car

1   Audi   1.0      1
    Volvo  1.0      3

2   Audi   1.0      1
    Volvo  1.0      1
    VW     1.0      1

which is almost there – unless the value_counts now is applied within each car group and not each id group.

Just parsing "value_counts" doens’t work (also, I don’t know how to get “normalize=True” as an argument to value_counts here)

df.groupby("id")["car"].agg(["value_counts","count"]) #`ValueError: Must pass non-zero number of levels/codes`



Solution

This question is not yet answered, be the first one who answer using the comment. Later the confirmed answer will be published as the solution.

This Question and Answer are collected from stackoverflow and tested by JTuto community, is licensed under the terms of CC BY-SA 2.5. - CC BY-SA 3.0. - CC BY-SA 4.0.

people found this article helpful. What about you?