Get value_counts(normalize=True) result for each value, in each group in a pandas GroupBy DataFrame


This Content is from Stack Overflow. Question asked by CutePoison

Say I have a dataframe

import pandas as pd
df = pd.DataFrame({"id":[1,1,1,1,2,2,2],

and I want to get the count ratio for each car within each id, as well as the count, i.e. the resulting dataframe would be

          ratio    count

1   Audi   0.25     1
    Volvo  0.75     3

2   Audi   0.33     1
    Volvo  0.33     1
    VW     0.33     1

I have tried

def get_tag_ratio(x):
    """Calculate the ratio for each car"""
    ratio = x.value_counts(normalize=True)
    return ratio

car_info = df.groupby(["id"])["car"].agg(
    ratio=get_tag_ratio, count="count")

but the result is

          ratio    count

1   [0.75,0.25]          4

2   [0.33,0.33,0.33]     3

I have also tried

car_info = df.groupby(["id","car"])["car"].agg(
    ratio=get_tag_ratio, count="count")

which gives

          ratio    count
id   car

1   Audi   1.0      1
    Volvo  1.0      3

2   Audi   1.0      1
    Volvo  1.0      1
    VW     1.0      1

which is almost there, except that value_counts is now applied within each car group and not within each id group.
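Because each (id, car) group contains only one distinct car, the normalized count inside it is always 1.0. One possible way around this, sketched here under assumptions, is to count per (id, car) pair and then divide by the total per id. The "car" column in the sample data is a guess reconstructed from the expected output above, not the original snippet.

```python
import pandas as pd

# Assumed sample data: the "car" values are inferred from the expected
# output in the question, since the original definition is truncated.
df = pd.DataFrame({
    "id": [1, 1, 1, 1, 2, 2, 2],
    "car": ["Volvo", "Volvo", "Volvo", "Audi", "Audi", "Volvo", "VW"],
})

# Number of rows for each (id, car) pair.
count = df.groupby(["id", "car"]).size().rename("count")

# Total number of rows per id, broadcast back onto every (id, car) row.
total_per_id = count.groupby(level="id").transform("sum")

# Ratio of each car within its id, next to the raw count.
car_info = pd.DataFrame({"ratio": count / total_per_id, "count": count})
print(car_info)
```

The division happens on the (id, car) index, so the ratio is relative to the id total rather than to the single-car group.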

Just passing "value_counts" doesn't work (also, I don't know how to pass "normalize=True" as an argument to value_counts here)

df.groupby("id")["car"].agg(["value_counts","count"]) #`ValueError: Must pass non-zero number of levels/codes`
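For what it's worth, a minimal sketch of one approach that may give the desired shape: call value_counts directly on the grouped column (once with normalize=True, once without) instead of going through agg, and join the two results. The "car" values below are an assumption inferred from the expected output, since the original definition is cut off.

```python
import pandas as pd

# Assumed sample data: the "car" column is a guess based on the
# expected output, not the original (truncated) snippet.
df = pd.DataFrame({
    "id": [1, 1, 1, 1, 2, 2, 2],
    "car": ["Volvo", "Volvo", "Volvo", "Audi", "Audi", "Volvo", "VW"],
})

grouped = df.groupby("id")["car"]

# value_counts on a grouped Series returns a Series indexed by (id, car).
ratio = grouped.value_counts(normalize=True).rename("ratio")
count = grouped.value_counts().rename("count")

# Align the two Series on their shared (id, car) index.
car_info = pd.concat([ratio, count], axis=1).sort_index()
print(car_info)
```

Here normalize=True is passed as an ordinary keyword argument, which sidesteps the question of how to forward it through agg.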



This Question and Answer are collected from stackoverflow and tested by JTuto community, is licensed under the terms of CC BY-SA 2.5. - CC BY-SA 3.0. - CC BY-SA 4.0.
