[SOLVED] Create dataframe using groupby of percentage with condition

Issue

This content is from Stack Overflow. Question asked by Vipin.

I want to create a dataframe in pandas so that I can draw a seaborn bar plot. My problem is that I am not able to group the data by a string column. If I use the value_counts() function it works fine, but the result is not a dataframe. A detailed explanation follows.

My data is

Data_Hr = {"Age": [41,49,37,33,27], "Attrition": ["Yes", "No", "Yes", "No", "No"], "ageRange":[ 40-45 45-50 35-40 30-35 25-30]

Now I want to calculate the percentage of rows with Attrition equal to "Yes", grouped by "ageRange". Below is the expression I am using, but its result is not a dataframe.

df[df.Attrition == 'Yes']['ageRange'].value_counts()/df['ageRange'].value_counts()*100

Any other method to plot the attrition percentage would also be fine.

Thanks in advance



Solution

Your code works fine.
Please note that the values in the last list are written without quotes and without commas. You should change it to:

Data_Hr = {"Age": [41,49,37,33,27], "Attrition": ["Yes", "No", "Yes", "No", "No"], "ageRange":['40-45', '45-50', '35-40', '30-35', '25-30']}
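
The answer never shows how `df` is created; presumably it is just `pd.DataFrame(Data_Hr)`. A minimal sketch under that assumption, showing that the expression from the question runs but returns a Series (with NaN for the age ranges that have no "Yes" rows) rather than a DataFrame. The names `yes_counts`, `all_counts` and `pct` are introduced here only for readability:

```python
import pandas as pd

Data_Hr = {"Age": [41, 49, 37, 33, 27],
           "Attrition": ["Yes", "No", "Yes", "No", "No"],
           "ageRange": ['40-45', '45-50', '35-40', '30-35', '25-30']}

# Assumption: df is built directly from the dictionary.
df = pd.DataFrame(Data_Hr)

# Same expression as in the question, split into two steps.
yes_counts = df[df.Attrition == 'Yes']['ageRange'].value_counts()
all_counts = df['ageRange'].value_counts()

# Division aligns on the index; ranges missing from yes_counts become NaN.
pct = yes_counts / all_counts * 100

print(type(pct).__name__)  # Series
print(pct)
```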

Then if you want to plot a graph from the results you should format the results as a dataframe:

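# Name the column explicitly; pandas >= 2.0 would otherwise call it 'count'.
results = pd.DataFrame({'ageRange': df[df.Attrition == 'Yes']['ageRange'].value_counts()/df['ageRange'].value_counts()*100})
# Drop the age ranges that have no 'Yes' rows (their percentage is NaN).
results.dropna(subset=['ageRange'], inplace=True)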
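
Depending on the pandas version, `value_counts()` names its result differently (from pandas 2.0 on the Series is called `count` and the index carries the column name), so code that relies on the automatically assigned column name can be version-sensitive. Since the question asks about groupby, one alternative is to group a boolean "is Yes" Series by `ageRange` and take the mean. This is only a sketch, not the method from the answer; the column name `attritionPct` is made up for illustration, and `df` is again assumed to be `pd.DataFrame(Data_Hr)`:

```python
import pandas as pd

Data_Hr = {"Age": [41, 49, 37, 33, 27],
           "Attrition": ["Yes", "No", "Yes", "No", "No"],
           "ageRange": ['40-45', '45-50', '35-40', '30-35', '25-30']}
df = pd.DataFrame(Data_Hr)  # assumption: built directly from the dict

results = (
    df['Attrition'].eq('Yes')     # True where Attrition is 'Yes'
      .groupby(df['ageRange'])    # one group per age range
      .mean()                     # share of True values in each group
      .mul(100)                   # convert the share to a percentage
      .rename('attritionPct')     # hypothetical column name
      .reset_index()              # turn the Series into a DataFrame
)
print(results)
```

Unlike the `value_counts()` ratio, this gives 0.0 rather than NaN for age ranges without any "Yes" rows, so there is nothing to drop; filter with `results[results.attritionPct > 0]` if those ranges should be left out of the plot.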

And to display using seaborn:

sns.barplot(data=results, x='ageRange', y=results.index, color='Blue').set_title('Age Range Count of Attrition')
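
Note that in the call above the column passed as `x` holds the percentages, while the age ranges come from the index, so the bars are drawn horizontally. A sketch with explicit column names may be easier to read. Here `attritionPct`, the title and the axis label are chosen for illustration and are not from the original answer, `df` is assumed to be `pd.DataFrame(Data_Hr)`, and the Agg backend line is only there so the script also runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove for interactive use
import pandas as pd
import seaborn as sns

Data_Hr = {"Age": [41, 49, 37, 33, 27],
           "Attrition": ["Yes", "No", "Yes", "No", "No"],
           "ageRange": ['40-45', '45-50', '35-40', '30-35', '25-30']}
df = pd.DataFrame(Data_Hr)  # assumption: built directly from the dict

# Percentage of 'Yes' rows per age range; ranges without 'Yes' are dropped.
pct = (df[df.Attrition == 'Yes']['ageRange'].value_counts()
       / df['ageRange'].value_counts() * 100).dropna()

# Give both the index and the values explicit names, then flatten.
results = pct.rename('attritionPct').rename_axis('ageRange').reset_index()

# Age ranges on the x axis, percentages on the y axis (vertical bars).
ax = sns.barplot(data=results, x='ageRange', y='attritionPct', color='blue')
ax.set_title('Attrition % by Age Range')
ax.set_ylabel('Attrition (%)')
ax.figure.savefig('attrition_by_age_range.png')
```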


This question was asked on Stack Overflow by Vipin and answered by gtomer. It is licensed under the terms of CC BY-SA 2.5 - CC BY-SA 3.0 - CC BY-SA 4.0.