[SOLVED] How to plot a stacked bar chart with multiple variables with seaborn – Stack Overflow

Issue

This Content is from Stack Overflow. Question asked by sumitra sivaprakasam

I have a dataset as shown below:

Season  Phylum            Assigned  Yield
1       Acidobacteria     157363    High
1       Ignavibacteriae   15158     Low
1       Gemmatimonadetes  16408     High
2       Actinobacteria    143507    High
2       Chloroflexi       252391    Low
3       Cyanobacteria     172041    High
3       Firmicutes        74769     High
3       Acidobacteria     222991    Low
3       Bacteroidetes     280246    Low

I used this code, but it did not produce the plot I wanted:

import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter, MultipleLocator
import seaborn as sns
import pandas as pd

df = pd.read_csv("./bacterial_phylum_abundance_root_allseasons.csv", sep='\t')
print(df)

sns.set_style('whitegrid')
g = sns.displot(data=df, x='Yield', hue='Phylum', col='Season', multiple='fill', shrink=0.7, palette='turbo')
g.set(xlabel='', ylabel='')
g.axes[0, 0].yaxis.set_major_locator(MultipleLocator(.1))
g.axes[0, 0].yaxis.set_major_formatter(PercentFormatter(1))
g.axes[0, 0].set_xlim(-.6, 1.6)
sns.despine(left=True)
plt.subplots_adjust(wspace=0)
plt.show()

I would like to make a stacked bar chart that includes all seasons (1, 2, 3) and looks something like this:
enter image description here

I would really appreciate it if someone could help me out.

Thank you in advance



Solution

You are seeing just 50% or 100% segments in each bar because the height is dictated by the number of rows, not by the Assigned values. Every row counts equally, so a bar is either 100% (a single row) or split 50/50 (two rows). One way around this is to use stat='probability' in displot. However, your Assigned value is stored in a column, while probability is computed from the number of rows. So I used repeat() to duplicate each row as many times as its Assigned value. I am not sure this is the most efficient way, but it should give you the result you need. The data is what you provided. See if this works…

import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter, MultipleLocator
import seaborn as sns
import pandas as pd
sns.set_style('whitegrid')

df1 = df.loc[df.index.repeat(df.Assigned)]  # repeat each row Assigned times so row counts match abundance

g = sns.displot(data=df1, x='Yield', hue='Phylum', col='Season', multiple='fill', shrink=0.7, stat='probability', palette='turbo')  # stat is probability
g.set(xlabel='', ylabel='')
g.axes[0, 0].yaxis.set_major_locator(MultipleLocator(.1))
g.axes[0, 0].yaxis.set_major_formatter(PercentFormatter(1))
g.axes[0, 0].set_xlim(-.6, 1.6)
sns.despine(left=True)
plt.subplots_adjust(wspace=0)
plt.show()

Plot

enter image description here
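
To make the repeat() step easier to follow, here is a minimal sketch on a toy frame. The values and the names toy, expanded, counts and shares are made up for illustration and are not part of the original data. It shows that after the expansion the row counts equal the Assigned values, so the row-based shares that multiple='fill' draws become abundance shares.

```python
import pandas as pd

# Toy frame with hypothetical values, kept small so the expansion is easy to read
toy = pd.DataFrame({
    "Season": [1, 1, 1],
    "Phylum": ["A", "B", "C"],
    "Assigned": [3, 1, 2],
    "Yield": ["High", "High", "Low"],
})

# Repeat each index label Assigned times, then select those labels
expanded = toy.loc[toy.index.repeat(toy["Assigned"])]

# Row counts per (Yield, Phylum) now equal the Assigned values
counts = expanded.groupby(["Yield", "Phylum"]).size()

# Share of each phylum within its Yield bar, computed from row counts
shares = counts / counts.groupby(level="Yield").transform("sum")
print(shares)
```

The drawback is that the expanded frame has as many rows as the sum of Assigned, which gets slow for counts in the hundreds of thousands.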

Improvement/change

Since you mentioned that this takes a long time, I adjusted the code. Instead of using repeat() to add as many rows as the value in the Assigned column, the percentage of each phylum within its Season/Yield group is calculated first, and repeat() is applied to that percentage. That leaves just a few hundred rows. The result is the same, although the precision is limited to about 1% because of the rounding. I think that is acceptable here. See if this works.

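import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter, MultipleLocator
import seaborn as sns
import pandas as pd

df = pd.read_excel('myinput.xlsx', 'Sheet91')  # local file holding the data from the question
print(df)
sns.set_style('whitegrid')
#df1=df.loc[df.index.repeat(df.Assigned)]

# Total Assigned within each Season/Yield bar, aligned to the original rows
trans = df.groupby(['Season', 'Yield'])['Assigned'].transform('sum')
# Percentage of each phylum within its bar, as a whole number for repeat()
df['perc'] = (df['Assigned'] / trans * 100).round().astype(int)
df1 = df.loc[df.index.repeat(df.perc)]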

#g = sns.displot(data=df, x='Yield', hue='Phylum', col='Season', multiple='fill', shrink=0.7, palette='turbo')
g = sns.displot(data=df1, x='Yield', hue='Phylum', col='Season', multiple='fill', shrink=0.7, stat='probability', palette='turbo')
g.set(xlabel='', ylabel='')
g.axes[0, 0].yaxis.set_major_locator(MultipleLocator(.1))
g.axes[0, 0].yaxis.set_major_formatter(PercentFormatter(1))
g.axes[0, 0].set_xlim(-.6, 1.6)
sns.despine(left=True)
plt.subplots_adjust(wspace=0)
plt.show()

enter image description here
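
As a quick check of the percentage step without plotting, the sketch below rebuilds the table from the question by hand and applies the same groupby/transform calculation. The explicit cast to int is an addition here so that repeat() receives whole numbers. The name group_total is only illustrative.

```python
import pandas as pd

# Data typed in from the table in the question
df = pd.DataFrame({
    "Season": [1, 1, 1, 2, 2, 3, 3, 3, 3],
    "Phylum": ["Acidobacteria", "Ignavibacteriae", "Gemmatimonadetes",
               "Actinobacteria", "Chloroflexi", "Cyanobacteria",
               "Firmicutes", "Acidobacteria", "Bacteroidetes"],
    "Assigned": [157363, 15158, 16408, 143507, 252391,
                 172041, 74769, 222991, 280246],
    "Yield": ["High", "Low", "High", "High", "Low",
              "High", "High", "Low", "Low"],
})

# Total Assigned per Season/Yield bar, aligned to the original rows
group_total = df.groupby(["Season", "Yield"])["Assigned"].transform("sum")

# Whole-number percentage of each phylum within its bar
df["perc"] = (df["Assigned"] / group_total * 100).round().astype(int)

# Repeat rows by percentage instead of by raw count
df1 = df.loc[df.index.repeat(df["perc"])]

print(df[["Season", "Yield", "Phylum", "perc"]])
print(len(df1), "rows after repeat, versus", df["Assigned"].sum(), "with raw counts")
```

For this data the expanded frame has 600 rows (six bars at 100% each). Rounded percentages are not guaranteed to sum to exactly 100 within a bar, but multiple='fill' renormalizes each bar, so the bars still reach 100%.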


This question was asked on Stack Overflow by sumitra sivaprakasam and answered by Redox. It is licensed under the terms of CC BY-SA 2.5 - CC BY-SA 3.0 - CC BY-SA 4.0.
