[SOLVED] mask duplicate entries while merging in pandas

Issue

This content is from Stack Overflow. Question asked by Roshankumar.

DataFrame 1:

id    status   
A     Pass          
A     P_Pass       
A     C_Pass 
B     Fail
B     A_Fail

DataFrame 2:

id    Category     group
A        pxe         1
B        fxe         2

After merging DataFrame 2 onto DataFrame 1 with a left join, the final DataFrame becomes:

id    status    Category   group
A     Pass          pxe      1
A     P_Pass        pxe      1
A     C_Pass        pxe      1
B     Fail          fxe      2
B     A_Fail        fxe      2

The expected DataFrame is:

id    status    Category   group
A     Pass          pxe      1
A     P_Pass         -       -
A     C_Pass         -       -
B     Fail          fxe      2
B     A_Fail         -       -

I want to mask the duplicate entries so that the Category and group values appear only once per id.

Code used:

import pandas as pd
df_final = pd.merge(df_1, df_2, on='id', how='left')
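To reproduce the problem, here is a minimal sketch that rebuilds the two example frames from the tables above (values copied from the question; the names df_1 and df_2 follow the question's code). It shows that a left join repeats the df_2 columns on every matching row of df_1.

```python
import pandas as pd

# Rebuild the two example frames from the question
df_1 = pd.DataFrame({
    "id": ["A", "A", "A", "B", "B"],
    "status": ["Pass", "P_Pass", "C_Pass", "Fail", "A_Fail"],
})
df_2 = pd.DataFrame({
    "id": ["A", "B"],
    "Category": ["pxe", "fxe"],
    "group": [1, 2],
})

# A left join copies each df_2 row onto every df_1 row with the same id
df_final = pd.merge(df_1, df_2, on="id", how="left")
print(df_final)
```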



Solution

You can use np.where together with groupby().cumcount() after merging:

#df_final
    id  status  Category    group
0   A   Pass    pxe         1
1   A   P_Pass  pxe         1
2   A   C_Pass  pxe         1
3   B   Fail    fxe         2
4   B   A_Fail  fxe         2

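import numpy as np
# cumcount() is 0 for the first occurrence of each value; later repeats become '-'
df_final['Category'] = np.where(df_final.groupby('Category').cumcount().eq(0), df_final['Category'], '-')
df_final['group'] = np.where(df_final.groupby('group').cumcount().eq(0), df_final['group'], '-')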

#-->>
#df_final
    id  status  Category    group
0   A   Pass    pxe         1
1   A   P_Pass  -           -
2   A   C_Pass  -           -
3   B   Fail    fxe         2
4   B   A_Fail  -           -
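The answer groups by the Category and group values themselves, which works here because each id has its own Category and group. If two ids shared a value (for example a hypothetical id C also mapped to pxe), the first row of C would be masked as well. Below is a minimal sketch that groups by id instead, assuming df_final is the merged frame shown above (rebuilt inline so the snippet runs on its own). It also stays in pandas rather than using np.where, which returns a NumPy array and can turn the integer group values into strings.

```python
import pandas as pd

# Merged frame from the question, rebuilt so the snippet is self-contained
df_final = pd.DataFrame({
    "id": ["A", "A", "A", "B", "B"],
    "status": ["Pass", "P_Pass", "C_Pass", "Fail", "A_Fail"],
    "Category": ["pxe", "pxe", "pxe", "fxe", "fxe"],
    "group": [1, 1, 1, 2, 2],
})

# cumcount numbers the rows inside each id: 0, 1, 2, 0, 1
first_in_id = df_final.groupby("id").cumcount().eq(0)

cols = ["Category", "group"]
# Cast to object first so '-' can sit next to the integer group values
for col in cols:
    df_final[col] = df_final[col].astype(object)

# Keep the merged values only on the first row of each id; mask the rest
df_final.loc[~first_in_id, cols] = "-"
print(df_final)
```

An equivalent mask is ~df_final['id'].duplicated(), since Series.duplicated() flags every repeat after the first occurrence by default.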


This question was asked on Stack Overflow by Roshankumar and answered by khaled koubaa. It is licensed under the terms of CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
