Issue
newlist = []
for column in new_columns:
    count12 = new_df.loc[new_df[column].diff() == 1]

new_df2 = new_df2.groupby(['my_id', 'friend_id', 'family_id', 'colleage_id']).apply(len)
There is no equivalent option available in PySpark for getting the length of each group. How can we achieve the same thing in PySpark?
Thanks in advance.
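For reference, here is a minimal runnable pandas sketch of what the last line of the snippet appears to be doing; the sample data and values are illustrative assumptions, not the asker's real DataFrame:

import pandas as pd

# Hypothetical sample data; the real new_df2 comes from the asker's pipeline.
new_df2 = pd.DataFrame({
    'my_id':       [1, 1, 2],
    'friend_id':   [10, 10, 20],
    'family_id':   [100, 100, 200],
    'colleage_id': [5, 5, 6],
})

# apply(len) on a groupby returns the number of rows in each group,
# i.e. the same result as .size().
group_sizes = new_df2.groupby(['my_id', 'friend_id', 'family_id', 'colleage_id']).apply(len)
print(group_sizes)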
Solution
Literally, apply(len) is just an aggregation function that counts the grouped elements produced by groupby. You can do the very same thing in basic PySpark syntax:
import pyspark.sql.functions as F
(df
.groupBy('my_id','friend_id','family_id','colleage_id')
.agg(F.count('*'))
.show()
)
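If you want the count column to carry a friendlier name, or prefer the built-in shortcut, both of the following are equivalent. This is a sketch: the column names come from the question, but the DataFrame creation and sample rows are illustrative assumptions.

import pyspark.sql.functions as F
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Illustrative sample data mirroring the question's column names.
df = spark.createDataFrame(
    [(1, 10, 100, 5), (1, 10, 100, 5), (2, 20, 200, 6)],
    ['my_id', 'friend_id', 'family_id', 'colleage_id'],
)

# Same aggregation, with the count column renamed.
(df
 .groupBy('my_id', 'friend_id', 'family_id', 'colleage_id')
 .agg(F.count('*').alias('len'))
 .show()
)

# Or use the GroupedData.count() shortcut, which yields a column named 'count'.
df.groupBy('my_id', 'friend_id', 'family_id', 'colleage_id').count().show()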
This question was asked on Stack Overflow by Abhinav bharti and answered by pltc. It is licensed under the terms of CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.