Issue
This Content is from Stack Overflow. Question asked by TheSarfaraz
Let's say I have an Employees table and a yearly survey filled in by each person.
I have to transform the transactional data into prediction data, year by year.
Available Data:
E_ID | TestYear | DateOfBirth |
---|---|---|
1 | 2010 | 1947-01-01 |
1 | 2011 | 1947-01-01 |
1 | 2012 | 1947-01-01 |
2 | 2010 | 1990-01-01 |
3 | 2011 | 1999-01-01 |
4 | 2011 | 1991-01-01 |
4 | 2012 | 1991-01-01 |
5 | 2010 | 1989-01-01 |
5 | 2011 | 1989-01-01 |
5 | 2012 | 1989-01-01 |
5 | 2013 | 1989-01-01 |
DataFrame I need:
E_ID | Year | Age |
---|---|---|
1 | 2010 | 63 |
1 | 2011 | 64 |
1 | 2012 | 65 |
2 | 2010 | 20 |
2 | 2011 | 21 |
2 | 2012 | 22 |
3 | 2010 | 11 |
3 | 2011 | 12 |
3 | 2012 | 13 |
4 | 2010 | 19 |
4 | 2011 | 20 |
4 | 2012 | 21 |
5 | 2010 | 21 |
5 | 2011 | 22 |
5 | 2012 | 23 |
In the new df I need all employees for all three years (2010, 2011, 2012), with their age in each of those years.
How can I achieve this, given that the transactional data has records for some employees in some years but not in others?
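To make the gaps concrete, here is a minimal sketch that lists which employee/year pairs are absent from the sample data. The names target_years, wanted, present and missing are illustrative assumptions, not part of the original post.

```python
import pandas as pd

# Sample data copied from the "Available Data" table
df = pd.DataFrame({
    "E_ID":     [1, 1, 1, 2, 3, 4, 4, 5, 5, 5, 5],
    "TestYear": [2010, 2011, 2012, 2010, 2011, 2011, 2012, 2010, 2011, 2012, 2013],
    "DateOfBirth": ["1947-01-01"] * 3 + ["1990-01-01", "1999-01-01"]
                   + ["1991-01-01"] * 2 + ["1989-01-01"] * 4,
})

target_years = [2010, 2011, 2012]  # assumed set of years of interest

# Every (employee, year) pair the result should contain
wanted = pd.MultiIndex.from_product(
    [df["E_ID"].unique(), target_years], names=["E_ID", "TestYear"]
)
# Pairs that actually occur in the transactional data
present = pd.MultiIndex.from_frame(df[["E_ID", "TestYear"]])

# Pairs that are wanted but have no record
missing = wanted.difference(present)
print(list(missing))
```

For the sample data this reports five missing pairs, belonging to employees 2, 3 and 4.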
Solution
Since each employee ID (E_ID) maps to a single date of birth (DateOfBirth), you can group by E_ID and recover the date of birth for each employee.
For TestYear, the aggregation function ignores the group's values and returns the list of years for which you want the age.
For DateOfBirth, all values within a group are identical, so the aggregation simply takes the first entry. Exploding TestYear then yields one row per employee and year:
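import pandas as pd

# One row per employee: the target years as a list, plus the (constant) birth date
df = df.groupby('E_ID').agg({"TestYear": lambda x: [2010, 2011, 2012],
                             'DateOfBirth': lambda x: list(x)[0]}).explode("TestYear")
df['DateOfBirth'] = pd.to_datetime(df['DateOfBirth'])
# Year minus birth year; exact here because every birthday falls on January 1
df['Age'] = df['TestYear'] - df['DateOfBirth'].dt.year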
Output:
TestYear DateOfBirth Age
E_ID
1 2010 1947-01-01 63
1 2011 1947-01-01 64
1 2012 1947-01-01 65
2 2010 1990-01-01 20
2 2011 1990-01-01 21
2 2012 1990-01-01 22
3 2010 1999-01-01 11
3 2011 1999-01-01 12
3 2012 1999-01-01 13
4 2010 1991-01-01 19
4 2011 1991-01-01 20
4 2012 1991-01-01 21
5 2010 1989-01-01 21
5 2011 1989-01-01 22
5 2012 1989-01-01 23
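As an alternative to the groupby/explode approach, the same table can be sketched with a cross join, which returns exactly the E_ID, Year and Age columns requested in the question. This is a minimal sketch, not the answerer's method: it assumes pandas 1.2 or later (for how="cross"), and the names employees, years, target_years and result are illustrative.

```python
import pandas as pd

# Sample data copied from the "Available Data" table
df = pd.DataFrame({
    "E_ID":     [1, 1, 1, 2, 3, 4, 4, 5, 5, 5, 5],
    "TestYear": [2010, 2011, 2012, 2010, 2011, 2011, 2012, 2010, 2011, 2012, 2013],
    "DateOfBirth": ["1947-01-01"] * 3 + ["1990-01-01", "1999-01-01"]
                   + ["1991-01-01"] * 2 + ["1989-01-01"] * 4,
})

target_years = [2010, 2011, 2012]  # assumed set of years of interest

# One row per employee with their (constant) date of birth
employees = (
    df[["E_ID", "DateOfBirth"]]
    .drop_duplicates("E_ID")
    .assign(DateOfBirth=lambda d: pd.to_datetime(d["DateOfBirth"]))
)
years = pd.DataFrame({"Year": target_years})

# Cross join: pair every employee with every target year
result = employees.merge(years, how="cross")

# Same age rule as above: year minus birth year
result["Age"] = result["Year"] - result["DateOfBirth"].dt.year
result = result[["E_ID", "Year", "Age"]]
print(result.to_string(index=False))
```

The cross join keeps Year as an integer column, whereas explode leaves TestYear with object dtype. If birthdays were not all on January 1, the year-difference rule would overstate the age for anyone whose birthday falls after the reference date in that year.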
This question was asked on Stack Overflow by TheSarfaraz and answered by Lucas M. Uriarte. It is licensed under the terms of CC BY-SA 2.5. - CC BY-SA 3.0. - CC BY-SA 4.0.