Issue
This content is from Stack Overflow. Question asked by 13sen1.
I have a daily time series DataFrame spanning many years. I want to create a naive forecast of ‘this day last year’ as a forecast for the next year. However, I cannot just use something like df.shift(365), because I want working days to line up with working days and weekends with weekends. So I want to select the day that matches year - 1, the weekday and the week number, then use this as a forecast going forward (or something equally good).
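To see why df.shift(365) breaks the weekday alignment: on gap-free daily data it moves values by 365 days, and 365 = 52 × 7 + 1, so every date is compared with a date one weekday earlier. A quick check on an example date (chosen for illustration) shows this. A 364-day (exactly 52-week) offset keeps the weekday and, in this example, also the ISO week number, although it drifts against calendar dates by a day each year (two across a 29 February).

```python
import pandas as pd

day = pd.Timestamp("2013-04-01")  # a Monday

# 365 = 52 * 7 + 1: going back 365 days lands one weekday earlier
minus_365 = day - pd.Timedelta(days=365)
# 364 = 52 * 7: going back exactly 52 weeks keeps the weekday
minus_364 = day - pd.Timedelta(days=364)

print(minus_365.date(), minus_365.day_name())  # 2012-04-01 Sunday
print(minus_364.date(), minus_364.day_name())  # 2012-04-02 Monday
# ISO week numbers of the original date and the 52-week-earlier date
print(day.isocalendar()[1], minus_364.isocalendar()[1])  # 14 14
```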
I was debating whether to create two columns for weekday and week number, then create a pivot_table by year, then somehow shift the years along. But I went with the approach below (which does not really work, and is really slow).
def forecast(row):
    # set the year lookup to the previous year, means no values for the first year
    year = row.name.year - 1
    # match the weekday as not great to compare a working day to a weekend
    weekday = row.name.weekday()
    # match the week number too, as not great comparing e.g. Feb to May
    week = row.name.isocalendar().week
    return df.loc[(df.index.year == year) &
                  (df.index.weekday == weekday) &
                  (df.index.isocalendar().week == week)]

df['prediction'] = df.apply(lambda row: forecast(row), axis=1)
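Part of why this does not work: forecast returns df.loc[...], a DataFrame of matching rows rather than a single number, and it mixes the calendar year (row.name.year) with the ISO week number, which disagree around New Year. A minimal corrected sketch of the same row-wise idea, on hypothetical sample data (an arange column standing in for Value), uses the ISO year throughout and returns one scalar or NaN per row. It still scans the whole frame once per row, so it stays slow on many years of data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: two years of daily data, Value = 0, 1, 2, ...
idx = pd.date_range("2012-01-01", "2013-12-31", freq="D", name="Day")
df = pd.DataFrame({"Value": np.arange(len(idx), dtype=float)}, index=idx)

# Compute the ISO calendar once, outside the per-row function
iso = df.index.isocalendar().astype("int64")


def forecast(row):
    # ISO year, week and weekday all come from the same (ISO) calendar
    year, week, day = row.name.isocalendar()
    mask = (iso["year"] == year - 1) & (iso["week"] == week) & (iso["day"] == day)
    matches = df.loc[mask, "Value"]
    # Return one scalar (or NaN), not a DataFrame slice
    return matches.iloc[0] if len(matches) else np.nan


df["Prediction"] = df.apply(forecast, axis=1)
print(df.loc["2012-12-28":"2013-01-02"])
```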
Below is an excerpt of my DataFrame. I want a new column called ‘Prediction’ that is NaN for the first year and, from then on, equals the value on the previous year’s date with the same week number and weekday.
Day Value
2012-04-01 9.56
2012-04-02 9.37
2012-04-03 9.72
2012-04-04 11.27
2012-04-05 12.56
...
2022-03-27 10.52
2022-03-28 10.32
2022-03-29 11.19
2022-03-30 12.66
2022-03-31 15.34
How can I do this? In something like Excel, you could create columns for each parameter, then simply select the year, week number and working day you are after and drag the formula down. I am guessing there is a similar way in pandas?
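The Excel-style idea (helper columns for year, week number and weekday, then a lookup) maps fairly directly onto pandas: build the ISO (year, week, day) columns once, add 1 to the year on a copy, and left-merge on all three keys. This is a sketch of that swapped-in merge technique on hypothetical sample data (arange values in place of the real Value column), not the accepted answer's method.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: three years of daily data, Value = 0, 1, 2, ...
idx = pd.date_range("2012-01-01", "2014-12-31", freq="D", name="Day")
df = pd.DataFrame({"Value": np.arange(len(idx), dtype=float)}, index=idx)

# Helper columns, one per key: ISO year, ISO week, ISO weekday (Monday = 1)
keys = df.index.isocalendar().astype("int64")
table = pd.concat([keys, df["Value"]], axis=1)

# Relabel every row as belonging to the following ISO year ...
lookup = (table.assign(year=table["year"] + 1)
               .rename(columns={"Value": "Prediction"})
               .reset_index(drop=True))

# ... then match each row to last year's row with the same week and weekday
merged = table.reset_index().merge(lookup, on=["year", "week", "day"], how="left")
df["Prediction"] = merged.set_index("Day")["Prediction"]
print(df.loc["2012-12-28":"2013-01-02"])
```

Because it is a left join, dates with no counterpart in the previous year (the first year, or an ISO week 53 that the previous year lacks) simply get NaN instead of raising an error.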
Solution
Maybe as follows.
- Use datetime.date.isocalendar (here via the pandas equivalent, DatetimeIndex.isocalendar) to get the ISO year, week, and day.
- Feed the result to datetime.date.fromisocalendar, but with the year minus 1.
- Finally, use pd.Series.map to retrieve the previous-year values.
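The three steps can be traced on a single date with the standard library, plus a tiny hypothetical Series for the lookup step:

```python
from datetime import date

import pandas as pd

d = date(2013, 1, 1)
# Step 1: ISO year, ISO week number, ISO weekday (Monday = 1)
year, week, day = d.isocalendar()
# Step 2: same ISO week and weekday, one ISO year earlier
last_year = date.fromisocalendar(year - 1, week, day)
print(year, week, day, last_year)  # 2013 1 2 2012-01-03

# Step 3: map looks each date up in another Series' index (hypothetical values)
values = pd.Series([1.0, 2.0], index=pd.to_datetime(["2012-01-02", "2012-01-03"]))
prediction = pd.Series([pd.Timestamp(last_year)]).map(values)
print(prediction.iloc[0])  # 2.0
```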
Setup
import pandas as pd
import numpy as np
from datetime import datetime

date_rng = pd.date_range('2012-01-01', '2014-12-31', freq='D')
data = np.random.uniform(low=1, high=10, size=(len(date_rng),))
df = pd.DataFrame(data,
                  index=pd.Index(date_rng, name='Day'),
                  columns=['Value'])
Code
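# For each date: ISO (year, week, day) -> same ISO week and weekday one year
# earlier -> look that earlier date up in df.Value
df['Prediction'] = df.index.isocalendar().apply(
    lambda x: datetime.fromisocalendar(x.year - 1, x.week, x.day),
    axis=1).map(df.Value)

print(df[df.Prediction.notna()].head())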
Value Prediction
Day
2012-12-30 8.103599 9.405936
2012-12-31 1.846680 1.559113
2013-01-01 9.155027 6.992394
2013-01-02 8.264289 3.495970
2013-01-03 1.571457 6.596947
It may seem odd that the notna data starts at 2012-12-30 rather than 2013-01-01, but this makes sense if you look at the isocalendar results:
df.index.isocalendar().loc['2012-12-30':'2012-12-31']
year week day
Day
2012-12-30 2012 52 7
2012-12-31 2013 1 1
df.index.isocalendar().loc['2012-01-01':'2012-01-02']
year week day
Day
2012-01-01 2011 52 7
2012-01-02 2012 1 1
# So that:
np.array_equal(df.loc['2012-01-01':'2012-01-02', 'Value'].to_numpy(),
df.loc['2012-12-30':'2012-12-31', 'Prediction'].to_numpy())
# True
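One caveat worth checking against the real data: datetime.fromisocalendar raises ValueError when the requested ISO week does not exist. Some ISO years have 53 weeks (2015 and 2020 do), so in the question's 2012 to 2022 range a date in 2015-W53 asks for 2014-W53, which does not exist, and the one-liner fails. A minimal sketch, assuming NaN is an acceptable prediction for those days, catches the error and returns NaT, which map then turns into NaN. The helper name last_year_date and the sample data are hypothetical.

```python
from datetime import datetime

import numpy as np
import pandas as pd

# Hypothetical sample spanning ISO year 2015, which has a week 53
date_rng = pd.date_range("2014-01-01", "2016-12-31", freq="D")
df = pd.DataFrame({"Value": np.arange(len(date_rng), dtype=float)},
                  index=pd.Index(date_rng, name="Day"))


def last_year_date(x):
    """Same ISO week and weekday one ISO year earlier, or NaT if it does not exist."""
    try:
        return datetime.fromisocalendar(int(x["year"]) - 1, int(x["week"]), int(x["day"]))
    except ValueError:
        # e.g. 2015-W53 has no counterpart because 2014 has only 52 ISO weeks
        return pd.NaT


df["Prediction"] = df.index.isocalendar().apply(last_year_date, axis=1).map(df["Value"])
print(df.loc["2015-12-27":"2016-01-04"])
```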
This question was asked on Stack Overflow by 13sen1 and answered by ouroboros1. It is licensed under the terms of CC BY-SA 2.5 / CC BY-SA 3.0 / CC BY-SA 4.0.