This Content is from Stack Overflow. Question asked by 13sen1
I have a daily time series DataFrame spanning many years. I want to create a naive forecast of ‘this day last year’ as a forecast for the next year. However, I cannot just use something like df.shift(365), as I want to maintain the working days versus weekends. So I want to select the day that matches the year-1, weekday and week number, then use this as a forecast going forward (or something equally good).
I was debating whether to create two columns for weekday and week number, then create a pivot_table by year, then somehow shift the years along. But I went for this approach below (which does not really work, and is really slow).
def forecast(row):
    # set the year lookup to the previous year, means no values for the first year
    year = row.name.year - 1
    # match the weekday as not great to compare a working day to a weekend
    weekday = row.name.weekday()
    # match the week number too, as not great comparing e.g. Feb to May
    week = row.name.isocalendar().week
    return df.loc[(df.index.year == year) & (df.index.weekday == weekday) & (df.index.isocalendar().week == week)]

df['prediction'] = df.apply(lambda row: forecast(row), axis=1)
Below is a clip of my DataFrame. I would like a new column called ‘Prediction’ that is NaN for the first year, but from then on equal to the value from the previous year’s date with the same week number and weekday.
Day         Value
2012-04-01   9.56
2012-04-02   9.37
2012-04-03   9.72
2012-04-04  11.27
2012-04-05  12.56
...
2022-03-27  10.52
2022-03-28  10.32
2022-03-29  11.19
2022-03-30  12.66
2022-03-31  15.34
How can I do this? In something like Excel, you could create columns for each parameter, then simply select the year, week number and working day you are after and drag down the formula. I am guessing there is a similar way using Pandas?
Maybe as follows.
- Use datetime.date.isocalendar to get year, week, and day.
- Feed the result to datetime.date.fromisocalendar, but minus 1 year.
- Finally, use pd.Series.map to retrieve the previous-year values.
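To make the first two steps concrete, here is a minimal sketch that traces them for a single date using only the standard library. The date 2012-12-31 is picked purely for illustration, and the variable names are mine rather than part of the answer.

```python
from datetime import date

d = date(2012, 12, 31)  # a Monday

# Step 1: ISO year, ISO week number, and ISO weekday (1 = Monday).
iso_year, iso_week, iso_day = d.isocalendar()

# Step 2: same ISO week and weekday, one ISO year earlier.
previous = date.fromisocalendar(iso_year - 1, iso_week, iso_day)

print(iso_year, iso_week, iso_day)  # 2013 1 1
print(previous)                     # 2012-01-02
print(previous.weekday() == d.weekday())  # True, both are Mondays
```

Note that the ISO year (2013) differs from the calendar year (2012) here, which is why the lookup is built from the ISO year rather than from d.year.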
import pandas as pd
import numpy as np
from datetime import datetime

date_rng = pd.date_range('2012-01-01','2014-12-31', freq='D')
data = np.random.uniform(low=1, high=10, size=(len(date_rng),))

df = pd.DataFrame(data, index=pd.Index(date_rng, name='Day'), columns=['Value'])
df['Prediction'] = df.index.isocalendar().apply(
    lambda x: datetime.fromisocalendar(x.year-1, x.week, x.day),
    axis=1).map(df.Value)

print(df[df.Prediction.notna()].head())

               Value  Prediction
Day
2012-12-30  8.103599    9.405936
2012-12-31  1.846680    1.559113
2013-01-01  9.155027    6.992394
2013-01-02  8.264289    3.495970
2013-01-03  1.571457    6.596947
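One caveat if your data covers an ISO year with 53 weeks (2020 is one, and it falls inside the range in the question): fromisocalendar raises ValueError when asked for week 53 of a year that only has 52. The sample range above happens to avoid this. A hedged sketch of one way around it follows; the helper name same_iso_day_last_year and the choice to return NaT (so the prediction stays NaN) are my assumptions, not part of the original answer.

```python
from datetime import datetime

import numpy as np
import pandas as pd


def same_iso_day_last_year(row):
    """Same ISO week and weekday, one ISO year earlier (hypothetical helper).

    Returns NaT when the previous ISO year has no such week,
    i.e. week 53 of a 52-week year.
    """
    try:
        return pd.Timestamp(datetime.fromisocalendar(
            int(row['year']) - 1, int(row['week']), int(row['day'])))
    except ValueError:
        return pd.NaT


# Sample data spanning ISO year 2020, which has 53 weeks.
date_rng = pd.date_range('2019-01-01', '2021-12-31', freq='D')
rng = np.random.default_rng(0)
df = pd.DataFrame({'Value': rng.uniform(1, 10, len(date_rng))},
                  index=pd.Index(date_rng, name='Day'))

lookup = df.index.isocalendar().apply(same_iso_day_last_year, axis=1)
df['Prediction'] = lookup.map(df['Value'])

# 2021-01-01 is in ISO week 53 of 2020; 2019 has no week 53, so no prediction.
print(df.loc[pd.Timestamp('2021-01-01'), 'Prediction'])
```

Falling back to week 52 of the previous year instead of NaT would be another reasonable choice, depending on whether a gap or an approximate match suits the forecast better.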
It may seem odd that the notna data starts at 2012-12-30 rather than 2013-01-01, but this makes sense if you look at the ISO calendar values for those dates:
df.index.isocalendar().loc['2012-12-30':'2012-12-31']

            year  week  day
Day
2012-12-30  2012    52    7
2012-12-31  2013     1    1

df.index.isocalendar().loc['2012-01-01':'2012-01-02']

            year  week  day
Day
2012-01-01  2011    52    7
2012-01-02  2012     1    1

# So that:
np.array_equal(df.loc['2012-01-01':'2012-01-02', 'Value'].to_numpy(),
               df.loc['2012-12-30':'2012-12-31', 'Prediction'].to_numpy())
# True
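Since the question mentions speed, here is a sketch of a variant that avoids the row-wise apply by matching on the ISO (year, week, day) triple directly. This is an alternative I am suggesting, not the answer's method, and the names keys, values, and wanted are illustrative. It assumes a daily index with no duplicate dates.

```python
import numpy as np
import pandas as pd

date_rng = pd.date_range('2012-01-01', '2014-12-31', freq='D')
rng = np.random.default_rng(0)
df = pd.DataFrame({'Value': rng.uniform(1, 10, len(date_rng))},
                  index=pd.Index(date_rng, name='Day'))

# ISO year, week, and weekday for every row, as plain integers.
keys = df.index.isocalendar().astype('int64')

# Re-key the values by their own ISO (year, week, day) triple.
values = df['Value'].copy()
values.index = pd.MultiIndex.from_frame(keys)

# The triples to look up: same week and day, previous ISO year.
wanted = pd.MultiIndex.from_frame(keys.assign(year=keys['year'] - 1))

# Triples with no match (first year, or a missing week 53) become NaN.
df['Prediction'] = values.reindex(wanted).to_numpy()

print(df[df['Prediction'].notna()].head())
```

With this sample range the first non-NaN prediction again lands on 2012-12-30, for the same ISO calendar reason described above.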
This question was asked on Stack Overflow by 13sen1 and answered by ouroboros1. It is licensed under the terms of CC BY-SA 2.5, CC BY-SA 3.0, CC BY-SA 4.0.