[SOLVED] Python, pandas DataFrame, use previous year, matching workday and week number from date as forecast

Issue

This Content is from Stack Overflow. Question asked by 13sen1

I have a daily time series DataFrame spanning many years. I want a naive forecast that uses ‘this day last year’ as the prediction for the following year. I cannot simply use something like df.shift(365), because I want working days to line up with working days and weekends with weekends. So for each date I want to select the day in the previous year with the same weekday and week number, then use its value as the forecast going forward (or something equally good).

I was debating whether to create two columns for weekday and week number, then create a pivot_table by year, then somehow shift the years along. But I went for this approach below (which does not really work, and is really slow).

def forecast(row):
    # set the year lookup to the previous year, means no values for the first year
    year = row.name.year - 1
    # match the weekday as not great to compare a working day to a weekend
    weekday = row.name.weekday()
    # match the week number too, as not great comparing e.g. Feb to May
    week = row.name.isocalendar().week

    return df.loc[(df.index.year == year) &
                  (df.index.weekday == weekday) &
                  (df.index.isocalendar().week == week)]


df['prediction'] = df.apply(lambda row: forecast(row), axis=1)

Below is a clip of my DataFrame. I would like a new column called ‘Prediction’ that is NaN for the first year and, from then on, equal to the value from the previous year’s date with the same week number and weekday.

Day         Value
2012-04-01   9.56
2012-04-02   9.37
2012-04-03   9.72
2012-04-04  11.27
2012-04-05  12.56
           ...
2022-03-27  10.52
2022-03-28  10.32
2022-03-29  11.19
2022-03-30  12.66
2022-03-31  15.34

How can I do this? In something like Excel, you could create columns for each parameter, then simply select the year, week number and working day you are after and drag down the formula. I am guessing there is a similar way using Pandas?



Solution

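One option is to map each date to the date with the same ISO week number and weekday in the previous ISO year, and then look up that date's value.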

Setup

import pandas as pd
import numpy as np
from datetime import datetime

date_rng = pd.date_range('2012-01-01','2014-12-31', freq='D')
data = np.random.uniform(low=1, high=10, size=(len(date_rng),))

df = pd.DataFrame(data, 
                  index=pd.Index(date_rng, name='Day'), 
                  columns=['Value'])

Code

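# For each date, build the date with the same ISO week and weekday one ISO year
# earlier, then look up its Value (NaN where that date is not in the index).
# Caveat: fromisocalendar raises ValueError for dates in ISO week 53.
df['Prediction'] = df.index.isocalendar().apply(
    lambda x: datetime.fromisocalendar(x.year-1, x.week, x.day), 
    axis=1).map(df.Value)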

print(df[df.Prediction.notna()].head())

               Value  Prediction
Day                             
2012-12-30  8.103599    9.405936
2012-12-31  1.846680    1.559113
2013-01-01  9.155027    6.992394
2013-01-02  8.264289    3.495970
2013-01-03  1.571457    6.596947

It may seem odd that the non-NaN predictions start at 2012-12-30 rather than 2013-01-01, but this makes sense if you look at the isocalendar results, where the ISO year can differ from the calendar year around New Year:

df.index.isocalendar().loc['2012-12-30':'2012-12-31']

            year  week  day
Day                        
2012-12-30  2012    52    7
2012-12-31  2013     1    1

df.index.isocalendar().loc['2012-01-01':'2012-01-02']

            year  week  day
Day                        
2012-01-01  2011    52    7
2012-01-02  2012     1    1

# So that:
np.array_equal(df.loc['2012-01-01':'2012-01-02', 'Value'].to_numpy(), 
               df.loc['2012-12-30':'2012-12-31', 'Prediction'].to_numpy())
# True
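
The ISO year boundaries above can also be checked with the standard library alone. The sketch below is only an illustration (the helper name has_week_53 is made up for this example, and date.fromisocalendar needs Python 3.8+). It also shows that some ISO years have a week 53 while the year before does not, which is the case where a lookup of "same week, previous ISO year" has nothing to point at.

```python
from datetime import date

# 2012-12-31 is a Monday that already belongs to ISO year 2013, week 1.
iso_year, iso_week, iso_day = date(2012, 12, 31).isocalendar()

# Same ISO week and weekday, one ISO year earlier.
previous = date.fromisocalendar(iso_year - 1, iso_week, iso_day)


def has_week_53(year: int) -> bool:
    """Return True if the given ISO year contains a week 53 (hypothetical helper)."""
    try:
        date.fromisocalendar(year, 53, 1)
    except ValueError:
        return False
    return True


print((iso_year, iso_week, iso_day))
print(previous)
print({year: has_week_53(year) for year in (2014, 2015, 2019, 2020)})
```

For data covering 2015 or 2020, dates in week 53 therefore need separate handling, for example leaving the prediction as NaN.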

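For longer histories, a row-wise apply can be slow, and any date in ISO week 53 has no counterpart in the previous ISO year. A possible vectorized variant is sketched below. It is not the answer's original method: it swaps the per-row date construction for a left merge on the (year, week, day) columns of isocalendar(). The function name previous_iso_year_values is hypothetical, and the sketch assumes a unique DatetimeIndex. Dates without a match, including week 53, are left as NaN.

```python
import numpy as np
import pandas as pd


def previous_iso_year_values(values: pd.Series) -> pd.Series:
    """Value from the same ISO week and weekday in the previous ISO year.

    Hypothetical helper; assumes `values` has a unique DatetimeIndex.
    """
    # ISO year, week and weekday of every date, as plain integers.
    iso = values.index.isocalendar().astype("int64")

    # Keys to look up: same week and day, one ISO year earlier.
    wanted = iso.assign(year=iso["year"] - 1).reset_index(drop=True)

    # Keys that actually exist, with their values attached.
    available = iso.reset_index(drop=True).assign(prev=values.to_numpy())

    # A left merge keeps one row per original date, in the original order;
    # keys with no match (first year, week 53) come back as NaN.
    merged = wanted.merge(available, on=["year", "week", "day"], how="left")
    return pd.Series(merged["prev"].to_numpy(), index=values.index)


# Usage on synthetic data that includes the 53-week ISO year 2015.
date_rng = pd.date_range("2012-01-01", "2016-12-31", freq="D")
demo = pd.DataFrame({"Value": np.arange(len(date_rng), dtype=float)},
                    index=pd.Index(date_rng, name="Day"))
demo["Prediction"] = previous_iso_year_values(demo["Value"])
print(demo[demo["Prediction"].notna()].head())
```

If NaN for week 53 is not acceptable, one option is to fill those rows from week 52 of the previous year, but that choice depends on the data.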

This question was asked on Stack Overflow by 13sen1 and answered by ouroboros1. It is licensed under the terms of CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
