Interpolated values for specific missing indices of DataFrame or Series

Issue

This content is from Stack Overflow. Question asked by Grismar.

With a dataframe like this:

import pandas as pd

df = pd.DataFrame([
    {'key':  1, 'value': 0.4},
    {'key':  4, 'value': 0.5},
    {'key':  6, 'value': 0.7},
    {'key': 10, 'value': 1.3},
    {'key': 11, 'value': 1.4},
    {'key': 13, 'value': 1.1},
])
df.set_index('key', inplace=True)

I’d like to extract values that are either in the dataframe, or should be interpolated from existing values.

I’m aware of DataFrame.interpolate() and it’s perfect for quickly computing interpolated values for indices with NaN values. So, an approach could be to add all the indices that aren’t already in the index, sort the dataframe by index, interpolate and then extract the values again. Something like:

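import numpy as np

indices = [3, 6, 9, 12, 15]  # the keys to look up (those shown in the result below)

# one NaN row for every requested key that is not already in the index
new_rows = pd.DataFrame([
    {'key': index, 'value': np.nan} for index in indices if index not in df.index
])
new_rows.set_index('key', inplace=True)
# DataFrame.append() was removed in pandas 2.0; pd.concat() replaces it
result = pd.concat([df, new_rows]).sort_index().interpolate(method='spline', order=2)

print(result['value'][indices])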

Result:

key
3     0.529559
6     0.700000
9     1.073190
12    1.252086
15    1.369036
Name: value, dtype: float64

However, the whole process of creating an additional dataframe, appending it to the original, sorting by index, calling .interpolate() on the whole result and then extracting the required values seems far more complicated than what I’d expected to find.

Something like:

# fictional, doesn't exist:
result = df.interpolated(indices)  # a DataFrame with only the rows for given indices, interpolated as needed
print(result['value'])

Or:

# fictional, doesn't exist:
result = df['value'].interpolated(indices)  # perhaps only on a Series
print(result)

Am I missing something obvious and is similar functionality actually available? Or is my approach above actually close to what the best way to do it would be?
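
One possible shortening of the add, sort, interpolate, extract steps described above, offered as a sketch rather than a definitive answer: reindex the frame onto the union of its own index and the requested keys, then interpolate. It assumes that linear interpolation against the index values (method='index') is acceptable in place of the spline used above, so the numbers differ from the result shown. The helper name interpolated is hypothetical, and the requested keys are deliberately kept inside the existing index range.

```python
import pandas as pd

df = pd.DataFrame(
    {"value": [0.4, 0.5, 0.7, 1.3, 1.4, 1.1]},
    index=pd.Index([1, 4, 6, 10, 11, 13], name="key"),
)
indices = [3, 6, 9, 12]  # all within the range of existing keys


def interpolated(frame, wanted):
    """Hypothetical helper: rows of `frame` at `wanted`, interpolated as needed."""
    # union() yields a sorted index holding both the existing and requested keys
    full_index = frame.index.union(wanted)
    # reindex() inserts NaN rows for the new keys; method="index" interpolates
    # linearly using the numeric index values as x-coordinates
    filled = frame.reindex(full_index).interpolate(method="index")
    return filled.loc[wanted]


result = interpolated(df, indices)
print(result["value"])
```

This avoids building a second dataframe by hand, but it still interpolates the whole column before selecting the requested rows.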



Solution

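You can use SciPy’s interp1d, which builds an interpolation function from the index and the values. It interpolates linearly by default and raises a ValueError for points outside the range of the index: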

from scipy.interpolate import interp1d

interp = interp1d(df.index, df, axis=0)

interp([3,6,9])

Output (I duplicated the value column):

array([[0.46666667, 0.46666667],
       [0.7       , 0.7       ],
       [1.15      , 1.15      ]])
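
To get labelled rows back instead of a bare array, the interp1d call can be wrapped in a small helper. This is a minimal sketch, not part of the original answer: the function name interpolated is hypothetical, and passing fill_value="extrapolate" is an assumption made here so that keys beyond the last index (such as 15) do not raise.

```python
import pandas as pd
from scipy.interpolate import interp1d

df = pd.DataFrame(
    {"value": [0.4, 0.5, 0.7, 1.3, 1.4, 1.1]},
    index=pd.Index([1, 4, 6, 10, 11, 13], name="key"),
)


def interpolated(frame, wanted, kind="linear"):
    """Hypothetical helper: interpolate every column of `frame` at `wanted`."""
    f = interp1d(
        frame.index,
        frame.to_numpy(),
        axis=0,                    # interpolate along the rows
        kind=kind,                 # e.g. "linear" or "quadratic"
        fill_value="extrapolate",  # assumption: extrapolate outside the index range
    )
    return pd.DataFrame(
        f(wanted),
        index=pd.Index(wanted, name=frame.index.name),
        columns=frame.columns,
    )


result = interpolated(df, [3, 6, 9, 15])
print(result["value"])
```

With kind="linear" the values at 3, 6 and 9 match the output above. Passing kind="quadratic" gives a second-order spline, which may be closer in spirit to the method='spline', order=2 call in the question, though the numbers are not guaranteed to agree with pandas.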


This question was asked on Stack Overflow by Grismar and answered by Quang Hoang. It is licensed under the terms of CC BY-SA 2.5, CC BY-SA 3.0, CC BY-SA 4.0.
