Pandas is a popular Python library used for data analysis and manipulation. It provides various functions to resample, shift, or lag timeseries data, allowing users to manipulate the data along the time index.
Resampling changes the frequency of observations in a time series by collapsing them into periods or expanding them into higher frequencies. Shifting moves data values by a specified number of periods to create lag/lead relationships. Lagging shifts the data later in time by a set number of periods.
These techniques are widely used in financial, econometric, and scientific applications. This guide shows how to resample, shift, and lag time series data with Pandas.
Resampling Timeseries Data
Resampling changes the frequency of a time series by converting observations from their original intervals to new, periodic intervals.
For example, daily data can be resampled to weekly, monthly, quarterly, or annual frequencies. Lower frequency data like monthly can also be resampled to higher frequencies like daily or hourly.
Pandas provides the .resample() method to resample time series data to a new frequency along the datetime index.
Downsampling to Lower Frequencies
Downsampling reduces the frequency of observations by collapsing them into periods or intervals.
For example, converting daily data to monthly data requires collapsing all daily values that fall in a month to a single monthly value.
Here is an example DataFrame with a daily datetime index and a single Values column:
import pandas as pd
# Parse the dates once and use them as a named index, so Date is not also kept as a column
dates = pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04'])
df = pd.DataFrame({'Values': [10, 20, 30, 40]}, index=dates.rename('Date'))
print(df)
Values
Date
2023-01-01 10
2023-01-02 20
2023-01-03 30
2023-01-04 40
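To downsample this daily data to monthly frequency, we pass the month-end alias 'M' (spelled 'ME' in pandas 2.2 and later, where 'M' triggers a deprecation warning) to .resample() and then call an aggregation:
monthly = df.resample('M').sum()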
print(monthly)
Values
Date
2023-01-31 100
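.resample() has no default aggregation; it only groups the rows, and the result must be reduced explicitly. Here sum() adds up the Values within each month. Other aggregations such as count(), mean(), max(), and min() can be applied the same way.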
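Several aggregations can be computed in one pass with .agg(). The following is a minimal sketch on the same four daily values; the variable names are illustrative, and pd.offsets.MonthEnd() is passed as the rule so that the sketch does not depend on how a given pandas version spells the month-end alias.

```python
import pandas as pd

# Four daily observations, matching the example above.
idx = pd.date_range("2023-01-01", periods=4, freq="D", name="Date")
daily = pd.DataFrame({"Values": [10, 20, 30, 40]}, index=idx)

# Group the rows into month-end bins, then reduce each bin several ways.
summary = (
    daily["Values"]
    .resample(pd.offsets.MonthEnd())
    .agg(["sum", "mean", "min", "max", "count"])
)
print(summary)
```

All four days fall in January, so the result has a single row labeled 2023-01-31 with one column per aggregation.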
Upsampling to Higher Frequencies
Upsampling increases the frequency of observations by expanding or interpolating values into higher frequency periods.
For example, converting monthly data to daily data requires assigning daily values estimated from the monthly value.
Here is a monthly DataFrame:
# Use the parsed dates as a named index rather than keeping Date as a column
dates = pd.to_datetime(['2023-01-01', '2023-02-01'])
df = pd.DataFrame({'Values': [100, 200]}, index=dates.rename('Date'))
print(df)
Values
Date
2023-01-01 100
2023-02-01 200
To upsample to daily frequency, we resample using 'D' and specify an interpolation method:
daily = df.resample('D').interpolate(method='linear')
print(daily.head())
Values
Date
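2023-01-01 100.000000
2023-01-02 103.225806
2023-01-03 106.451613
2023-01-04 109.677419
2023-01-05 112.903226
This linearly interpolates between the two monthly data points to create estimated daily values. The two points are 31 days apart, so each daily step is 100 / 31, or about 3.23.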
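Interpolation is only one way to fill the rows that upsampling creates. The sketch below compares three options on the same two monthly points: leaving the new rows empty with asfreq(), repeating the last known value with ffill(), and linear interpolation. The variable names are illustrative.

```python
import pandas as pd

idx = pd.DatetimeIndex(["2023-01-01", "2023-02-01"], name="Date")
monthly = pd.DataFrame({"Values": [100, 200]}, index=idx)

as_is = monthly.resample("D").asfreq()        # new rows are left as NaN
held = monthly.resample("D").ffill()          # repeat the last known value
linear = monthly.resample("D").interpolate()  # straight line between known points

# Put the three results side by side for comparison.
comparison = pd.concat(
    {"asfreq": as_is["Values"], "ffill": held["Values"], "linear": linear["Values"]},
    axis=1,
)
print(comparison.head(3))
```

Forward filling suits values that stay in force until the next observation, such as a posted rate, while interpolation suits quantities that change gradually between observations.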
Resampling Period Offsets
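Weekly, quarterly, and annual bins can be anchored to a different point in the period by passing an anchored frequency alias as the resampling rule.
For example, 'W' is shorthand for 'W-SUN', which produces weeks ending on Sunday. To resample into weeks ending on Wednesday instead:
df.resample('W-WED').mean()
Anchored aliases such as 'W-WED' are passed as the rule itself. The separate offset parameter of .resample() expects a time span (a Timedelta or a string such as '2h') that is added to the bin origin, not an anchored alias.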
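To make the effect of the weekly anchor concrete, here is a small sketch that bins two weeks of made-up daily values with the default 'W' rule and with 'W-WED'. The data and variable names are assumptions for illustration only.

```python
import pandas as pd

# Two weeks of daily values 1..14; 2023-01-01 is a Sunday.
idx = pd.date_range("2023-01-01", periods=14, freq="D", name="Date")
s = pd.Series(range(1, 15), index=idx, name="Values")

week_sun = s.resample("W").sum()      # bins end on Sunday (same as 'W-SUN')
week_wed = s.resample("W-WED").sum()  # bins end on Wednesday

print(week_sun)
print(week_wed)
```

Both results cover the same 14 values, but the bin edges differ: the first frame is labeled with Sundays and the second with Wednesdays, so the per-bin sums differ as well.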
Apply Functions Other Than Aggregation
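While the built-in aggregations are the most common choice, Pandas also allows applying custom functions to each resampled group.
For example, to add a constant to each annual total (the annual alias 'A' is spelled 'YE' in pandas 2.2 and later):
df['Values'].resample('A').apply(lambda x: x.sum() + 100)
The apply() method passes each resampled group to the function provided, which should reduce the group to a single value.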
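As a sketch of a custom reduction, the example below computes the spread between the largest and smallest value in each week. The helper name value_range and the sample values are hypothetical.

```python
import pandas as pd

idx = pd.date_range("2023-01-01", periods=14, freq="D", name="Date")
s = pd.Series([3, 8, 5, 9, 2, 7, 4, 10, 1, 6, 12, 11, 14, 13], index=idx, name="Values")

def value_range(group):
    """Reduce one resampled group to a single number: max minus min."""
    return group.max() - group.min()

weekly_range = s.resample("W").apply(value_range)

# A constant offset needs no custom function: aggregate first, then add.
weekly_plus_100 = s.resample("W").sum() + 100

print(weekly_range)
print(weekly_plus_100)
```

Element-wise arithmetic such as adding a constant is usually simpler to apply after the aggregation than inside the function passed to apply().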
Shifting Time Series Data
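Shifting moves data values by a specified number of periods while leaving the index unchanged.
The .shift() method slides the values in a Series or DataFrame along the index to create lags and leads.
For example, to create a 2-day lag on the four-row daily data from the downsampling example:
# Rebuild the daily frame, since df was reassigned to the monthly data above
df = pd.DataFrame({'Values': [10, 20, 30, 40]}, index=pd.date_range('2023-01-01', periods=4, name='Date'))
lags = df['Values'].shift(periods=2)
print(lags.head())
Date
2023-01-01 NaN
2023-01-02 NaN
2023-01-03 10.0
2023-01-04 20.0
Each row now holds the value from 2 periods earlier. NaNs fill the first 2 rows, which have no earlier value to draw from.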
Conversely, to create a 3 period lead:
leads = df['Values'].shift(periods=-3)
print(leads.head())
Date
2023-01-01 40.0
2023-01-02 NaN
2023-01-03 NaN
2023-01-04 NaN
Here each row holds the value from 3 periods later, so in the four-row daily data only the first row receives a value.
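We can also shift by a time offset instead of a number of rows by passing freq. This moves the index rather than the values, so no NaNs are introduced:
df['Values'].shift(freq='3D') # Move every timestamp 3 days later
To shift several columns at once, call .shift() on the DataFrame itself; every column moves by the same number of periods.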
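The difference between shifting by rows and shifting by time is easiest to see side by side. This is a minimal sketch on a small two-column frame; the column Other and the variable names are assumptions added for illustration.

```python
import pandas as pd

idx = pd.date_range("2023-01-01", periods=4, freq="D", name="Date")
df = pd.DataFrame(
    {"Values": [10, 20, 30, 40], "Other": [1.0, 2.0, 3.0, 4.0]}, index=idx
)

# periods: values slide down one row, the index stays put, the first row becomes NaN.
by_rows = df.shift(periods=1)

# freq: the index moves 3 days later, the values stay attached to their rows.
by_time = df.shift(freq="3D")

print(by_rows)
print(by_time)
```

Shifting by periods keeps the frame aligned with the original index, which is what lag features need. Shifting by freq is useful when the timestamps themselves should be relabeled.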
Lagging Time Series Data
Lagging is the special case of shifting in which values move to later rows, so that each row holds a past observation.
Pandas has no separate .lag() method; a lag is created by calling .shift() with a positive number of periods.
For example, to create a 3-day lag on the four-row daily data:
lagged = df['Values'].shift(3)
print(lagged.head())
Date
2023-01-01 NaN
2023-01-02 NaN
2023-01-03 NaN
2023-01-04 10.0
Passing freq='infer' together with periods shifts the index by its own (or inferred) frequency instead of moving the values.
Calling .shift() on a DataFrame lags every column at once. The axis parameter chooses the direction of the shift:
df.shift(periods=1, axis='columns')
This moves values sideways into the next column rather than down the rows. The default, axis=0, lags each column by 1 period.
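A common use of lags is building feature columns for a forecasting model. The sketch below adds three lagged copies of a column with .shift() and then drops the rows that lack a full history. The column names lag_1 to lag_3 and the sample values are illustrative.

```python
import pandas as pd

idx = pd.date_range("2023-01-01", periods=6, freq="D", name="Date")
df = pd.DataFrame({"Values": [10, 20, 30, 40, 50, 60]}, index=idx)

features = df.copy()
for k in (1, 2, 3):
    # Each lag_k column holds the value observed k rows earlier.
    features[f"lag_{k}"] = df["Values"].shift(k)

# The first 3 rows have at least one missing lag; keep only complete rows.
complete = features.dropna()
print(complete)
```

Each remaining row pairs the current value with its three predecessors, which is the layout most tabular models expect.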
Advanced Time Series Manipulations
Some more advanced transformations can be done combining resampling, shifting, and lagging.
Resample then Shift
Shifting after resampling to a lower frequency expresses the lag in units of the new frequency, so one shifted period spans a whole month rather than a day.
For example, to create a 2-month lag:
monthly = df.resample('M').mean()
lag2 = monthly.shift(2)
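As a worked sketch of this pattern, the example below resamples four months of synthetic daily values to monthly means, lags them by two months, and takes the difference. The data is made up, and pd.offsets.MonthEnd() stands in for the month-end alias so the code does not depend on the alias spelling of a given pandas version.

```python
import pandas as pd

# Synthetic daily series 0, 1, 2, ... covering January through April 2023.
idx = pd.date_range("2023-01-01", "2023-04-30", freq="D", name="Date")
s = pd.Series(range(len(idx)), index=idx, name="Values", dtype="float64")

monthly = s.resample(pd.offsets.MonthEnd()).mean()  # one mean per month
lag2 = monthly.shift(2)                             # value from two months earlier
change = monthly - lag2                             # two-month change in the mean

print(pd.concat({"monthly": monthly, "lag2": lag2, "change": change}, axis=1))
```

The first two months have no value two months earlier, so their lag and change are NaN.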
Resample, Lag, then Interpolate
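Resampling to a higher frequency creates rows with missing values, and lagging adds more at the start. Interpolating fills the gaps between known values; NaNs that precede the first known value stay missing by default.
For example:
daily = df.resample('D').asfreq()
daily_lag = daily.shift(2) # .shift() with a positive count creates the lag
daily_lag = daily_lag.interpolate()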
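The three steps can be traced on a tiny series observed every other day. This is a sketch with made-up values; the comments list the Values column after each step.

```python
import pandas as pd

idx = pd.DatetimeIndex(["2023-01-01", "2023-01-03", "2023-01-05"], name="Date")
df = pd.DataFrame({"Values": [10.0, 30.0, 50.0]}, index=idx)

daily = df.resample("D").asfreq()   # 10, NaN, 30, NaN, 50
daily_lag = daily.shift(2)          # NaN, NaN, 10, NaN, 30
filled = daily_lag.interpolate()    # NaN, NaN, 10, 20, 30

print(filled)
```

The two leading NaNs remain because linear interpolation has no earlier known value to start from; they can be dropped or filled separately if needed.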
Rolling Window Resampling
Rolling windows can be used on resampled data to analyze trends over fixed periods.
For example, 12 month rolling averages:
df.resample('M').mean().rolling(12).mean()
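The same pattern with a shorter window is easier to inspect by eye. The sketch below uses six months of synthetic data and a 3-month window; the data, the window length, and the min_periods variant are assumptions chosen for illustration.

```python
import pandas as pd

# Synthetic daily data whose value equals the month number (1 for January, ...).
idx = pd.date_range("2023-01-01", "2023-06-30", freq="D", name="Date")
s = pd.Series(idx.month.to_numpy(), index=idx, name="Values", dtype="float64")

monthly = s.resample(pd.offsets.MonthEnd()).mean()  # 1.0, 2.0, ..., 6.0

# A full 3-month window: the first two months have too little history.
rolling3 = monthly.rolling(3).mean()

# min_periods=1 reports an average as soon as one month is available.
rolling3_partial = monthly.rolling(3, min_periods=1).mean()

print(pd.concat({"monthly": monthly, "rolling3": rolling3, "partial": rolling3_partial}, axis=1))
```

With the default settings, a window of N periods leaves the first N - 1 results as NaN, so a 12-month rolling average needs at least a year of data before it yields a value.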
Conclusion
This guide demonstrated how to leverage Pandas’ powerful timeseries data manipulation capabilities for resampling, shifting, and lagging.
Resampling changes frequency, shifting creates lags and leads, and lagging is a shift with a positive number of periods. Combining them widens the range of time series transformations available.
These techniques support financial analysis, feature engineering, forecasting, and more, and they give the flexibility to wrangle temporal data into the form a given modeling or prediction task requires.