Pandas is a popular Python library used for data analysis and manipulation. It provides various methods for transforming time series data to uncover trends, patterns, and insights. Two commonly used transformation techniques in Pandas are rolling and expanding transformations.
Rolling transformations perform an aggregation over a fixed lookback window, allowing analysis of the data in relation to its recent history. In contrast, expanding transformations cumulatively aggregate values from the start of the series up to the current point.
By the end of this guide, you will have a solid understanding of when and how to apply rolling and expanding transformations for time series data analysis using Python and Pandas.
Rolling vs. Expanding Transforms
Rolling Window
A rolling window transformation performs a calculation over a fixed lookback window or frame. The window “rolls” through the time series, computing the statistic over the range of the rolling window.
For example, a 5-day rolling window would aggregate the data from the past 5 days at each point. As the window moves forward, the oldest data point is dropped and the newest is added.
Rolling windows are useful for smoothing time series data, identifying trends, and analyzing local patterns. The fixed lookback allows relating the data to its recent historical context.
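To make the "drop the oldest, add the newest" behavior concrete, here is a minimal sketch on a small, made-up daily series (the values and dates are hypothetical and chosen only for illustration):

```python
import pandas as pd

# Hypothetical daily readings, used only for illustration
values = pd.Series(
    [3, 8, 2, 7, 5, 9, 4],
    index=pd.date_range("2024-01-01", periods=7, freq="D"),
)

window = 5
rolled = values.rolling(window=window).sum()

# The last rolling value is just the sum of the 5 most recent observations
manual_last = values.iloc[-window:].sum()

# Moving the window one step forward adds the newest value (position 5)
# and drops the oldest one (position 0)
step = rolled.iloc[5] - rolled.iloc[4]

print(rolled)
print(manual_last, step)
```

The first four entries are NaN because a full 5-observation window is not yet available at those positions.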
Expanding Window
In contrast, an expanding window grows cumulatively - starting at the beginning of the time series and increasing in size until reaching the current data point.
At each point, the transform aggregates all data from the start of the series up to that point. The window continues expanding forward until the end, including all previous data.
Expanding transforms are useful for accumulating historical information and identifying long-term trends. Each data point incorporates more historical context through the ever-growing window.
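As a small sketch of the growing window, the example below (on hypothetical values) shows that the expanding result at any position equals an aggregation over everything from the start up to that position, and that the window size increases by one each step:

```python
import pandas as pd

# Hypothetical values, used only for illustration
values = pd.Series([3, 8, 2, 7, 5])

expanding_sum = values.expanding().sum()

# count() reveals how many observations each window contains
window_sizes = values.expanding().count()

# The third expanding value is the sum of the first three observations
manual_third = values.iloc[:3].sum()

print(pd.DataFrame({"sum": expanding_sum, "window_size": window_sizes}))
print(manual_third)
```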
Comparison
Rolling windows analyze a fixed, consistent window backward from each point. Expanding windows incorporate all history from the start through each point.
Rolling suits local, short-term analysis while expanding suits long-term, cumulative analysis. The fixed vs. ever-growing window size leads to these distinct use cases.
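The difference is easiest to see on a series whose level changes partway through. The sketch below uses a hypothetical series that jumps from 10 to 20; the rolling mean catches up with the new level within a few steps, while the expanding mean keeps averaging in the older values:

```python
import pandas as pd

# Hypothetical series with a level shift halfway through
values = pd.Series([10, 10, 10, 10, 20, 20, 20, 20])

comparison = pd.DataFrame({
    "value": values,
    "rolling_3": values.rolling(window=3).mean(),   # last 3 observations only
    "expanding": values.expanding().mean(),         # everything so far
})

print(comparison)
```

By the final row the 3-period rolling mean has reached 20, whereas the expanding mean is still 15, halfway between the two levels.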
Rolling Window Functions
Pandas provides several functions to apply transformations over a rolling window. Let’s explore key examples.
We’ll start by importing Pandas and creating a sample DataFrame:
import pandas as pd
data = pd.DataFrame({'Value': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]})
Rolling
The DataFrame.rolling() method generates a Rolling object that supports various window aggregation methods.
The window parameter is required and sets the size of the rolling window, either as a fixed number of observations or, for a datetime-like index, as a time offset such as '3D'.
Here we calculate the 3-day rolling mean:
data.rolling(window=3).mean()
Value
0 NaN
1 NaN
2 2.0
3 3.0
4 4.0
5 5.0
6 6.0
7 7.0
8 8.0
9 9.0
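The first two values are NaN because fewer than three observations are available at those positions and, by default, a result is produced only once the window is full. From the third row onward, the mean aggregates over the three most recent values.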
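The leading NaN values are controlled by the min_periods parameter of rolling(), which for a fixed-size window defaults to the window size. A minimal sketch of relaxing it so that partial windows also produce a value:

```python
import pandas as pd

data = pd.DataFrame({"Value": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]})

# Default: a result only once all 3 observations are available
strict = data.rolling(window=3).mean()

# min_periods=1: aggregate whatever is available, even a single observation
partial = data.rolling(window=3, min_periods=1).mean()

print(pd.concat({"strict": strict["Value"], "partial": partial["Value"]}, axis=1))
```

Partial windows avoid gaps at the start, at the cost of early values being based on fewer observations than the rest.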
We can visualize rolling transforms using DataFrame.plot():
data.rolling(window=5).mean().plot()
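This plots the 5-period rolling mean. Because the sample data is perfectly linear, the result is simply a lagged straight line; on noisier data, the smoothing effect of the rolling window aggregation becomes clearly visible.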
Common Rolling Window Methods
Useful rolling window methods include:
- mean(): Average value in window
- std(): Standard deviation in window
- min(): Minimum value in window
- max(): Maximum value in window
- sum(): Sum of values in window
- count(): Count of observations in window
- quantile(): Quantile value of window
These aggregate over the fixed rolling window at each point in the series.
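As a quick sketch, the built-in methods listed above can be computed from a single Rolling object and collected side by side (the column names here are arbitrary labels, not part of the pandas API):

```python
import pandas as pd

data = pd.DataFrame({"Value": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]})

# Build the Rolling object once and reuse it for each statistic
roller = data["Value"].rolling(window=3)

summary = pd.DataFrame({
    "mean": roller.mean(),
    "std": roller.std(),
    "min": roller.min(),
    "max": roller.max(),
    "sum": roller.sum(),
    "count": roller.count(),
    "median": roller.quantile(0.5),  # the 0.5 quantile is the median
})

print(summary.tail(3))
```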
Rolling Apply
For custom aggregation functions, use rolling.apply() and pass a function.
Here we calculate the interquartile range (IQR) over a 5-day window:
def iqr(series):
    # Interquartile range: spread between the 75th and 25th percentiles
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    return q3 - q1
data.rolling(window=5).apply(iqr)
The function computes the IQR of each rolling window. .apply() is flexible for statistics beyond the built-ins.
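Because .apply() calls a Python function once per window, it can be slow on long series. One possible variation, sketched below, passes raw=True so the function receives a NumPy array instead of a Series; the helper name iqr_raw is hypothetical. For this particular statistic, the same result can also be obtained from two built-in rolling quantiles with no custom function at all:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({"Value": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]})

def iqr_raw(window):
    # With raw=True, `window` is a NumPy array rather than a Series
    q1, q3 = np.percentile(window, [25, 75])
    return q3 - q1

iqr_apply = data.rolling(window=5).apply(iqr_raw, raw=True)

# Same statistic from built-in rolling quantiles, with no Python callback
roller = data.rolling(window=5)
iqr_builtin = roller.quantile(0.75) - roller.quantile(0.25)

print(iqr_apply.tail(3))
print(iqr_builtin.tail(3))
```

Both approaches use linear interpolation between data points by default, so they agree on this data.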
Rolling Regression
Rolling linear regression fits a regression model on each window.
We can implement it manually using .apply():
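import statsmodels.api as sm
def rolling_regression(series):
    # Regress the window's values on their index labels, with an intercept
    X = sm.add_constant(series.index.to_series())
    model = sm.OLS(series, X).fit()
    # params is ordered [intercept, slope]; return the slope
    return model.params.iloc[1]
data.rolling(window=5).apply(rolling_regression)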
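Here we fit a simple linear regression on each 5-period window and keep its slope. Running a full model fit for every window through .apply() is slow. Pandas itself has no built-in rolling regression, but statsmodels provides RollingOLS (in statsmodels.regression.rolling) for this purpose.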
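If only the slope of each window is needed, a lighter-weight alternative is an ordinary least-squares line fit with NumPy. The sketch below swaps in np.polyfit and regresses each window on the positions 0, 1, ..., n-1 (an assumption that the observations are evenly spaced); window_slope is a hypothetical helper name:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({"Value": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]})

def window_slope(values):
    # Fit a degree-1 polynomial (a straight line) and return its slope.
    # Assumes evenly spaced observations, so x is simply 0..n-1.
    x = np.arange(len(values))
    slope, _intercept = np.polyfit(x, values, deg=1)
    return slope

# raw=True passes each window as a NumPy array
slopes = data["Value"].rolling(window=5).apply(window_slope, raw=True)

print(slopes)
```

Since the sample data rises by exactly one per step, every full window yields a slope of approximately 1.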
Expanding Window Functions
Expanding windows aggregate from the start of the series through the current point. Let’s demonstrate some examples.
Reuse our sample DataFrame:
data = pd.DataFrame({'Value': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]})
Expanding
The DataFrame.expanding() method generates an Expanding object for transform methods.
By default, it uses an ever-growing window including all data from the start through each point.
Here we calculate the expanding mean:
data.expanding().mean()
Value
0 1.0
1 1.5
2 2.0
3 2.5
4 3.0
5 3.5
6 4.0
7 4.5
8 5.0
9 5.5
At each point, all values from the start are aggregated cumulatively. The window expands forward rather than rolling.
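The earliest expanding values rest on very few observations. As a small sketch, the min_periods parameter of expanding() (which defaults to 1) can be raised to suppress results until enough data has accumulated:

```python
import pandas as pd

data = pd.DataFrame({"Value": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]})

# Default min_periods=1: a value is produced from the very first row
default_mean = data.expanding().mean()

# Require at least 3 observations before reporting a result
delayed_mean = data.expanding(min_periods=3).mean()

# Some statistics need more than one point regardless: the sample standard
# deviation of a single observation is undefined, so the first row is NaN
default_std = data.expanding().std()

print(pd.concat(
    {"mean": default_mean["Value"],
     "mean_min3": delayed_mean["Value"],
     "std": default_std["Value"]},
    axis=1,
))
```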
Common Expanding Methods
Useful expanding window methods match the rolling versions:
- mean(): Cumulative average
- std(): Cumulative standard deviation
- min(): Cumulative minimum
- max(): Cumulative maximum
- sum(): Cumulative sum
- count(): Cumulative count
- quantile(): Cumulative quantile
These aggregate from the start through each point.
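Several of these have direct cumulative shortcuts on Series and DataFrame. The following sketch, on hypothetical values, checks that the expanding sum, minimum, and maximum match cumsum(), cummin(), and cummax():

```python
import pandas as pd

# Hypothetical values, used only for illustration
values = pd.Series([4, 1, 7, 3, 9, 2])
exp = values.expanding()

pairs = {
    "sum": (exp.sum(), values.cumsum()),
    "min": (exp.min(), values.cummin()),
    "max": (exp.max(), values.cummax()),
}

for name, (expanding_result, cumulative_result) in pairs.items():
    # Expanding results are floats; the cum* methods keep the integer dtype
    matches = (expanding_result == cumulative_result).all()
    print(name, expanding_result.tolist(), matches)
```

There is no such shortcut for statistics like the mean, standard deviation, or quantiles, which is where expanding() is most useful.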
Expanding Apply
We can also use .apply() for custom expanding functions.
Here we find the expanding interquartile range:
data.expanding().apply(iqr)
The iqr function from earlier is applied cumulatively over the expanding window.
Expanding Regression
Cumulative expanding linear regression aggregates all data from the start at each point:
data.expanding().apply(rolling_regression)
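The same rolling_regression function fits a regression for the expanding window. The earliest windows contain too few points for a meaningful fit, so consider passing min_periods to expanding() to skip them.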
Resampling vs. Rolling vs. Expanding
Rolling and expanding transforms differ from Pandas’ resample and groupby operations. Let’s compare:
- Resampling aggregates data into discrete periods such as days, months, or years. Useful for cleaning time series data or changing frequencies. Operates independently on each period.
- Rolling aggregates data over a fixed moving window relative to each point. Useful for smoothing, trends, and local analysis. Window has consistent size.
- Expanding cumulatively aggregates data from the start to each point. Useful for long-term trends and accumulation. Window constantly grows.
- Groupby splits data into groups based on categories. Useful for segmentation and comparisons. Groups are distinct buckets.
Rolling and expanding focus on ordered time series data with a moving window. Resample handles fixed periods. Groupby looks at distinct categories.
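To see how the shapes of the results differ, here is a sketch applying all four operations to the same hypothetical ten days of sales (the dates and values are made up for illustration):

```python
import pandas as pd

# Hypothetical ten days of sales; 2024-01-01 is a Monday
idx = pd.date_range("2024-01-01", periods=10, freq="D")
sales = pd.Series(range(1, 11), index=idx, name="sales")

# Resample: one row per calendar week (weeks end on Sunday by default)
weekly_total = sales.resample("W").sum()

# Rolling: one row per day, covering the last 3 days
rolling_total = sales.rolling(window=3).sum()

# Expanding: one row per day, covering every day so far
running_total = sales.expanding().sum()

# Groupby: one row per category, here the weekday name
by_weekday = sales.groupby(sales.index.day_name()).sum()

print(weekly_total, rolling_total, running_total, by_weekday, sep="\n\n")
```

Resample and groupby reduce the data to one row per bucket, while rolling and expanding return one row per original observation.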
Use Cases and Examples
Rolling and expanding transforms have several applications in time series data analysis. Let’s look at some examples.
Smoothing Time Series
Applying a rolling mean “smooths” out short-term fluctuations, leaving the local trend visible:
daily_data.rolling(window=7).mean().plot()
The 7-day rolling mean filters daily spikes to show the weekly trend.
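A self-contained sketch of this effect, using synthetic data as a stand-in for daily_data (a slow upward trend plus random noise, with an arbitrary seed), compares the day-to-day variability before and after smoothing:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
idx = pd.date_range("2024-01-01", periods=120, freq="D")

# Synthetic stand-in for daily_data: slow upward trend plus daily noise
daily_data = pd.Series(
    np.linspace(100, 130, 120) + rng.normal(0, 5, 120),
    index=idx,
)

smoothed = daily_data.rolling(window=7).mean()

# Standard deviation of day-to-day changes, before and after smoothing
raw_jitter = daily_data.diff().std()
smooth_jitter = smoothed.diff().std()

print(f"raw: {raw_jitter:.2f}, smoothed: {smooth_jitter:.2f}")
# To view the result, plot both: daily_data.plot(); smoothed.plot()
```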
Identifying Local Patterns
Rolling statistics like standard deviation can detect recent volatility:
import numpy as np
volatility = daily_returns.rolling(window=21).std() * np.sqrt(252)
volatility.plot()
Multiplying the 21-day rolling standard deviation of daily returns by the square root of 252 (the approximate number of trading days in a year) annualizes it, so the plot highlights periods of elevated local volatility.
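The following sketch runs the same calculation on synthetic returns with a calm first half and a turbulent second half (the 1% and 3% daily standard deviations and the seed are arbitrary assumptions), to show that the rolling estimate tracks the regime change:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
idx = pd.bdate_range("2023-01-02", periods=252)

# Synthetic returns: 126 calm days (1% std), then 126 turbulent days (3% std)
daily_returns = pd.Series(
    np.concatenate([rng.normal(0, 0.01, 126), rng.normal(0, 0.03, 126)]),
    index=idx,
)

# 21-day rolling standard deviation, annualized with sqrt(252)
volatility = daily_returns.rolling(window=21).std() * np.sqrt(252)

# Average over windows that lie entirely inside each regime
calm = volatility.iloc[20:126].mean()
turbulent = volatility.iloc[146:].mean()

print(f"calm: {calm:.2%}, turbulent: {turbulent:.2%}")
```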
Cumulative Trends and Changes
Expanding transforms accumulate from the start, useful for long-term analysis:
revenue.expanding().mean().plot()
The expanding mean shows how the average revenue to date evolves as each new period is added.
Time Series Models
Rolling regressions fit local models to analyze how relationships change:
stock_data.rolling(window=90).apply(rolling_regression)
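This fits a linear regression in each 90-day window. Note that rolling_regression as defined above regresses values on the index, so the result is the slope of the local trend. Estimating how a market beta changes over time would instead require regressing the stock's returns on the market's returns in each window.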
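For a single-factor beta specifically, the regression slope equals the covariance of the two return series divided by the variance of the market return, and both are available as built-in rolling methods. A sketch on synthetic data, where the series names, the seed, and the true beta of 1.5 are all assumptions made for illustration:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Synthetic market returns and a hypothetical stock with a true beta of 1.5
market = pd.Series(rng.normal(0, 0.01, 300))
stock = 1.5 * market + pd.Series(rng.normal(0, 0.002, 300))

window = 90

# Rolling beta = rolling covariance(stock, market) / rolling variance(market)
beta = stock.rolling(window).cov(market) / market.rolling(window).var()

print(beta.dropna().describe())
```

This avoids a Python-level model fit per window, so it scales better than .apply() on long series.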
Anomaly and Change Detection
Compare rolling statistics to expanding to identify anomalies:
residuals = rolling_mean - expanding_mean
residuals.plot()
Spikes in the difference between rolling and expanding means may indicate anomalies.
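A minimal, self-contained sketch of this idea follows. The series, the window of 3, and the threshold of 10 are all hypothetical choices; in practice the threshold would need tuning to the data:

```python
import pandas as pd

# Hypothetical series: flat at 10 with a short burst at positions 12-14
values = pd.Series([10.0] * 20)
values.iloc[12:15] = 30.0

rolling_mean = values.rolling(window=3).mean()
expanding_mean = values.expanding().mean()

# Large gaps mean recent behavior departs from the long-run average
residuals = rolling_mean - expanding_mean

threshold = 10.0  # hypothetical cutoff
flagged = residuals[residuals.abs() > threshold].index.tolist()

print(residuals.round(2))
print("flagged positions:", flagged)
```

The residual peaks at the end of the burst, once the whole rolling window sits inside it, so detection lags the start of the anomaly by roughly the window length.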
These demonstrate common use cases for rolling and expanding analysis on time series.
Common Pitfalls and Best Practices
Here are some tips for avoiding issues when using rolling and expanding transforms:
- Set an appropriate window size - too short loses context, too long is overly smoothed and slow.
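- Beware lookahead bias in rolling statistics: a centered window (center=True) includes future observations, and a trailing window still includes the current one, so shift the result when it feeds a forecast.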
- Avoid spurious expanding regressions with near-collinear data.
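- Rolling apply is powerful but computationally intensive; prefer built-in methods where they exist, or pass raw=True so the function receives NumPy arrays.
- Watch for edge effects from partial windows: at the beginning for trailing windows, and at both ends for centered ones.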
- Compare to resample for aggregation over fixed periods.
- Prefer .median() over .mean() if data contains outliers.
- Visualize rolling/expanding data along with raw data for perspective.
- Document window size, min periods, aggregation method in code.
Following best practices like these will lead to effective analysis using rolling and expanding transformations.
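Two of these tips, lookahead bias and outlier sensitivity, can be illustrated on one small series. The sketch below uses hypothetical values with a single outlier of 100:

```python
import pandas as pd

# Hypothetical values with one outlier at position 3
values = pd.Series([1.0, 2.0, 3.0, 100.0, 5.0, 6.0, 7.0])

# Trailing window: row t uses rows t-2, t-1, t
trailing = values.rolling(window=3).mean()

# Centered window: row t uses rows t-1, t, t+1, so it sees the future
centered = values.rolling(window=3, center=True).mean()

# Strictly past-only feature: shift so row t uses rows t-3, t-2, t-1
past_only = values.rolling(window=3).mean().shift(1)

# The median is far less affected by the outlier than the mean
robust = values.rolling(window=3).median()

print(pd.DataFrame({
    "value": values,
    "trailing": trailing,
    "centered": centered,
    "past_only": past_only,
    "median": robust,
}))
```

At position 2 the centered mean already reflects the outlier that arrives at position 3, while the trailing mean does not. At position 3 the rolling mean jumps to 35 but the rolling median stays at 3.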
Conclusion
This guide covered key concepts for rolling and expanding window transformations in Pandas, including:
- Fixed vs growing aggregation windows
- Built-in and custom window functions
- Comparing to resample and groupby
- Use cases like smoothing, cumulative trends, and anomaly detection
- Best practices for avoiding common pitfalls
Rolling and expanding transforms are powerful techniques for time series data analysis. Using appropriate window sizes and aggregation methods reveals trends, patterns, changes, and anomalies over time.
Pandas provides a flexible, expressive API for rolling and expanding calculations. With care, these transforms unlock impactful insights. This guide equips you with knowledge to apply them effectively in your own data science work.