
Generating Time Series Data with Pandas in Python

Updated: at 05:23 AM

Time series data is ubiquitous across many domains, from finance and economics to science and engineering. As an ordered sequence of data points indexed by time, it allows us to model and analyze trends and patterns over time. The Python ecosystem offers many options for working with time series, but Pandas is one of the most popular and powerful.

Pandas provides versatile data structures like Series and DataFrame to store time series data, plus a variety of time series-related functionality for handling dates, times, time zones, frequencies, data resampling, moving and rolling calculations, and more. This guide demonstrates how to generate time series data in Python using Pandas, with code examples and explanations for key concepts and techniques.

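We will cover creating DatetimeIndexes, generating time series ranges, resampling and frequency conversion, handling time zones, creating synthetic data, and generating financial data.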

Creating DatetimeIndexes

The foundation of time series data in Pandas is the DatetimeIndex, which contains the timestamps for the data and enables time-based indexing, calculations, and manipulations. Here are some ways to create a DatetimeIndex:

import pandas as pd

# From a list of datetime strings
dates = ['2023-01-01', '2023-01-02', '2023-01-03']
dti = pd.DatetimeIndex(dates)

# From a start date, end date, and frequency
start = '2023-01-01'
end = '2023-01-10'
dti = pd.date_range(start, end, freq='D')

# From a start timestamp, periods, and frequency
start_ts = pd.Timestamp('2023-01-01')
periods = 365
dti = pd.date_range(start=start_ts, periods=periods, freq='D')

The key parameters are start date, end date, frequency (e.g. ‘D’ for daily), and number of periods. This generates a regular sequence of dates that serve as the index.

We can then create a DataFrame with this DatetimeIndex:

df = pd.DataFrame(index=dti)

Now df is ready to hold time series data indexed by the dates in dti.
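As a minimal sketch of what time-based indexing looks like once data is attached, the example below fills a small frame and selects rows by date. The column name `value` and the numbers are illustrative assumptions, not part of the original example.

```python
import pandas as pd

# Ten daily timestamps serving as the index
dti = pd.date_range('2023-01-01', periods=10, freq='D')

# Illustrative data: the column name 'value' is an assumption
df = pd.DataFrame({'value': range(10)}, index=dti)

# Label-based slicing with date strings includes both endpoints
three_days = df.loc['2023-01-03':'2023-01-05']

# A Timestamp label selects a single row
single_day = df.loc[pd.Timestamp('2023-01-07'), 'value']
```

Unlike positional slicing, a date-string slice on a DatetimeIndex keeps the end label, so `three_days` holds January 3, 4, and 5.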

Generating Time Series Ranges

Pandas provides great flexibility for generating sequences of dates for time series indexing. We can specify combinations of start, end, periods, and frequency to control the date range generation.

Daily Frequency

Daily data is very common. To generate a simple daily date range:

start = '2023-01-01'
end = '2023-01-31'
dti = pd.date_range(start, end)

This will generate a DatetimeIndex with one timestamp per day between the start and end dates (inclusive).

We can also specify a certain number of periods:

start = '2023-01-01'
periods = 365
dti = pd.date_range(start=start, periods=periods, freq='D')

This will create 365 daily timestamps starting from January 1, 2023.

Monthly Frequency

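For monthly data, we change the frequency to ‘MS’ (month start):

start = '2023-01-01'
end = '2025-12-01'
dti = pd.date_range(start, end, freq='MS')

This generates a monthly DatetimeIndex with one timestamp on the first day of each month, from January 2023 to December 2025. For month-end timestamps, use ‘ME’ instead (‘M’ in pandas versions before 2.2, where that alias is now deprecated). Note that a month-end range stops at the last month end on or before the end date.

We can also specify the number of periods:

start = '2023-01-01'
periods = 24
dti = pd.date_range(start=start, periods=periods, freq='MS')

This creates 24 monthly timestamps.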

Quarterly, Weekly, Hourly Frequencies

These are just as easy. Some examples:

# Quarterly (period_range returns a PeriodIndex of quarters)
start = '2023-Q1'
end = '2025-Q4'
qti = pd.period_range(start, end, freq='Q')

# Weekly
start = '2023-01-01'
end = '2023-03-01'
wti = pd.date_range(start, end, freq='W')

# Hourly (lowercase 'h'; uppercase 'H' is deprecated since pandas 2.2)
start = '2023-01-01'
end = '2023-01-03'
hti = pd.date_range(start, end, freq='h')

We can mix frequencies when converting between daily, hourly, minutely timestamps as well.
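Note that the quarterly example uses `pd.period_range`, which returns a PeriodIndex (spans of time) rather than a DatetimeIndex (instants). A minimal sketch of converting between the two, assuming the timestamp at the start of each quarter is wanted:

```python
import pandas as pd

# Each entry is a whole quarter, not a single instant
qti = pd.period_range('2023Q1', '2025Q4', freq='Q')

# Convert each quarter to the timestamp at its start
quarter_starts = qti.to_timestamp(how='start')
```

Passing `how='end'` instead would give the timestamp at the end of each quarter.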

Business Days

For business days, we use ‘B’ as the frequency:

start = '2023-01-01'
end = '2023-01-31'
dti = pd.date_range(start, end, freq='B')

This will skip weekends.

To exclude certain holidays as well, use pd.bdate_range with the custom business day frequency ‘C’, which accepts a holidays parameter (pd.date_range does not take one):

holidays = ['2023-01-02', '2023-01-16']
dti = pd.bdate_range(start, end, freq='C', holidays=holidays)

Now it will exclude those holiday dates in addition to weekends.
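To check what a holiday list actually removes, the sketch below compares plain business days with a custom business day range for January 2023. It relies on `pd.bdate_range` with `freq='C'`, the combination that accepts a `holidays` argument; the variable names are illustrative.

```python
import pandas as pd

start, end = '2023-01-01', '2023-01-31'

# Standard business days: Monday to Friday
bdays = pd.bdate_range(start, end)

# Custom business days: freq='C' enables the holidays argument
holidays = ['2023-01-02', '2023-01-16']
custom = pd.bdate_range(start, end, freq='C', holidays=holidays)

# Dates present in the plain range but dropped by the holiday list
dropped = bdays.difference(custom)
```

January 2023 has 22 weekdays, so `custom` should contain 20 dates once the two holidays are removed.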

Custom Frequencies

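For other intervals, prefix a frequency alias with an integer multiple:

N<alias>, where the alias is a unit such as min (minutes), h (hours), D (days), W (weeks), MS (month starts), or YS (year starts)

Some examples:

# Every 4 hours
dti = pd.date_range('2023-01-01', periods=10, freq='4h')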

# Every 2 weeks
dti = pd.date_range('2023-01-01', periods=52, freq='2W')

# Every 3rd day
dti = pd.date_range('2023-01-01', periods=100, freq='3D')

This provides very flexible control over generating complex time series date sequences.
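As a small sketch of how these aliases behave, the example below generates a multiple of a base frequency and an anchored weekly frequency, and parses an alias with `to_offset` to inspect it. The specific dates are chosen only for illustration.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset

# A multiple of a base alias: every 3rd day
every_third = pd.date_range('2023-01-01', periods=4, freq='3D')

# An anchored weekly alias: Mondays on or after the start date
mondays = pd.date_range('2023-01-01', periods=3, freq='W-MON')

# to_offset parses an alias string into a date offset object
offset = to_offset('2W')
multiple = offset.n  # The integer multiple carried by the offset
```

Because 2023-01-01 is a Sunday, the first Monday in `mondays` is 2023-01-02.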

Resampling and Frequency Conversion

Pandas makes it easy to resample or convert a time series from one frequency to another using the Series.resample() and DataFrame.resample() methods.

For example, we can convert daily data to weekly:

ts = pd.Series(range(10), index=pd.date_range('20230101', periods=10))

ts.resample('W').mean()

This aggregates the daily data to weekly means.
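To make the binning concrete, here is the same aggregation with its result inspected. By default the ‘W’ frequency uses weeks ending on Sunday and labels each bin with that Sunday, so the ten daily values fall into three weekly bins.

```python
import pandas as pd

# Ten daily values starting on Sunday, 2023-01-01
ts = pd.Series(range(10), index=pd.date_range('2023-01-01', periods=10))

# Weekly means; each label is the Sunday that closes the week
weekly = ts.resample('W').mean()
labels = list(weekly.index.strftime('%Y-%m-%d'))
```

The first bin holds only January 1, the second holds January 2 to 8, and the third holds the remaining two days.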

We can also resample hourly data to daily:

df = pd.DataFrame(range(24), index=pd.date_range('20230101', periods=24, freq='h'))

df.resample('D').max()

This takes the max value per day. Other aggregation options like min, median, sum, ohlc (for OHLC bars) are possible too.
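As a sketch of the `ohlc` aggregation mentioned above, the example below turns two days of hourly values into daily open, high, low, and close columns. The input is a simple increasing sequence chosen so the result is easy to verify by eye.

```python
import pandas as pd

# 48 hourly values covering two full days
hourly = pd.Series(range(48),
                   index=pd.date_range('2023-01-01', periods=48, freq='h'))

# One row per day with open, high, low, close columns
daily_bars = hourly.resample('D').ohlc()
```

For a Series, `ohlc()` returns a DataFrame with the four columns named `open`, `high`, `low`, and `close`.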

Downsampling (higher to lower frequency) aggregates the data, while upsampling (lower to higher frequency) generates missing values which can be interpolated.
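A minimal sketch of upsampling, assuming linear interpolation is an acceptable way to fill the gaps (other fill strategies such as forward fill are equally valid choices):

```python
import pandas as pd

daily = pd.Series([0.0, 10.0, 20.0],
                  index=pd.date_range('2023-01-01', periods=3, freq='D'))

# Upsampling to 12-hour steps introduces NaN at the new timestamps
upsampled = daily.resample('12h').asfreq()

# Linear interpolation fills the gaps between known points
filled = upsampled.interpolate(method='linear')
```

Here the three daily points become five 12-hourly points, with the two new midday values filled halfway between their neighbours.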

Resampling is very powerful for wrangling time series between frequencies like minute to hourly, hourly to daily etc.

Handling Time Zones

Pandas does sophisticated time zone handling under the hood. By default, timestamps are timezone-naive:

ts = pd.Series(range(5), index=pd.date_range('1/1/2023', periods=5))
print(ts.index.tz)
# None

But we can localize the timestamps to a timezone like ‘US/Eastern’:

ts.index = ts.index.tz_localize('US/Eastern')
print(ts.index.tz)
# US/Eastern

Timestamps can also be converted between time zones:

ts.index = ts.index.tz_convert('US/Pacific')

Operations like resampling and frequency conversion will take time zones into account. Pandas also understands daylight saving time transitions. Robust time zone support makes Pandas great for working with global data.
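To illustrate the daylight saving time handling, the sketch below builds an hourly index across the US spring-forward transition on 2023-03-12. It uses the IANA name `America/New_York` for the Eastern zone; the date is chosen only because the transition falls on it.

```python
import pandas as pd

# Four hourly timestamps starting at local midnight on the transition day
idx = pd.date_range('2023-03-12 00:00', periods=4, freq='h',
                    tz='America/New_York')

# Local wall-clock hours: 02:00 does not exist on this day
local_hours = list(idx.hour)

# The same instants in UTC are evenly spaced
utc_hours = list(idx.tz_convert('UTC').hour)
```

The local hours jump from 1 to 3 while the UTC hours advance one at a time, which shows that the index tracks real elapsed time rather than wall-clock labels.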

Creating Synthetic Data

In addition to indexing, Pandas can be combined with NumPy to generate synthetic time series data programmatically for modeling and simulations. Some examples:

Random Walk

A basic stochastic time series model is a random walk:

import numpy as np

df = pd.DataFrame(index=pd.date_range('20230101', periods=365))
# Cumulative sum of independent Gaussian steps
df['Value'] = np.cumsum(np.random.normal(scale=5, size=365))

This simulates a random walk over 365 days: each value is the previous value plus a Gaussian step. More sophisticated variants like geometric Brownian motion are possible too.
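As a rough sketch of the geometric Brownian motion variant, the example below exponentiates a cumulative sum of normally distributed log-returns. The start value, drift, and volatility are hypothetical parameters chosen for illustration, and the seed is fixed only for reproducibility.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=42)  # Seeded for reproducibility

n = 365
s0, mu, sigma = 100.0, 0.05, 0.2  # Hypothetical start value, annual drift, annual volatility
dt = 1 / 365                      # One step per calendar day

# Log-returns are normally distributed; their cumulative sum is the log-price path
log_returns = (mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * rng.standard_normal(n)

gbm = pd.Series(s0 * np.exp(np.cumsum(log_returns)),
                index=pd.date_range('2023-01-01', periods=n),
                name='Value')
```

Unlike the additive random walk, this series can never go negative, which is why it is a common starting point for price-like data.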

Seasonal Data

Many time series have seasonal cycles, like hourly, daily, weekly, monthly, or annual seasons. We can add seasonal components:

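N = 365
# Hourly index, so that the hour-of-day component actually varies
df = pd.DataFrame(index=pd.date_range('20230101', periods=N * 24, freq='h'))
df['Hourly'] = 10 + np.sin(df.index.hour * (2*np.pi/24))     # Hour-of-day cycle
df['Daily'] = 20 + np.sin(df.index.dayofyear * (2*np.pi/N))  # Day-of-year cycle
df['Weekly'] = 5 + np.cos(df.index.dayofweek * (2*np.pi/7))  # Day-of-week cycle

This creates cycles by hour of day, day of year, and day of week. With a daily index the hour is always 0 and the first component would be constant, which is why the index here is hourly. Seasons of any frequency can be modeled.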

Long term trends and cyclical patterns are common as well:

N = 365
df = pd.DataFrame(index=pd.date_range('20230101', periods=N))
df['Trend'] = 10 + df.index.dayofyear/365 * 30
df['Cycle'] = 15 + np.sin(df.index.dayofyear * (2*np.pi/180))

Here we add a linear trend and a 180-day cycle.

By combining seasonal, trend, cyclical, and noise components, we can simulate realistic time series with Pandas.
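One way to combine the components is to build each as its own column and sum them. The sketch below assumes an additive model; the amplitudes, noise scale, and column names are illustrative assumptions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)  # Seeded for reproducibility
n = 365
idx = pd.date_range('2023-01-01', periods=n)
day = np.arange(n)

df = pd.DataFrame(index=idx)
df['Trend'] = 10 + 30 * day / n                                        # Linear trend
df['Seasonal'] = 5 * np.cos(2 * np.pi * idx.dayofweek.to_numpy() / 7)  # Weekly season
df['Cycle'] = 3 * np.sin(2 * np.pi * day / 180)                        # 180-day cycle
df['Noise'] = rng.normal(scale=1.0, size=n)                            # Gaussian noise

# Additive model: the observed value is the sum of all components
df['Value'] = df[['Trend', 'Seasonal', 'Cycle', 'Noise']].sum(axis=1)
```

Keeping the components as separate columns makes it easy to later check whether a decomposition method recovers them from `Value`.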

Adding Noise

To make synthetic data more realistic, we can add random noise:

N = 365
idx = pd.date_range('20230101', periods=N)
# Clean sine wave stored in a named column
df = pd.DataFrame({'Value': np.sin(idx.dayofyear * (2*np.pi/180))}, index=idx)

noise = np.random.normal(scale=0.5, size=N)
df['Noisy'] = df['Value'] + noise

This takes a clean sine wave and adds Gaussian white noise. Other noise models like ARIMA or GARCH can also be simulated.

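The scale of the noise relative to the amplitude of the signal controls how random the series looks: a larger scale buries the sine wave, while a smaller one leaves it clearly visible. This technique is useful for simulating real-world uncertainties.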
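One way to quantify this balance is a signal-to-noise ratio, taken here as the variance of the clean signal divided by the variance of the noise. The helper `snr` below is a hypothetical function written for this sketch, not a pandas or NumPy API.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=1)  # Seeded for reproducibility
n = 365
idx = pd.date_range('2023-01-01', periods=n)
signal = pd.Series(np.sin(np.arange(n) * (2 * np.pi / 180)), index=idx)

def snr(signal, noise_scale):
    """Ratio of signal variance to noise variance for one noise draw."""
    noise = rng.normal(scale=noise_scale, size=len(signal))
    return signal.var() / noise.var()

snr_low_noise = snr(signal, 0.1)   # Signal dominates
snr_high_noise = snr(signal, 1.0)  # Noise dominates
```

A ratio well above 1 means the sine wave stands out clearly, while a ratio below 1 means the noise dominates.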

Modeling Holidays and Special Dates

Special dates like holidays often have unique effects in time series like energy consumption or e-commerce. We can add these manually:

N = 365
df = pd.DataFrame(index=pd.date_range('20230101', periods=N))

df['Holiday'] = 0 # Default: not a holiday
df.loc['2023-01-01', 'Holiday'] = 1 # New Year's Day
df.loc['2023-05-29', 'Holiday'] = 1 # Memorial Day
df.loc['2023-07-04', 'Holiday'] = 1 # Independence Day
# ...

This flags certain dates as holidays in the DataFrame. We can then add holiday effects:

df['Value'] = 10 + np.sin(df.index.dayofyear * (2*np.pi/180))

# Add a holiday effect
df.loc[df['Holiday'] == 1, 'Value'] *= 1.5

Now those dates will have higher values. Custom logic can be added for any special dates.
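Instead of listing dates by hand, the flags can also be derived from a holiday calendar. The sketch below uses `USFederalHolidayCalendar` from `pandas.tseries.holiday`, assuming US federal holidays are the relevant set. The calendar returns observed dates, so New Year's Day 2023, which fell on a Sunday, appears as January 2.

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar

idx = pd.date_range('2023-01-01', periods=365)
df = pd.DataFrame(index=idx)

# Observed US federal holidays within the index range
holidays = USFederalHolidayCalendar().holidays(start=idx.min(), end=idx.max())

# 1 on holidays, 0 otherwise
df['Holiday'] = df.index.isin(holidays).astype(int)
```

The resulting column can be used in the same way as the manually flagged one, for example as a mask when applying a holiday effect.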

Financial Data

Pandas is popular for modeling financial time series like stock prices. Some ways to generate sample financial data:

Random Walk with Drift

Simulate an S&P 500-like price series as a random walk with an upward drift:

N = 252 # Trading days
df = pd.DataFrame(index=pd.bdate_range('20230101', periods=N))
# Start at 100; each daily step has mean 0.02 (the drift) and standard deviation 0.1
df['Price'] = 100 + np.cumsum(np.random.normal(0.02, 0.1, N))

This builds in a small uptrend over time.

OHLC Bars

Candlestick charts are based on OHLC (Open, High, Low, Close) data for each time period:

N = 252
df = pd.DataFrame(index=pd.bdate_range('20230101', periods=N))

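r = np.random.normal(0.01, 0.1, N) # Random daily price changes
df['Close'] = 100 + np.cumsum(r)
df['Open'] = df['Close'].shift(1).fillna(100) # Open at the previous close
# High and low must bracket both the open and the close
df['High'] = df[['Open', 'Close']].max(axis=1) + np.abs(np.random.normal(0.1, 0.05, N))
df['Low'] = df[['Open', 'Close']].min(axis=1) - np.abs(np.random.normal(0.1, 0.05, N))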

Now we can plot candlestick charts, calculate technical indicators, etc.
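As one example of a technical indicator, the sketch below computes a 20-day simple moving average and daily percentage returns from a simulated close series. The window length and the seed are illustrative choices.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=7)  # Seeded for reproducibility
n = 252
close = pd.Series(100 + np.cumsum(rng.normal(0.01, 0.1, n)),
                  index=pd.bdate_range('2023-01-01', periods=n),
                  name='Close')

# 20-day simple moving average; the first 19 values are NaN
sma_20 = close.rolling(window=20).mean()

# Day-over-day percentage returns; the first value is NaN
returns = close.pct_change()
```

The leading NaN values are expected: a 20-day window has no complete history until the 20th observation.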

Pandas offers many options for modeling both simple and sophisticated financial time series with trends, volatility clustering, and more.

Conclusion

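This guide covered key techniques for generating time series data in Pandas, including creating DatetimeIndexes, generating date ranges at various frequencies, resampling and frequency conversion, handling time zones, and creating synthetic and financial data.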

With these building blocks, you should feel empowered to wrangle time series data for analysis across domains like finance, economics, science, and more. Pandas provides a versatile set of tools to handle many common time series tasks. Refer to the Python documentation and Pandas user guide to continue mastering time series data manipulation.