Pivoting data is a common task in data analysis that involves reshaping a table into a desired layout. In Pandas, two very useful functions for pivoting data are pivot()
and pivot_table()
.
The pivot()
function allows you to pivot a table on specific column values. It aggregates the data and pivots the unique values of specified columns into separate columns. Meanwhile, pivot_table()
offers advanced functionalities for pivoting, aggregating, and cross-tabulating data.
In this comprehensive guide, we will explore how to use Pandas’ pivot()
and pivot_table()
for pivoting data in Python.
Table of Contents
Open Table of Contents
Prerequisites
To follow along with the examples, you should have:
- Python 3 installed on your system
- Pandas library imported:
import pandas as pd
- A Pandas DataFrame loaded with sample data:
data = pd.DataFrame({
'Product': ['Phone', 'Tablet', 'TV', 'Tablet', 'TV'],
'Price': [699, 599, 1299, 599, 1299 ],
'Date': ['2021-01-01', '2021-01-05', '2021-02-03', '2021-02-24', '2021-03-01'],
'City': ['Sydney', 'Sydney', 'Melbourne', 'Perth', 'Brisbane']
})
This sample data contains sales information with columns for product, price, date, and city.
Now let’s see how we can reshape this sample data into different layouts using Pandas’ pivoting functions.
Pandas pivot()
Function
The pivot()
function allows pivoting a DataFrame on specific column values. It aggregates the data and pivots unique values of the specified columns into separate columns.
The basic syntax for pivot()
is:
df.pivot(index=None, columns=None, values=None)
Where:
index
(optional) - Column to use as index, default is none.columns
(optional) - Column to use as columns, default is none.values
(optional) - Column to aggregate.
By default, if no arguments are provided, pivot()
will use all the columns as both index and columns.
Let’s see some examples of pivoting the sample data:
Pivot Single Column as Index
Pivot Product
as the index:
data.pivot(index='Product')
This aggregates the data and pivots the unique product values into separate columns:
Price Date City
Product
Phone 699 2021-01-01 Sydney
Tablet 599 2021-01-05 Sydney
Tablet 599 2021-02-24 Perth
TV 1299 2021-02-03 Melbourne
TV 1299 2021-03-01 Brisbane
Pivot on Index and Columns
We can also specify both index and columns for more control over the pivot table layout:
data.pivot(index='Product', columns='City')
This pivots the product into rows and cities into columns:
City Brisbane Sydney Melbourne Perth
Product
Phone NaN 699 NaN NaN
Tablet NaN 599 NaN 599
TV 1299 NaN 1299 NaN
Aggregate Values During Pivot
To aggregate values during the pivot, use the values
argument.
For example, to average the prices:
data.pivot(index='Product', values='Price')
This aggregates the prices into averages for each product:
Price
Product
Phone 699
Tablet 599
TV 1299
We can also perform other aggregations like .sum()
, .min()
, .max()
, .count()
, etc.
Pivoting Multiple Columns
Pivoting more than one column into columns can be done by passing a list of column names to the columns
argument.
data.pivot(index='Product', columns=['City', 'Date'])
This will pivot both City and Date into multiple columns:
City Brisbane Melbourne Perth Sydney
Date 2021-03-01 2021-02-03 2021-02-24 2021-01-01
Product
Phone NaN NaN NaN 699
Tablet NaN NaN 599 599
TV 1299 1299 NaN NaN
As we can see, Pandas pivot()
function provides an easy way to reshape the data by pivoting columns into a desired layout.
Pandas pivot_table()
Function
While pivot()
provides basic pivoting functionality, Pandas pivot_table()
offers advanced capabilities like aggregating, calculating margins, pivoting multiple columns, and more.
The syntax for pivot_table()
is:
pd.pivot_table(
data,
values=None,
index=None,
columns=None,
aggfunc='mean',
fill_value=None,
margins=False,
dropna=True,
margins_name='All'
)
Where:
data
- DataFrame to pivotvalues
- Columns to aggregate.index
- Column(s) to be index.columns
- Column(s) to pivot into columns.aggfunc
- Aggregation function to use. ‘mean’ by default.fill_value
- Value to replace missing values with.margins
- Add row/column margins (subtotals). Default False.dropna
- Drop missing values. Default True.margins_name
- Name prefix for margin values.
Let’s go through some examples of using pivot_table()
on our sample sales data.
Simple Pivot Table
To start with, let’s generate a simple pivot table aggregating Prices by Product:
pd.pivot_table(data, values='Price', index='Product')
This gives us the average Price for each Product:
Price
Product
Phone 699
Tablet 599
TV 1299
Multilevel Pivot Table
We can also create a multilevel pivot table by specifying multiple columns for index
and columns
.
For example:
pd.pivot_table(data, values='Price', index='Product', columns='City')
This will pivot Product into the index and City into the columns:
City Brisbane Melbourne Perth Sydney
Product
Phone NaN NaN NaN 699
Tablet NaN NaN 599 599
TV 1299 1299 NaN NaN
Aggregation Functions
The aggfunc
argument allows specifying different aggregation functions like sum
, min
, max
, etc.
For example, calculating minimum Prices by Product and City:
pd.pivot_table(data, values='Price', index='Product', columns='City', aggfunc='min')
Output:
City Brisbane Melbourne Perth Sydney
Product
Phone NaN NaN NaN 699
Tablet NaN NaN 599 599
TV 1299 1299 NaN NaN
Margins and Subtotals
We can add margins to the pivot table, which provides row/column totals and grand total values.
Set margins=True
:
pd.pivot_table(data, values='Price', index='Product', columns='City', margins=True)
This will add row and column totals:
City Brisbane Melbourne Perth Sydney All
Product
Phone NaN NaN NaN 699 699
Tablet NaN NaN 599 599 599
TV 1299 1299 NaN NaN 1299
All 1299 1299 599 699 3996
The name prefix for margins can also be customized using margins_name
.
Filling Missing Values
Missing values can be filled with a custom value using the fill_value
argument.
Let’s replace NaN with 0:
pd.pivot_table(data, values='Price', index='Product', columns='City', fill_value=0)
This fills all the missing values with 0:
City Brisbane Melbourne Perth Sydney
Product
Phone 0 0 0 699
Tablet 0 0 599 599
TV 1299 1299 0 0
As we can see, Pandas pivot_table()
enables us to generate customizable pivot tables with advanced features like margins, fill values, and multiple aggregation functions.
Practical Examples
Now let’s go through some practical examples of pivoting real-world data using Pandas.
Pivoting Survey Data
Suppose we have a survey data set with columns for Age, Gender, Country, and Salary.
We can load it into a DataFrame:
survey_df = pd.DataFrame({
'Age': [23, 58, 62, 43, 38],
'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
'Country': ['USA', 'Japan', 'India', 'USA', 'UK'],
'Salary': [50000, 60000, 45000, 80000, 78000]
})
To analyze average salary by age for each country, we can pivot Age into rows and Country into columns:
survey_df.pivot_table(values='Salary', index='Age', columns='Country', aggfunc='mean')
This results in the following pivot table:
Country India Japan UK USA
Age
23 NaN NaN NaN 50000.0
38 NaN NaN 78000.0 NaN
43 NaN NaN NaN 80000.0
58 NaN 60000.0 NaN NaN
62 45000.0 NaN NaN NaN
We can also display salary subtotals for each country and age by adding margins:
survey_df.pivot_table(values='Salary', index='Age', columns='Country',
aggfunc='mean', margins=True)
This will add row and column totals:
Country India Japan UK USA All
Age
23 NaN NaN NaN 50000.0 50000.0
38 NaN NaN 78000.0 NaN 78000.0
43 NaN NaN NaN 80000.0 80000.0
58 NaN 60000.0 NaN NaN 60000.0
62 45000.0 NaN NaN NaN 45000.0
All 45000.0 60000.0 78000.0 65000.0 248000.0
As we can see, pivot_table()
enables insightful pivoting of survey data for analysis.
Pivoting Financial Time Series Data
Pandas pivot tables are extremely useful for time series data analysis. Let’s see an example with financial stock market data.
We’ll load historical stock data for multiple companies into a DataFrame:
stocks_df = pd.DataFrame({
'Company': ['ABC', 'XYZ', 'ABC', 'XYZ', 'ABC', 'XYZ'],
'Date': ['2019-01-01', '2019-01-01', '2019-01-02', '2019-01-02', '2019-01-03', '2019-01-03'],
'Price': [10, 15, 12, 8, 18, 12]
})
To analyze the average price over time for each company, we can pivot the Date into columns:
stocks_df.pivot_table(values='Price', index='Company', columns='Date', aggfunc='mean')
Output:
Date 2019-01-01 2019-01-02 2019-01-03
Company
ABC 10 12 18
XYZ 15 8 12
Adding margins provides totals for each date across companies:
stocks_df.pivot_table(values='Price', index='Company', columns='Date',
aggfunc='mean', margins=True)
This adds a totals column:
Date 2019-01-01 2019-01-02 2019-01-03 All
Company
ABC 10 12 18 13.33
XYZ 15 8 12 11.67
All 12.5 10.0 15.0 12.5
As we can see, pivot tables enable powerful reshaping and analysis of financial time series data using Pandas.
Reshaping with Pivot
While pivot_table()
is great for aggregating data, the pivot()
function allows pivoting the raw data into different layouts.
Let’s go back to our original sales data DataFrame:
data = pd.DataFrame({
'Product': ['Phone', 'Tablet', 'TV', 'Tablet', 'TV'],
'Price': [699, 599, 1299, 599, 1299 ],
'Date': ['2021-01-01', '2021-01-05', '2021-02-03', '2021-02-24', '2021-03-01'],
'City': ['Sydney', 'Sydney', 'Melbourne', 'Perth', 'Brisbane']
})
We can reshape it into a time series layout by pivoting the Date into columns:
data.pivot(index='Product', columns='Date', values='Price')
Output:
Date 2021-01-01 2021-01-05 2021-02-03 2021-02-24 2021-03-01
Product
Phone 699.0 NaN NaN NaN NaN
Tablet NaN 599.0 NaN 599.0 NaN
TV NaN NaN 1299.0 NaN 1299.0
This unpivots the dates into multiple columns while keeping the price values intact.
We can also pivot multiple columns like City and Date:
data.pivot(index='Product', columns=['City', 'Date'])
Resulting in:
City Brisbane ... Sydney
Date 2021-03-01 2021-02-03 2021-02-24 2021-01-01 2021-01-05
Product ...
Phone NaN NaN NaN 699 NaN
Tablet NaN NaN NaN NaN 599
TV 1299 1299 NaN NaN NaN
This pivots City and Date into the columns creating a customized reshaping of the data.
As we can see, pivot()
gives us a handy and flexible way to reshape raw DataFrames into different layouts.
Summary
In summary, Pandas provides two versatile functions for pivoting data:
pivot()
provides a basic pivot table by pivoting column values into columns.pivot_table()
enables creating advanced pivot tables with aggregation, margins etc.
Key takeaways:
- Use
index
andcolumns
arguments inpivot()
to specify pivot rows and columns. - Aggregate values during pivot using
values
argument. pivot_table()
allows aggregation viaaggfunc
, adding margins, custom fill values etc.- Pivot tables can reshape complex datasets for easier analysis.
- Use
pivot()
to reshape raw data into different layouts.
With Pandas’ pivoting capabilities, we can flexibly transform data into actionable insights. Both pivot()
and pivot_table()
are invaluable tools for data exploration and analysis in Python.