Skip to content

Pandas: Pivoting data with pivot(), pivot_table()

Updated: at 02:28 AM

Pivoting data is a common task in data analysis that involves reshaping a table into a desired layout. In Pandas, two very useful functions for pivoting data are pivot() and pivot_table().

The pivot() function allows you to pivot a table on specific column values. It aggregates the data and pivots the unique values of specified columns into separate columns. Meanwhile, pivot_table() offers advanced functionalities for pivoting, aggregating, and cross-tabulating data.

In this comprehensive guide, we will explore how to use Pandas’ pivot() and pivot_table() for pivoting data in Python.

Table of Contents

Open Table of Contents

Prerequisites

To follow along with the examples, you should have:

import pandas as pd
data = pd.DataFrame({
    'Product': ['Phone', 'Tablet', 'TV', 'Tablet', 'TV'],
    'Price': [699, 599, 1299, 599, 1299 ],
    'Date': ['2021-01-01', '2021-01-05', '2021-02-03', '2021-02-24', '2021-03-01'],
    'City': ['Sydney', 'Sydney', 'Melbourne', 'Perth', 'Brisbane']
})

This sample data contains sales information with columns for product, price, date, and city.

Now let’s see how we can reshape this sample data into different layouts using Pandas’ pivoting functions.

Pandas pivot() Function

The pivot() function allows pivoting a DataFrame on specific column values. It aggregates the data and pivots unique values of the specified columns into separate columns.

The basic syntax for pivot() is:

df.pivot(index=None, columns=None, values=None)

Where:

By default, if no arguments are provided, pivot() will use all the columns as both index and columns.

Let’s see some examples of pivoting the sample data:

Pivot Single Column as Index

Pivot Product as the index:

data.pivot(index='Product')

This aggregates the data and pivots the unique product values into separate columns:

   Price       Date         City
Product
Phone   699   2021-01-01    Sydney
Tablet  599   2021-01-05    Sydney
Tablet  599   2021-02-24    Perth
TV     1299  2021-02-03    Melbourne
TV     1299  2021-03-01    Brisbane

Pivot on Index and Columns

We can also specify both index and columns for more control over the pivot table layout:

data.pivot(index='Product', columns='City')

This pivots the product into rows and cities into columns:

City        Brisbane Sydney Melbourne  Perth
Product
Phone        NaN    699      NaN     NaN
Tablet       NaN    599      NaN     599
TV          1299    NaN     1299     NaN

Aggregate Values During Pivot

To aggregate values during the pivot, use the values argument.

For example, to average the prices:

data.pivot(index='Product', values='Price')

This aggregates the prices into averages for each product:

Price
Product
Phone        699
Tablet       599
TV          1299

We can also perform other aggregations like .sum(), .min(), .max(), .count(), etc.

Pivoting Multiple Columns

Pivoting more than one column into columns can be done by passing a list of column names to the columns argument.

data.pivot(index='Product', columns=['City', 'Date'])

This will pivot both City and Date into multiple columns:

City             Brisbane    Melbourne     Perth Sydney
Date         2021-03-01 2021-02-03 2021-02-24 2021-01-01
Product
Phone                NaN         NaN       NaN        699
Tablet               NaN         NaN       599        599
TV                 1299        1299         NaN        NaN

As we can see, Pandas pivot() function provides an easy way to reshape the data by pivoting columns into a desired layout.

Pandas pivot_table() Function

While pivot() provides basic pivoting functionality, Pandas pivot_table() offers advanced capabilities like aggregating, calculating margins, pivoting multiple columns, and more.

The syntax for pivot_table() is:

pd.pivot_table(
  data,
  values=None,
  index=None,
  columns=None,
  aggfunc='mean',
  fill_value=None,
  margins=False,
  dropna=True,
  margins_name='All'
)

Where:

Let’s go through some examples of using pivot_table() on our sample sales data.

Simple Pivot Table

To start with, let’s generate a simple pivot table aggregating Prices by Product:

pd.pivot_table(data, values='Price', index='Product')

This gives us the average Price for each Product:

Price
Product
Phone        699
Tablet       599
TV          1299

Multilevel Pivot Table

We can also create a multilevel pivot table by specifying multiple columns for index and columns.

For example:

pd.pivot_table(data, values='Price', index='Product', columns='City')

This will pivot Product into the index and City into the columns:

City        Brisbane  Melbourne  Perth  Sydney
Product
Phone           NaN        NaN    NaN      699
Tablet          NaN        NaN    599      599
TV            1299       1299    NaN      NaN

Aggregation Functions

The aggfunc argument allows specifying different aggregation functions like sum, min, max, etc.

For example, calculating minimum Prices by Product and City:

pd.pivot_table(data, values='Price', index='Product', columns='City', aggfunc='min')

Output:

City        Brisbane  Melbourne  Perth  Sydney
Product
Phone           NaN        NaN    NaN      699
Tablet          NaN        NaN    599      599
TV            1299       1299    NaN      NaN

Margins and Subtotals

We can add margins to the pivot table, which provides row/column totals and grand total values.

Set margins=True:

pd.pivot_table(data, values='Price', index='Product', columns='City', margins=True)

This will add row and column totals:

City             Brisbane  Melbourne  Perth  Sydney  All
Product
Phone                NaN        NaN    NaN      699  699
Tablet               NaN        NaN    599      599  599
TV                 1299       1299    NaN      NaN 1299
All                1299       1299    599      699 3996

The name prefix for margins can also be customized using margins_name.

Filling Missing Values

Missing values can be filled with a custom value using the fill_value argument.

Let’s replace NaN with 0:

pd.pivot_table(data, values='Price', index='Product', columns='City', fill_value=0)

This fills all the missing values with 0:

City        Brisbane  Melbourne  Perth  Sydney
Product
Phone             0          0      0      699
Tablet            0          0     599      599
TV              1299       1299      0        0

As we can see, Pandas pivot_table() enables us to generate customizable pivot tables with advanced features like margins, fill values, and multiple aggregation functions.

Practical Examples

Now let’s go through some practical examples of pivoting real-world data using Pandas.

Pivoting Survey Data

Suppose we have a survey data set with columns for Age, Gender, Country, and Salary.

We can load it into a DataFrame:

survey_df = pd.DataFrame({
    'Age': [23, 58, 62, 43, 38],
    'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
    'Country': ['USA', 'Japan', 'India', 'USA', 'UK'],
    'Salary': [50000, 60000, 45000, 80000, 78000]
})

To analyze average salary by age for each country, we can pivot Age into rows and Country into columns:

survey_df.pivot_table(values='Salary', index='Age', columns='Country', aggfunc='mean')

This results in the following pivot table:

Country      India   Japan      UK      USA
Age
23            NaN     NaN      NaN   50000.0
38            NaN     NaN  78000.0       NaN
43            NaN     NaN      NaN   80000.0
58            NaN  60000.0     NaN       NaN
62        45000.0     NaN      NaN       NaN

We can also display salary subtotals for each country and age by adding margins:

survey_df.pivot_table(values='Salary', index='Age', columns='Country',
                      aggfunc='mean', margins=True)

This will add row and column totals:

Country      India   Japan      UK      USA   All
Age
23            NaN     NaN      NaN   50000.0 50000.0
38            NaN     NaN  78000.0       NaN 78000.0
43            NaN     NaN      NaN   80000.0 80000.0
58            NaN  60000.0     NaN       NaN 60000.0
62        45000.0     NaN      NaN       NaN 45000.0
All        45000.0  60000.0  78000.0   65000.0 248000.0

As we can see, pivot_table() enables insightful pivoting of survey data for analysis.

Pivoting Financial Time Series Data

Pandas pivot tables are extremely useful for time series data analysis. Let’s see an example with financial stock market data.

We’ll load historical stock data for multiple companies into a DataFrame:

stocks_df = pd.DataFrame({
    'Company': ['ABC', 'XYZ', 'ABC', 'XYZ', 'ABC', 'XYZ'],
    'Date': ['2019-01-01', '2019-01-01', '2019-01-02', '2019-01-02', '2019-01-03', '2019-01-03'],
    'Price': [10, 15, 12, 8, 18, 12]
})

To analyze the average price over time for each company, we can pivot the Date into columns:

stocks_df.pivot_table(values='Price', index='Company', columns='Date', aggfunc='mean')

Output:

Date         2019-01-01  2019-01-02  2019-01-03
Company
ABC                10         12         18
XYZ                15          8         12

Adding margins provides totals for each date across companies:

stocks_df.pivot_table(values='Price', index='Company', columns='Date',
                      aggfunc='mean', margins=True)

This adds a totals column:

Date         2019-01-01  2019-01-02  2019-01-03   All
Company
ABC                 10          12         18   13.33
XYZ                 15           8         12   11.67
All                12.5        10.0       15.0   12.5

As we can see, pivot tables enable powerful reshaping and analysis of financial time series data using Pandas.

Reshaping with Pivot

While pivot_table() is great for aggregating data, the pivot() function allows pivoting the raw data into different layouts.

Let’s go back to our original sales data DataFrame:

data = pd.DataFrame({
    'Product': ['Phone', 'Tablet', 'TV', 'Tablet', 'TV'],
    'Price': [699, 599, 1299, 599, 1299 ],
    'Date': ['2021-01-01', '2021-01-05', '2021-02-03', '2021-02-24', '2021-03-01'],
    'City': ['Sydney', 'Sydney', 'Melbourne', 'Perth', 'Brisbane']
})

We can reshape it into a time series layout by pivoting the Date into columns:

data.pivot(index='Product', columns='Date', values='Price')

Output:

Date         2021-01-01  2021-01-05  2021-02-03  2021-02-24 2021-03-01
Product
Phone             699.0        NaN        NaN        NaN        NaN
Tablet            NaN        599.0        NaN      599.0        NaN
TV                NaN         NaN     1299.0        NaN     1299.0

This unpivots the dates into multiple columns while keeping the price values intact.

We can also pivot multiple columns like City and Date:

data.pivot(index='Product', columns=['City', 'Date'])

Resulting in:

City             Brisbane                 ...                  Sydney
Date         2021-03-01 2021-02-03 2021-02-24 2021-01-01 2021-01-05
Product                                   ...
Phone                NaN        NaN       NaN        699        NaN
Tablet               NaN        NaN       NaN        NaN       599
TV                 1299     1299       NaN        NaN        NaN

This pivots City and Date into the columns creating a customized reshaping of the data.

As we can see, pivot() gives us a handy and flexible way to reshape raw DataFrames into different layouts.

Summary

In summary, Pandas provides two versatile functions for pivoting data:

Key takeaways:

With Pandas’ pivoting capabilities, we can flexibly transform data into actionable insights. Both pivot() and pivot_table() are invaluable tools for data exploration and analysis in Python.