Pandas is a popular Python library used for data analysis and manipulation. One of Pandas’ key features is the DataFrame, a tabular data structure with labeled rows and columns similar to a spreadsheet or SQL table. When working with DataFrames in Pandas, it is critical to understand how to properly index, slice and access specific rows, columns or elements.
This comprehensive guide explains the key concepts and techniques for indexing Pandas DataFrames, whether you use the default numeric index or set a custom one. We will cover how to view, set, reset and use indexes to extract data with .loc[], .iloc[] and other methods, with code examples throughout.
Proper use of Pandas’ flexible indexing system enables efficient data analysis workflows. Developers, data scientists, and analysts can use these skills to clean, transform and gain insights from complex datasets using Python. Let’s get started!
Numeric Indexing
By default, Pandas DataFrames come with integer indices assigned to each row, starting from 0 and increasing by 1 for each subsequent row. For example:
```python
import pandas as pd

data = {'Name': ['John', 'Mary', 'Steve', 'Sarah'],
        'Age': [28, 32, 35, 27],
        'Department': ['Engineering', 'Business', 'Engineering', 'Marketing']}
df = pd.DataFrame(data)
print(df)
#     Name  Age   Department
# 0   John   28  Engineering
# 1   Mary   32     Business
# 2  Steve   35  Engineering
# 3  Sarah   27    Marketing
```
As the output shows, the index values 0, 1, 2 and 3 were assigned automatically.
Viewing Indices
To view the current index of a DataFrame, we can use the .index attribute. A default index is stored as a RangeIndex, which records only its start, stop and step:

```python
print(df.index)
# RangeIndex(start=0, stop=4, step=1)
```
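For a closer look at the default index, the sketch below rebuilds the same DataFrame and reads the RangeIndex attributes directly. Because a RangeIndex stores only start, stop and step instead of every label, the default index stays cheap; tolist() materializes the labels when you need them as a list.

```python
import pandas as pd

data = {'Name': ['John', 'Mary', 'Steve', 'Sarah'],
        'Age': [28, 32, 35, 27],
        'Department': ['Engineering', 'Business', 'Engineering', 'Marketing']}
df = pd.DataFrame(data)

# The default index is a RangeIndex that stores only start, stop and step
print(type(df.index).__name__)                        # RangeIndex
print(df.index.start, df.index.stop, df.index.step)   # 0 4 1

# tolist() materializes the labels as a plain Python list
labels = df.index.tolist()
print(labels)                                          # [0, 1, 2, 3]
```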
Accessing by Index Position
We can then use these numeric indices to access specific rows by their integer position, using .iloc[]:

```python
# Access the 2nd row by position
print(df.iloc[1])
# Name              Mary
# Age                 32
# Department    Business
# Name: 1, dtype: object
```
.iloc[] also accepts slices. As with Python lists, the start position is included and the end position is excluded:

```python
# Slice rows 0 and 1 (position 2 is excluded)
print(df.iloc[:2])
```
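Positions and labels only coincide for a freshly numbered index. The minimal sketch below (the name engineers is just for illustration) selects two rows by position and shows that the result keeps its original labels, so .iloc[] and .loc[] can point to different rows for the same number.

```python
import pandas as pd

data = {'Name': ['John', 'Mary', 'Steve', 'Sarah'],
        'Age': [28, 32, 35, 27],
        'Department': ['Engineering', 'Business', 'Engineering', 'Marketing']}
df = pd.DataFrame(data)

# Select the rows at positions 0 and 2; they keep their original labels
engineers = df.iloc[[0, 2]]
print(engineers.index.tolist())      # [0, 2]

# .iloc counts positions inside engineers: position 1 is its second row
print(engineers.iloc[1]['Name'])     # Steve

# .loc looks up labels: label 2 is the same row
print(engineers.loc[2, 'Name'])      # Steve

# Label 1 (Mary) was not selected, so .loc raises KeyError
label_missing = False
try:
    engineers.loc[1]
except KeyError:
    label_missing = True
print(label_missing)                 # True
```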
Resetting the Index
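If we want to renumber the rows, we can reset the index with .reset_index(), which replaces the current index with a fresh 0-based integer index. Passing drop=True discards the old index instead of inserting it as a column:

```python
df = df.reset_index(drop=True)
print(df)
#     Name  Age   Department
# 0   John   28  Engineering
# 1   Mary   32     Business
# 2  Steve   35  Engineering
# 3  Sarah   27    Marketing
```

Because the index here was already 0 through 3, the result looks unchanged. Resetting is most useful after rows have been filtered or reordered, which leaves gaps or out-of-order labels in the index.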
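To see when resetting actually changes something, the sketch below sorts the rows by Age, which reorders the labels, and then renumbers them. The variable names sorted_df and renumbered are illustrative.

```python
import pandas as pd

data = {'Name': ['John', 'Mary', 'Steve', 'Sarah'],
        'Age': [28, 32, 35, 27],
        'Department': ['Engineering', 'Business', 'Engineering', 'Marketing']}
df = pd.DataFrame(data)

# Sorting reorders the rows but each row keeps its original label
sorted_df = df.sort_values('Age')
print(sorted_df.index.tolist())       # [3, 0, 1, 2]

# drop=True renumbers 0..n-1 and throws the old labels away
renumbered = sorted_df.reset_index(drop=True)
print(renumbered.index.tolist())      # [0, 1, 2, 3]
print(renumbered['Name'].tolist())    # ['Sarah', 'John', 'Mary', 'Steve']
print('index' in renumbered.columns)  # False
```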
Index Values as Column
To keep the old index as a column when resetting, use drop=False (which is also the default):

```python
df = df.reset_index(drop=False)
print(df)
#    index   Name  Age   Department
# 0      0   John   28  Engineering
# 1      1   Mary   32     Business
# 2      2  Steve   35  Engineering
# 3      3  Sarah   27    Marketing
```

Here we see the original index values are now stored in a column named index.
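The new column is named after the index, and an unnamed index falls back to index. A small sketch, assuming you want a more descriptive column name (row_id here is a made-up example), is to name the index with rename_axis() before resetting:

```python
import pandas as pd

data = {'Name': ['John', 'Mary', 'Steve', 'Sarah'],
        'Age': [28, 32, 35, 27],
        'Department': ['Engineering', 'Business', 'Engineering', 'Marketing']}
df = pd.DataFrame(data)

# Unnamed index: the inserted column is called 'index'
plain = df.reset_index()
print(plain.columns.tolist())   # ['index', 'Name', 'Age', 'Department']

# Name the index first (hypothetical name 'row_id'), then reset
named = df.rename_axis('row_id').reset_index()
print(named.columns.tolist())   # ['row_id', 'Name', 'Age', 'Department']
```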
This demonstrates the core functionality for indexing DataFrames with default numeric indices in Pandas. Next we will learn about setting and using custom indices.
Custom Indices
For more meaningful indexing and lookups, we can set a custom index using one or more of the DataFrame’s columns.
Setting an Index Column
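The .set_index() method sets a custom index. Let’s index by ‘Name’. Because the previous step added an index column to df, we start again from the original data:

```python
# Rebuild from the original data, then use the 'Name' column as the index
df = pd.DataFrame(data).set_index('Name')
print(df)
#        Age   Department
# Name
# John    28  Engineering
# Mary    32     Business
# Steve   35  Engineering
# Sarah   27    Marketing
```

Now the Name labels are used as the index rather than integer positions. The Name column is removed from the data because set_index() uses drop=True by default, and .iloc[] still works with positions.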
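Label lookups are most predictable when every label is unique. set_index() accepts verify_integrity=True, which raises a ValueError when the new index would contain duplicates. In the sketch below the extra John row is hypothetical data, added only to trigger that check.

```python
import pandas as pd

data = {'Name': ['John', 'Mary', 'Steve', 'Sarah'],
        'Age': [28, 32, 35, 27],
        'Department': ['Engineering', 'Business', 'Engineering', 'Marketing']}

# All names are unique, so the integrity check passes
df = pd.DataFrame(data).set_index('Name', verify_integrity=True)
print(df.index.is_unique)   # True

# Hypothetical extra row that reuses the name 'John'
extra = pd.DataFrame({'Name': ['John'], 'Age': [41], 'Department': ['Sales']})
combined = pd.concat([pd.DataFrame(data), extra], ignore_index=True)

duplicate_detected = False
try:
    combined.set_index('Name', verify_integrity=True)
except ValueError:
    duplicate_detected = True
print(duplicate_detected)   # True
```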
Accessing by Index Label
We can then use .loc[] to access rows by the index value:

```python
# Access row by Name
print(df.loc['Mary'])
# Age                 32
# Department    Business
# Name: Mary, dtype: object
```

Much more useful than remembering integer positions!
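.loc[] also accepts a column label and lists of row labels. The sketch below shows a single-value lookup, the .at accessor (intended for getting or setting exactly one value by label), and a list selection that returns rows in the order requested.

```python
import pandas as pd

data = {'Name': ['John', 'Mary', 'Steve', 'Sarah'],
        'Age': [28, 32, 35, 27],
        'Department': ['Engineering', 'Business', 'Engineering', 'Marketing']}
df = pd.DataFrame(data).set_index('Name')

# Row label plus column label returns a single value
print(df.loc['Mary', 'Age'])        # 32

# .at handles exactly one row label and one column label
print(df.at['Mary', 'Age'])         # 32

# A list of labels returns those rows, in the order given
subset = df.loc[['Sarah', 'John']]
print(subset.index.tolist())        # ['Sarah', 'John']
```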
Multi-level Indices
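We can also set multiple columns as hierarchical indices using a list. Since Name is already the index of df at this point, we rebuild the frame from the original data first:

```python
# Two-level index: Department on the outside, Name on the inside
df = pd.DataFrame(data).set_index(['Department', 'Name'])
print(df)
#                    Age
# Department  Name
# Engineering John    28
# Business    Mary    32
# Engineering Steve   35
# Marketing   Sarah   27
```

Now the index contains both the Department and Name values, so each row is identified by a (Department, Name) pair. Note that set_index() keeps the original row order; it does not group or sort the rows.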
Accessing Multi-level Elements
Use both levels to access specific elements from the multi-index:
```python
# Access John's age with the full (Department, Name) key, then the column label
print(df.loc[('Engineering', 'John'), 'Age'])
# 28
```
This allows precise selection of data points even with complex indices.
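A multi-index also supports partial lookups. In the sketch below, sort_index() first groups rows by Department; passing just the outer label then returns all rows under it, while .xs() selects by a label on the inner Name level. The variable names are illustrative.

```python
import pandas as pd

data = {'Name': ['John', 'Mary', 'Steve', 'Sarah'],
        'Age': [28, 32, 35, 27],
        'Department': ['Engineering', 'Business', 'Engineering', 'Marketing']}
df = pd.DataFrame(data).set_index(['Department', 'Name'])

# Sort so rows are grouped by Department, then by Name
df_sorted = df.sort_index()
print(df_sorted.index.tolist())
# [('Business', 'Mary'), ('Engineering', 'John'), ('Engineering', 'Steve'), ('Marketing', 'Sarah')]

# Only the outer label: every Engineering row, indexed by the remaining Name level
engineering = df_sorted.loc['Engineering']
print(engineering['Age'].tolist())     # [28, 35]

# .xs() selects by a label on an inner level
john = df_sorted.xs('John', level='Name')
print(john.index.tolist())             # ['Engineering']
```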
Reset Index to Column
To move the index levels back to regular columns, use reset_index():

```python
df = df.reset_index()
print(df)
#     Department   Name  Age
# 0  Engineering   John   28
# 1     Business   Mary   32
# 2  Engineering  Steve   35
# 3    Marketing  Sarah   27
```
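reset_index() can also move a single level. The sketch below passes level='Name', so Name becomes a column while Department remains the index; partial is an illustrative name.

```python
import pandas as pd

data = {'Name': ['John', 'Mary', 'Steve', 'Sarah'],
        'Age': [28, 32, 35, 27],
        'Department': ['Engineering', 'Business', 'Engineering', 'Marketing']}
df = pd.DataFrame(data).set_index(['Department', 'Name'])

# Move only the Name level back into a column
partial = df.reset_index(level='Name')
print(partial.index.name)          # Department
print(partial.columns.tolist())    # ['Name', 'Age']
```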
Set Index from Column
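We can also set the index in the same statement that creates the DataFrame. The constructor's index parameter expects the index values themselves rather than a column name, so passing index='Name' raises an error; chaining .set_index() onto the constructor does the job:

```python
df = pd.DataFrame(data).set_index('Name')
print(df)
#        Age   Department
# Name
# John    28  Engineering
# Mary    32     Business
# Steve   35  Engineering
# Sarah   27    Marketing
```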
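If you prefer to use the constructor's index parameter directly, pass the index values rather than a column name. A minimal sketch, assuming the Name values are split out of the data dict by hand:

```python
import pandas as pd

data = {'Name': ['John', 'Mary', 'Steve', 'Sarah'],
        'Age': [28, 32, 35, 27],
        'Department': ['Engineering', 'Business', 'Engineering', 'Marketing']}

# Pass the index values themselves (wrapped in a named Index) via index=
df = pd.DataFrame({'Age': data['Age'], 'Department': data['Department']},
                  index=pd.Index(data['Name'], name='Name'))
print(df.loc['Steve', 'Age'])   # 35
```

The result matches pd.DataFrame(data).set_index('Name'); the chained form is usually shorter when the labels already live in a column.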
Indexing Best Practices
Here are some key indexing best practices when working with Pandas:
- Set meaningful indexes like names, IDs or dates for real-world data.
- Use .loc[] for label-based indexing on custom indexes.
- Use .iloc[] for integer position-based indexing.
- Specify both index levels when accessing multi-index data.
- Reset indexes to columns when needed for analysis or exporting data.
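The .loc[]/.iloc[] distinction matters most when the labels themselves are integers. In the sketch below the labels 10, 20 and 30 are hypothetical employee IDs; .loc[0] then looks for a label 0 and fails, while .iloc[0] returns the first row.

```python
import pandas as pd

# Hypothetical employee IDs used as integer row labels
df = pd.DataFrame({'Name': ['John', 'Mary', 'Steve']}, index=[10, 20, 30])

# Label 10 and position 0 both refer to John
print(df.loc[10, 'Name'])    # John
print(df.iloc[0]['Name'])    # John

# There is no row labeled 0, so .loc[0] raises KeyError
label_zero_missing = False
try:
    df.loc[0]
except KeyError:
    label_zero_missing = True
print(label_zero_missing)    # True
```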
Properly indexing DataFrames will unlock the true power and convenience of Pandas!
Indexing and Slicing Row Selection
Indexing can also be used alongside slicing to select specific rows and contiguous blocks of row data.
Slice Range of Rows
Pass a slice to .iloc[] to get a range of rows:

```python
# Rows at positions 1, 2 and 3
print(df.iloc[1:4])
```

The slice includes the start position and excludes the end position.
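.iloc[] also takes negative positions, counted from the end, and an optional second slot for column positions. A short sketch on the Name-indexed frame (last_two and ages are illustrative names):

```python
import pandas as pd

data = {'Name': ['John', 'Mary', 'Steve', 'Sarah'],
        'Age': [28, 32, 35, 27],
        'Department': ['Engineering', 'Business', 'Engineering', 'Marketing']}
df = pd.DataFrame(data).set_index('Name')

# Negative positions count from the end: the last two rows
last_two = df.iloc[-2:]
print(last_two.index.tolist())   # ['Steve', 'Sarah']

# Second slot selects columns by position: rows 1-2, column 0 ('Age')
ages = df.iloc[1:3, 0]
print(ages.tolist())             # [32, 35]
```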
Select Multiple Rows
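Pass a list of integer positions to select multiple specific rows. Every position must exist: with only four rows, a position such as 5 raises an IndexError.

```python
# Get the rows at positions 0, 2 and 3
rows = [0, 2, 3]
print(df.iloc[rows])
```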
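The label-based counterpart of a list of positions is a list of labels passed to .loc[]. The sketch below converts positions to labels through df.index and checks that both selections match; the variable names are illustrative.

```python
import pandas as pd

data = {'Name': ['John', 'Mary', 'Steve', 'Sarah'],
        'Age': [28, 32, 35, 27],
        'Department': ['Engineering', 'Business', 'Engineering', 'Marketing']}
df = pd.DataFrame(data).set_index('Name')

positions = [0, 2, 3]
by_position = df.iloc[positions]

# Convert positions to labels via the index, then select with .loc
labels = df.index[positions].tolist()
print(labels)                          # ['John', 'Steve', 'Sarah']
by_label = df.loc[labels]
print(by_label.equals(by_position))    # True
```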
Slicing by Label
Since we have a custom index, we can also slice rows by label with .loc[]:

```python
# From the 'John' row through the 'Mary' row
print(df.loc['John':'Mary'])
```

Unlike an integer slice, a label slice includes both the start label and the end label.
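The difference in end-point handling, and two ways to combine positions with labels, are sketched below: either turn positions into labels with df.index, or turn a column label into a position with df.columns.get_loc().

```python
import pandas as pd

data = {'Name': ['John', 'Mary', 'Steve', 'Sarah'],
        'Age': [28, 32, 35, 27],
        'Department': ['Engineering', 'Business', 'Engineering', 'Marketing']}
df = pd.DataFrame(data).set_index('Name')

# Label slice: the end label 'Mary' is included
inclusive = df.loc['John':'Mary'].index.tolist()
print(inclusive)                    # ['John', 'Mary']
# Position slice: the end position 1 is excluded
exclusive = df.iloc[0:1].index.tolist()
print(exclusive)                    # ['John']

# Positions to labels, then use .loc
first_two_ages = df.loc[df.index[0:2], 'Age']
print(first_two_ages.tolist())      # [28, 32]
# Column label to position, then use .iloc
same_ages = df.iloc[0:2, df.columns.get_loc('Age')]
print(same_ages.tolist())           # [28, 32]
```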
Column Selection
Indexing can also select columns by name:
```python
# Select single column
print(df.loc[:, 'Age'])
# Select multiple columns
print(df.loc[:, ['Age', 'Department']])
```
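Whether you get a Series or a DataFrame back depends on how the column is specified. The sketch below contrasts a single label with a one-element list, and compares the result with the square-bracket shorthand df['Age'].

```python
import pandas as pd

data = {'Name': ['John', 'Mary', 'Steve', 'Sarah'],
        'Age': [28, 32, 35, 27],
        'Department': ['Engineering', 'Business', 'Engineering', 'Marketing']}
df = pd.DataFrame(data).set_index('Name')

ages = df.loc[:, 'Age']          # single label gives a Series
ages_df = df.loc[:, ['Age']]     # one-element list gives a DataFrame
print(type(ages).__name__)       # Series
print(type(ages_df).__name__)    # DataFrame

# Square brackets are the common shorthand for selecting a column
print(df['Age'].equals(ages))    # True
```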
The indexing and slicing functionality presented provides the core techniques for precise data selection in analysis workflows.
Conclusion
This guide covered the fundamentals of indexing, slicing and accessing data within Pandas DataFrames using both numeric and custom indexes. Proper indexing unlocks the true power of the Pandas library for data manipulation. Key concepts included:
- Viewing, setting and resetting numeric row indices
- Creating custom indexes from one or more columns
- Precisely accessing elements using .loc[] and .iloc[]
- Slicing ranges of rows using label and integer positions
- Selecting specific columns by name
- Following indexing best practices
We also went through many examples demonstrating real-world usage of these techniques. With these skills you will be prepared to take full advantage of Pandas’ flexible data selection and analysis capabilities for data science and Python programming.
The Pandas indexing functionality is quite extensive, and this guide focused on the core concepts you need to know. For more advanced usage, see the official Pandas documentation on Indexing and Selecting Data, which provides greater detail and additional examples you can build on.
Happy indexing and slicing with Pandas!