Pandas: Indexing with Numeric Indices or Custom Indexes

Pandas is a popular Python library used for data analysis and manipulation. One of Pandas’ key features is the DataFrame, a tabular data structure with labeled rows and columns similar to a spreadsheet or SQL table. When working with DataFrames in Pandas, it is critical to understand how to properly index, slice and access specific rows, columns or elements.

This comprehensive guide will explain the key concepts and techniques for indexing Pandas DataFrames using either the default numeric indices or setting custom indexes. We will cover how to view, set, reset and utilize indices to effectively extract data with .loc[], .iloc[] and other methods. Code examples are provided to illustrate the functionality.

Proper use of Pandas’ flexible indexing system enables efficient data analysis workflows. Developers, data scientists, and analysts can use these skills to clean, transform and gain insights from complex datasets using Python. Let’s get started!

Numeric Indexing
Custom Indices
Indexing and Slicing Row Selection
Conclusion

Numeric Indexing

By default, Pandas DataFrames come with integer indices assigned to each row, starting from 0 and increasing by 1 for each subsequent row. For example:

import pandas as pd

data = {'Name': ['John', 'Mary', 'Steve', 'Sarah'],
        'Age': [28, 32, 35, 27],
        'Department': ['Engineering', 'Business', 'Engineering', 'Marketing']}

df = pd.DataFrame(data)

print(df)

    Name  Age   Department
0   John   28  Engineering
1   Mary   32     Business
2  Steve   35  Engineering
3  Sarah   27    Marketing

We can see above the index values [0, 1, 2, 3] were automatically assigned.

Viewing Indices

To view the current index of a DataFrame, we can use the .index attribute:

print(df.index)

# RangeIndex(start=0, stop=4, step=1)
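As a small sketch of what the .index attribute exposes, the snippet below rebuilds a shortened version of the example DataFrame and inspects its default index. The variable name index is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Mary', 'Steve', 'Sarah'],
                   'Age': [28, 32, 35, 27]})

# The default index is a RangeIndex, a compact integer range
index = df.index
print(type(index).__name__)   # RangeIndex
print(list(index))            # [0, 1, 2, 3]
print(len(index))             # 4
```

Converting the index to a list is a convenient way to see the actual labels when the printed representation only shows start, stop and step.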

Accessing by Index Position

With the default index, each row's label matches its position, so we can access specific rows by integer position using .iloc[]:

# Access 2nd row by position
print(df.iloc[1])

# Name              Mary
# Age                 32
# Department    Business
# Name: 1, dtype: object

.iloc[] takes integer arguments and allows slicing as well:

# Slice rows 0 & 1
print(df.iloc[:2])
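To make it clearer that .iloc[] works on positions rather than labels, here is a minimal sketch using a hypothetical frame whose integer labels (10, 20, 30, 40) deliberately differ from the row positions. The labels and variable names are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical frame: integer labels that do not match row positions
df = pd.DataFrame({'Name': ['John', 'Mary', 'Steve', 'Sarah'],
                   'Age': [28, 32, 35, 27]},
                  index=[10, 20, 30, 40])

second_row = df.iloc[1]    # position 1, whatever its label is
first_two = df.iloc[:2]    # positions 0 and 1
last_row = df.iloc[-1]     # negative positions count from the end

print(second_row['Name'])      # Mary
print(list(first_two.index))   # [10, 20]
print(last_row['Name'])        # Sarah
```

The labels 10 and 20 come along with the selected rows, but they play no part in the lookup itself.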

Resetting the Index

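To discard the current index and renumber the rows from 0, we can use .reset_index() with drop=True:

df = df.reset_index(drop=True)

print(df)
#     Name  Age   Department
# 0   John   28  Engineering
# 1   Mary   32     Business
# 2  Steve   35  Engineering
# 3  Sarah   27    Marketing

The index runs from 0 through 3 again. Our DataFrame already had the default index, so nothing visibly changes here; the call matters after filtering, sorting or setting a custom index, when the labels are no longer 0 to n-1. With drop=True the previous index values are discarded.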

Index Values as Column

To retain the original index as a column when resetting, we set drop=False (the default):

df_with_index = df.reset_index(drop=False)

print(df_with_index)

#    index   Name  Age   Department
# 0      0   John   28  Engineering
# 1      1   Mary   32     Business
# 2      2  Steve   35  Engineering
# 3      3  Sarah   27    Marketing

Here we see the original index values are now a column labeled index. The result is stored in a separate variable so that df keeps its original columns for the examples that follow.
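Resetting is easiest to appreciate when the index has gaps. The sketch below filters rows first and then compares the two modes of reset_index(); the names over_30, renumbered and kept are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Mary', 'Steve', 'Sarah'],
                   'Age': [28, 32, 35, 27]})

# Boolean filtering keeps the original labels, leaving gaps
over_30 = df[df['Age'] > 30]
print(list(over_30.index))        # [1, 2]

# drop=True discards the old labels and renumbers from 0
renumbered = over_30.reset_index(drop=True)
print(list(renumbered.index))     # [0, 1]
print(list(renumbered.columns))   # ['Name', 'Age']

# drop=False (the default) keeps the old labels in an 'index' column
kept = over_30.reset_index()
print(list(kept.columns))         # ['index', 'Name', 'Age']
print(list(kept['index']))        # [1, 2]
```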

This demonstrates the core functionality for indexing DataFrames with default numeric indices in Pandas. Next we will learn about setting and using custom indices.

Custom Indices

For more meaningful indexing and lookups, we can set a custom index using one or more of the DataFrame’s columns.

Setting an Index Column

The .set_index() method can set a custom index. Let’s index by ‘Name’:

df = df.set_index('Name')

print(df)
       Age   Department
Name
John    28  Engineering
Mary    32     Business
Steve   35  Engineering
Sarah   27    Marketing

Now Name labels are used as the index rather than integer positions.

Accessing by Index Label

We can then use .loc[] to access rows by the index value:

# Access row by Name
print(df.loc['Mary'])

# Age                 32
# Department    Business
# Name: Mary, dtype: object

Much more useful than remembering integer positions!
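Beyond fetching a whole row, .loc[] also accepts a column label or a list of row labels. The following is a minimal sketch; the variable names are illustrative, and the name 'Bob' is a made-up label used only to show a membership test.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Mary', 'Steve', 'Sarah'],
                   'Age': [28, 32, 35, 27],
                   'Department': ['Engineering', 'Business',
                                  'Engineering', 'Marketing']}
                  ).set_index('Name')

mary_age = df.loc['Mary', 'Age']       # row label, column label -> single value
two_rows = df.loc[['John', 'Sarah']]   # list of labels -> DataFrame
has_bob = 'Bob' in df.index            # check first to avoid a KeyError

print(mary_age)                # 32
print(list(two_rows['Age']))   # [28, 27]
print(has_bob)                 # False
```

Looking up a label that is not in the index raises a KeyError, so a membership test like the one above can be useful when labels come from user input.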

Multi-level Indices

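We can also set multiple columns as a hierarchical index (a MultiIndex) by passing a list. Because Name is currently the index rather than a column, we first move it back with reset_index():

df = df.reset_index().set_index(['Department', 'Name'])

print(df)
                   Age
Department  Name
Engineering John    28
Business    Mary    32
Engineering Steve   35
Marketing   Sarah   27

Now the index has two levels, Department and Name, and each row is identified by the pair of values. The rows keep their original order; call .sort_index() to group them by department.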

Accessing Multi-level Elements

Use both levels to access specific elements from the multi-index:

# Access John's age
print(df.loc[('Engineering', 'John'),'Age'])

# 28

This allows precise selection of data points even with complex indices.
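A multi-level index also supports selecting on a single level. The sketch below assumes the same example data and uses sort_index() before the lookups; the names indexed, engineering and steve are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Mary', 'Steve', 'Sarah'],
                   'Age': [28, 32, 35, 27],
                   'Department': ['Engineering', 'Business',
                                  'Engineering', 'Marketing']})

# Sorting the MultiIndex groups equal outer labels together
indexed = df.set_index(['Department', 'Name']).sort_index()

# Passing only the outer level returns every row in that department
engineering = indexed.loc['Engineering']
print(list(engineering.index))   # ['John', 'Steve']

# .xs() selects on an inner level by its name
steve = indexed.xs('Steve', level='Name')
print(steve['Age'].iloc[0])      # 35
```

In both cases the level that was used for the lookup is dropped from the result, so engineering is indexed by Name and steve by Department.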

Reset Index to Column

To move the index back to a regular column, use reset_index():

df = df.reset_index()

print(df)
    Department   Name  Age
0  Engineering   John   28
1     Business   Mary   32
2  Engineering  Steve   35
3    Marketing  Sarah   27

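Set Index from Column

We can also set the index in the same statement that creates the DataFrame. The constructor's index argument expects the row labels themselves, not a column name, so passing index='Name' raises an error. Chain .set_index() instead:

df = pd.DataFrame(data).set_index('Name')

print(df)
       Age   Department
Name
John    28  Engineering
Mary    32     Business
Steve   35  Engineering
Sarah   27    Marketing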
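As a hedged illustration of how the constructor's index argument behaves, the sketch below passes a list of labels directly and compares the result with chaining set_index(). The names labeled and moved are hypothetical.

```python
import pandas as pd

data = {'Name': ['John', 'Mary', 'Steve', 'Sarah'],
        'Age': [28, 32, 35, 27]}

# index= takes the labels themselves; Name also stays a regular column
labeled = pd.DataFrame(data, index=data['Name'])
print(list(labeled.columns))         # ['Name', 'Age']
print(labeled.loc['Steve', 'Age'])   # 35

# Chaining set_index() moves the column into the index instead
moved = pd.DataFrame(data).set_index('Name')
print(list(moved.columns))           # ['Age']
```

Which form fits better depends on whether the label values should remain available as a column.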

Indexing Best Practices

Here are some key indexing best practices when working with Pandas:

Set meaningful indexes like names, IDs or dates for real-world data.
Use .loc[] for label-based indexing on custom indexes.
Use .iloc[] for integer position-based indexing.
Specify both index levels when accessing multi-index data.
Reset indexes to columns when needed for analysis or exporting data.

Properly indexing DataFrames will unlock the true power and convenience of Pandas!

Indexing and Slicing Row Selection

Indexing can also be used alongside slicing to select specific rows and contiguous blocks of row data.

Slice Range of Rows

Pass a slice with .iloc[] to get a range of rows:

# Slice rows 1-3
print(df.iloc[1:4])

The slice includes the start position and excludes the end position, so 1:4 returns the rows at positions 1, 2 and 3.
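Positional slices follow the same rules as Python list slices. A minimal sketch on a shortened version of the example data, with illustrative variable names:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Mary', 'Steve', 'Sarah'],
                   'Age': [28, 32, 35, 27]})

middle = df.iloc[1:3]        # positions 1 and 2; position 3 is excluded
every_other = df.iloc[::2]   # a step works as it does for lists
past_end = df.iloc[2:10]     # a slice past the end is truncated, not an error

print(list(middle['Name']))        # ['Mary', 'Steve']
print(list(every_other['Name']))   # ['John', 'Steve']
print(len(past_end))               # 2
```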

Select Multiple Rows

Pass a list of integers to select multiple specific rows:

# Get specific rows
rows = [0, 2, 3]
print(df.iloc[rows])
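Two details of list-based selection are worth seeing in action: the result follows the order of the list, and every position must exist. The sketch below uses a four-row frame, and the flag out_of_bounds is a hypothetical name.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Mary', 'Steve', 'Sarah'],
                   'Age': [28, 32, 35, 27]})

# Rows come back in the order given, not in their original order
picked = df.iloc[[3, 0]]
print(list(picked['Name']))   # ['Sarah', 'John']

# Unlike a slice, a list with an out-of-range position raises IndexError
try:
    df.iloc[[0, 5]]
    out_of_bounds = False
except IndexError:
    out_of_bounds = True
print(out_of_bounds)          # True
```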

Label-Based Slicing

Since we have a custom index, we can also slice by label with .loc[]:

# Slice from the 'John' label through the 'Mary' label
print(df.loc['John':'Mary'])

Unlike .iloc[], a .loc[] slice includes both the start label and the end label.
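The difference in endpoint handling between the two accessors can be checked side by side. A minimal sketch, assuming the example data indexed by Name; by_label and by_position are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Mary', 'Steve', 'Sarah'],
                   'Age': [28, 32, 35, 27]}).set_index('Name')

by_label = df.loc['John':'Steve']   # label slice: end label is included
by_position = df.iloc[0:2]          # position slice: end position is excluded

print(list(by_label.index))      # ['John', 'Mary', 'Steve']
print(list(by_position.index))   # ['John', 'Mary']
```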

Column Selection

Indexing can also select columns by name:

# Select single column
print(df.loc[:, 'Age'])

# Select multiple columns
print(df.loc[:, ['Age', 'Department']])
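Row and column selectors can be combined in a single call. The sketch below assumes the example data indexed by Name and shows a few equivalent ways to reach the same values; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Mary', 'Steve', 'Sarah'],
                   'Age': [28, 32, 35, 27],
                   'Department': ['Engineering', 'Business',
                                  'Engineering', 'Marketing']}
                  ).set_index('Name')

# One row label plus a list of column labels returns a Series
mary = df.loc['Mary', ['Age', 'Department']]
print(mary['Department'])    # Business

# Positional equivalent: row 0, column 1
first_dept = df.iloc[0, 1]
print(first_dept)            # Engineering

# df['Age'] is shorthand for df.loc[:, 'Age']
same = df['Age'].equals(df.loc[:, 'Age'])
print(same)                  # True
```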

The indexing and slicing functionality presented provides the core techniques for precise data selection in analysis workflows.

Conclusion

This guide covered the fundamentals of indexing, slicing and accessing data within Pandas DataFrames using both numeric and custom indexes. Proper indexing unlocks the true power of the Pandas library for data manipulation. Key concepts included:

Viewing, setting and resetting numeric row indices
Creating custom indexes from one or more columns
Precisely accessing elements using .loc[] and .iloc[]
Slicing ranges of rows using label and integer positions
Selecting specific columns by name
Following indexing best practices

We also went through many examples demonstrating real-world usage of these techniques. With these skills you will be prepared to take full advantage of Pandas’ flexible data selection and analysis capabilities for data science and Python programming.

The Pandas indexing functionality is quite extensive, and this guide focused on the core concepts you need to know. For more advanced usage see the official Pandas documentation on Indexing and Selecting Data which provides greater detail and additional examples you can build on.

Happy indexing and slicing with Pandas!