
A Comprehensive Guide to Pandas Label and Integer Location Based Indexing

Updated: at 01:37 AM

Pandas is a popular Python library used for data analysis and manipulation. One of Pandas’ key features is its powerful indexing functionality that allows you to slice, dice, and access specific subsets of data in DataFrames and Series objects quickly and easily. In Pandas, you can index DataFrames using labels (like column names) or integers representing the numerical locations of rows and columns.

This guide explains Pandas’ label and integer location based indexing in detail with examples. It covers index basics, .loc and .iloc, row and column selection, slicing, Boolean masks, reindexing, and multi-level indexes.


Overview of Pandas Indexes

In Pandas, indexes are used to keep track of and access data within DataFrames and Series objects. By default, Pandas creates a RangeIndex as the row index when you build a new DataFrame.

import pandas as pd

df = pd.DataFrame({'Column1': [1, 2, 3], 'Column2': [4, 5, 6]})

print(df)
#    Column1  Column2
# 0        1        4
# 1        2        5
# 2        3        6

print(df.index)
# RangeIndex(start=0, stop=3, step=1)

We can see above that the default index is a numeric RangeIndex from 0 to 2 (the number of rows minus 1).

Indexes can be changed by passing the index parameter during DataFrame creation:

data = {'Column1': [1, 2, 3], 'Column2': [4, 5, 6]}
df = pd.DataFrame(data, index=['a', 'b', 'c'])

print(df)
#    Column1  Column2
# a        1        4
# b        2        5
# c        3        6

print(df.index)
# Index(['a', 'b', 'c'], dtype='object')

Now the index contains custom string labels rather than the default integers.

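Index objects are immutable: you can’t assign to an individual label in place (df.index[0] = 'z' raises a TypeError). The index as a whole can still be replaced, either by assigning a new sequence to df.index or with DataFrame.set_index(), DataFrame.reset_index(), or DataFrame.rename().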
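As a small sketch of what immutability means in practice, the snippet below tries to change one label in place and then replaces the index wholesale. The frame and the new labels are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'Column1': [1, 2, 3], 'Column2': [4, 5, 6]},
                  index=['a', 'b', 'c'])

# Assigning to a single element of an Index raises TypeError.
try:
    df.index[0] = 'z'
    mutated = True
except TypeError:
    mutated = False

# Replacing the whole index in one assignment is allowed.
df.index = ['x', 'y', 'z']

# rename() returns a new DataFrame with only the mapped labels changed.
renamed = df.rename(index={'x': 'first'})
```

The original frame keeps its labels after rename(), because rename() returns a copy rather than modifying df.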

Indexes make data retrieval much easier in Pandas by allowing label and integer based indexing, which we’ll explore next.

Retrieving Data with .loc and .iloc

Pandas provides two main attribute accessors for retrieving data from DataFrames using label and integer based indexing - .loc and .iloc.

.loc selects data by label. It accepts a single row or column label, a list of labels, a slice of labels, or a Boolean array.

.iloc allows selecting data by integer positional locations or numerical order. The integer can be a single position, slice with integers, a list of positions, or a Boolean array.

Let’s see some examples of using .loc and .iloc on the following DataFrame:

import pandas as pd

data = {'Brand': ['Honda', 'Toyota', 'Ford', 'Tesla'],
        'Price': [22000, 25000, 20000, 35000]}

df = pd.DataFrame(data)

print(df)
#       Brand  Price
# 0     Honda  22000
# 1    Toyota  25000
# 2      Ford  20000
# 3     Tesla  35000

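To select a single row by label using .loc, pass the index label. Because this DataFrame uses the default RangeIndex, its row labels are the integers 0 to 3; a brand name such as 'Toyota' is a value in the Brand column, not a row label:

single_row = df.loc[1]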

print(single_row)
# Brand    Toyota
# Price    25000
# Name: 1, dtype: object

For a single column by label, pass the column name:

single_col = df.loc[:, 'Price']

print(single_col)
# 0    22000
# 1    25000
# 2    20000
# 3    35000
# Name: Price, dtype: int64

For multiple rows or columns, pass a list of labels:

multi_rows = df.loc[[0, 1]]
multi_cols = df.loc[:, ['Brand', 'Price']]

To select a slice of rows with .loc, use slice notation with labels (both endpoints are included):

row_slice = df.loc[2:3]

print(row_slice)
#       Brand  Price
# 2      Ford  20000
# 3     Tesla  35000

For integer-based selection with .iloc, pass integer positions:

first_row = df.iloc[0]
first_col = df.iloc[:, 0]

row_slice = df.iloc[1:3]

In summary, .loc selects data by label and .iloc selects data by integer position.
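The difference is easiest to see when the integer labels do not coincide with the positions. A minimal sketch, assuming a hypothetical frame named demo whose row labels are 10, 20, and 30:

```python
import pandas as pd

# Row labels are 10, 20, 30; positions are still 0, 1, 2.
demo = pd.DataFrame({'value': ['x', 'y', 'z']}, index=[10, 20, 30])

by_label = demo.loc[10, 'value']    # label 10 -> first row
by_position = demo.iloc[0, 0]       # position 0 -> first row

# Label 0 does not exist, so .loc raises KeyError.
try:
    demo.loc[0]
    label_zero_found = True
except KeyError:
    label_zero_found = False

# Position 10 is out of bounds, so .iloc raises IndexError.
try:
    demo.iloc[10]
    position_ten_found = True
except IndexError:
    position_ten_found = False
```

With the default RangeIndex the two accessors happen to agree for single rows, which is why mixing them up often goes unnoticed until the index changes.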

Selecting Rows and Columns by Label and Integer Location

Let’s now take a deeper look at how to select specific DataFrame rows and columns using both label and integer based indexing.

To select a single row by label, use .loc and pass the index label:

row = df.loc[1]  # label 1 in the default RangeIndex

For a single row by integer location, use .iloc and pass the index integer position:

row = df.iloc[1]

For multiple rows by label, pass a list of labels to .loc:

rows = df.loc[[1, 2]]

For multiple rows by integer position, pass a list of ints to .iloc:

rows = df.iloc[[1, 2]]
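If you would rather look rows up by brand name, one option is to make Brand the index first. A minimal sketch, assuming the same car data as above (the name by_brand is illustrative):

```python
import pandas as pd

df = pd.DataFrame({'Brand': ['Honda', 'Toyota', 'Ford', 'Tesla'],
                   'Price': [22000, 25000, 20000, 35000]})

# Move the Brand column into the row index.
by_brand = df.set_index('Brand')

toyota = by_brand.loc['Toyota']                # single row, returned as a Series
two_rows = by_brand.loc[['Toyota', 'Ford']]    # list of labels
label_slice = by_brand.loc['Toyota':'Ford']    # label slice, both ends included
```

Once Brand is the index it is no longer a regular column, so by_brand has only the Price column left.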

Selecting Columns

Selecting DataFrame columns works the same way - pass column names to .loc and column integer positions to .iloc.

Single column by label:

col = df.loc[:, 'Price']

Single column by integer location:

col = df.iloc[:, 1]

Multiple columns by label:

cols = df.loc[:, ['Brand', 'Price']]

Multiple columns by integer position:

cols = df.iloc[:, [0, 1]]
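Row and column selectors can also be combined in a single call. The following sketch, which rebuilds the same car data, picks one cell and a small block both ways:

```python
import pandas as pd

df = pd.DataFrame({'Brand': ['Honda', 'Toyota', 'Ford', 'Tesla'],
                   'Price': [22000, 25000, 20000, 35000]})

# One cell: row label 1 / column label 'Price' versus row 1 / column 1.
cell_by_label = df.loc[1, 'Price']
cell_by_position = df.iloc[1, 1]

# A block: first and last rows, both columns.
block_by_label = df.loc[[0, 3], ['Brand', 'Price']]
block_by_position = df.iloc[[0, 3], [0, 1]]
```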

Selecting Subsets with Slices

You can also select subsets of rows and columns using slices with .loc and .iloc.

Slice rows between two labels (inclusive):

subset = df.loc[1:2]

Slice rows between two integer positions (exclusive endpoint):

subset = df.iloc[1:3]

Slice columns between two labels:

subset = df.loc[:, 'Brand':'Price']

Slice columns between two integer positions:

subset = df.iloc[:,:1]

So slices with .loc are inclusive but slices with .iloc exclude the endpoint index.
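A quick way to convince yourself of the endpoint rule is to run the same 1:3 slice through both accessors on the car data and compare the row counts:

```python
import pandas as pd

df = pd.DataFrame({'Brand': ['Honda', 'Toyota', 'Ford', 'Tesla'],
                   'Price': [22000, 25000, 20000, 35000]})

by_label = df.loc[1:3]      # labels 1, 2 and 3
by_position = df.iloc[1:3]  # positions 1 and 2 only

label_rows = by_label['Brand'].tolist()
position_rows = by_position['Brand'].tolist()
```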

Using Boolean Indexing on Selection

Pandas also allows selecting rows and columns from DataFrames using Boolean conditions or masks.

First, create a Boolean Series indicating True/False if each row meets some criteria:

price_filter = df['Price'] > 22000

print(price_filter)

# 0    False
# 1     True
# 2    False
# 3     True
# Name: Price, dtype: bool

Pass this mask to .loc to filter rows:

df.loc[price_filter]

#       Brand  Price
# 1    Toyota  25000
# 3     Tesla  35000

This selects all rows where Price is over 22,000.

For columns, create a Boolean mask then pass it to .loc:

brand_col = df.loc[:, df.columns.str.contains('Brand')]

This selects any columns having ‘Brand’ in their label.

Boolean masks provide a powerful, flexible way to make complex selections from Pandas objects.
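Conditions can be combined with & (and) and | (or), as long as each condition is wrapped in parentheses. The sketch below also shows that .iloc expects a plain Boolean array rather than a Boolean Series, so the mask is converted with to_numpy() first. The thresholds are arbitrary.

```python
import pandas as pd

df = pd.DataFrame({'Brand': ['Honda', 'Toyota', 'Ford', 'Tesla'],
                   'Price': [22000, 25000, 20000, 35000]})

# Rows priced above 20,000 that are not Tesla.
mask = (df['Price'] > 20000) & (df['Brand'] != 'Tesla')

# Filter rows and pick one column in the same .loc call.
brands = df.loc[mask, 'Brand']

# .iloc accepts the mask once it is a NumPy Boolean array.
rows_by_position = df.iloc[mask.to_numpy()]
```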

Reindexing and Altering Existing Indexes

The existing index of a DataFrame can be changed using various reindex methods:

DataFrame.reindex()

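DataFrame.reindex() conforms the data to a new list of labels. The labels are matched against the existing index, so to reorder by brand name the Brand column must be the index first:

new_idx = ['Ford', 'Honda', 'Tesla', 'Toyota']
df.set_index('Brand').reindex(new_idx)

This returns a new DataFrame whose rows follow the new label ordering. Any label that is not in the existing index produces a row of NaN values.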

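We can also reindex by passing a list of integers. These are still matched as labels, not positions, which works here because the default RangeIndex uses the integers 0 to 3 as its labels:

new_order = [2, 0, 3, 1]
df.reindex(new_order)

This returns the rows in the order of the labels passed.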
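Because reindex() matches on labels, asking for a label that is not in the index does not raise an error. A minimal sketch, assuming Brand has been made the index and using 'BMW' as a made-up label that is absent from the data:

```python
import pandas as pd

df = pd.DataFrame({'Brand': ['Honda', 'Toyota', 'Ford', 'Tesla'],
                   'Price': [22000, 25000, 20000, 35000]})
by_brand = df.set_index('Brand')

# 'BMW' is not an existing label, so its row is filled with NaN.
reordered = by_brand.reindex(['Ford', 'Honda', 'BMW'])

# fill_value supplies a placeholder instead of NaN.
filled = by_brand.reindex(['Ford', 'BMW'], fill_value=0)
```

Note that introducing NaN turns the integer Price column into floats, which is one reason to pass fill_value when a sensible default exists.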

DataFrame.reset_index()

To reset the index to the default consecutive ints, use reset_index():

df.reset_index()

#    index   Brand  Price
# 0      0   Honda  22000
# 1      1  Toyota  25000
# 2      2    Ford  20000
# 3      3   Tesla  35000

The existing index is moved into a new ‘index’ column.
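When the old index is not worth keeping, for example after filtering rows, passing drop=True discards it instead of adding an 'index' column. A short sketch on the car data:

```python
import pandas as pd

df = pd.DataFrame({'Brand': ['Honda', 'Toyota', 'Ford', 'Tesla'],
                   'Price': [22000, 25000, 20000, 35000]})

# Filtering keeps the original labels, leaving gaps (1 and 3 here).
filtered = df[df['Price'] > 22000]

# drop=True renumbers from 0 without keeping the old labels as a column.
renumbered = filtered.reset_index(drop=True)
```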

DataFrame.set_index()

To create a new index from a column, use set_index():

df.set_index('Brand')

#             Price
# Brand
# Honda       22000
# Toyota      25000
# Ford        20000
# Tesla       35000

The ‘Brand’ column values are now used as the new index.
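Both set_index() and reset_index() return a new DataFrame by default and leave the original untouched, so the result has to be assigned to a name to be kept. The round trip below is a sketch of that behaviour:

```python
import pandas as pd

df = pd.DataFrame({'Brand': ['Honda', 'Toyota', 'Ford', 'Tesla'],
                   'Price': [22000, 25000, 20000, 35000]})

# df is unchanged; the re-indexed frame is a separate object.
by_brand = df.set_index('Brand')

# reset_index() moves Brand back into a regular column.
restored = by_brand.reset_index()
```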

Multi-Level and Hierarchical Indexing

Pandas supports indexing with multi-level or hierarchical indexes that have multiple layers of labels.

For example:

data = {
  ('Tech', 'Apple'): [12, 15],
  ('Tech', 'Google'): [13, 14],
  ('Auto', 'Toyota'): [10, 12],
  ('Auto', 'Honda'): [11, 13]
}

# Tuple keys become MultiIndex columns, so transpose to move them onto the rows
df = pd.DataFrame(data).T
print(df)

#               0   1
# Tech Apple   12  15
#      Google  13  14
# Auto Toyota  10  12
#      Honda   11  13

Here we have a two-level index - ‘Tech’/‘Auto’ and ‘Apple’/‘Google’/‘Toyota’/‘Honda’.

To select rows from the outer level, use .loc[] with the first index label:

df.loc['Tech']

#             0   1
# Apple      12  15
# Google     13  14

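To select a single row, pass a tuple containing both labels to .loc[]:

df.loc[('Auto', 'Honda')]

# 0    11
# 1    13
# Name: (Auto, Honda), dtype: int64

The first element of the tuple is the outer level label and the second is the inner level label. Because the tuple identifies exactly one row, the result is a Series.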
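To select by the inner level alone, without naming the outer label, DataFrame.xs() with its level argument is one option. The sketch below rebuilds the frame with the sector/company pairs on the rows; the level names 'Sector' and 'Company' are assumptions added for readability and are not part of the original data.

```python
import pandas as pd

data = {
    ('Tech', 'Apple'): [12, 15],
    ('Tech', 'Google'): [13, 14],
    ('Auto', 'Toyota'): [10, 12],
    ('Auto', 'Honda'): [11, 13],
}

# Transpose so the tuple keys form a two-level row index.
df = pd.DataFrame(data).T
df.index.names = ['Sector', 'Company']  # hypothetical level names

# Outer level: all rows under 'Tech'.
tech = df.loc['Tech']

# Inner level: cross-section for 'Honda' regardless of sector.
honda = df.xs('Honda', level='Company')

# Sorting the index keeps label slicing predictable on larger frames.
sorted_df = df.sort_index()
```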

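You can also use .iloc on a multi-indexed DataFrame. It ignores the index levels entirely, so a pair of integers means (row position, column position):

df.iloc[1, 0]  # 2nd row ('Tech', 'Google'), 1st column -> 13
df.iloc[3, 1]  # 4th row ('Auto', 'Honda'), 2nd column -> 13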
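Here is a self-contained sketch of positional access on a multi-indexed frame, assuming the sector/company pairs sit on the rows as in the printed table above:

```python
import pandas as pd

data = {
    ('Tech', 'Apple'): [12, 15],
    ('Tech', 'Google'): [13, 14],
    ('Auto', 'Toyota'): [10, 12],
    ('Auto', 'Honda'): [11, 13],
}
df = pd.DataFrame(data).T  # tuple keys moved onto the rows

# A single integer selects a whole row by position.
second_row = df.iloc[1]

# Two integers select one cell: (row position, column position).
cell = df.iloc[1, 0]

# Position slices work as usual and exclude the endpoint.
first_two = df.iloc[0:2]
```

The row returned by df.iloc[1] still carries its full label, so second_row.name is the tuple ('Tech', 'Google').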

Multi-indexes allow organizing complex, hierarchical data in tabular format.

Best Practices for Indexing in Pandas

Here are some key best practices to follow when indexing in Pandas:

Properly leveraging Pandas’ powerful indexing functionality will allow you to efficiently access, manipulate, and analyze data in Python. With the concepts covered in this guide, you should have a comprehensive understanding of how to use label and position based indexing in Pandas for slicing and dicing DataFrames and Series.