Skip to content

Pandas: Boolean Indexing with query() and eval()

Updated: at 05:32 AM

Pandas is a popular Python library used for data analysis and manipulation. One of Pandas’ most powerful features is the ability to subset and filter DataFrames using Boolean indexing. This allows you to easily select subsets of data that meet specific criteria.

Two useful methods for Boolean indexing in Pandas are DataFrame.query() and DataFrame.eval(). query() allows you to filter DataFrames using an intuitive query syntax similar to SQL. eval() enables you to evaluate Boolean expressions over DataFrames for filtering and subsetting.

In this comprehensive guide, you will learn:

Understanding Boolean indexing and using query() and eval() efficiently can help make your Pandas workflows more productive, concise, and powerful. Let’s dive in!

Boolean Indexing Overview

Boolean indexing provides a flexible way to filter and subset Pandas DataFrames based on conditional logic. It involves:

  1. Defining selection criteria you want to filter by
  2. Creating Boolean Series representing selection criteria
  3. Using Boolean Series to filter original DataFrame

For example:

import pandas as pd

data = pd.DataFrame({'A': [1, 2, 3],
                     'B': [4, 5, 6]})

# Create boolean filter
filter = (data['A'] > 1) & (data['B'] == 5)

# Subset data based on filter
data[filter]
   A  B
1  2  5

Here we see:

  1. Filter criteria defined using comparison operators
  2. Boolean Series filter created representing criteria
  3. Dataframe subsetted returning rows where filter is True

This demonstrates the core pattern of Boolean indexing in Pandas. The key is crafting the Boolean criteria to match your desired filters.

query() Method

Pandas query() method allows you to filter DataFrames using intuitive query syntax:

data.query('A > 1 & B == 5')

This is equivalent to the previous Boolean indexing example.

Key properties of query():

For example:

data.query('A >= 2 & C != "x"')

This filters to rows where:

query() syntax makes it very easy to generate filters. You can reference columns directly instead of having to encapsulate criteria in separate Boolean Series.

query() Expression Syntax

query() expressions allow the following syntax:

Comparisons:

Boolean logic:

Grouping:

For example:

data.query('A >= 2 & (B == 5 | C == 10)')

This selects rows where A is greater than or equal to 2 AND either B equals 5 OR C equals 10.

The query expression syntax provides a lot of flexibility to craft your filtering criteria.

query() Parameters

query() accepts the following parameters:

For example:

min_a = 2
max_b = 8

data.query('A > @min_a and B < @max_b', inplace=True)

Here we filter data in place using variables min_a and max_b.

The local_dict parameter can also be used to pass in local variables.

query() Use Cases

query() is ideal for:

It provides an intuitive way to generate filters using a string expression. The syntax is great for interactively exploring and querying data.

eval() Method

Pandas eval() evaluates an expression and returns the result. For DataFrames, it evaluates over all values.

For example:

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

df.eval('A + B')
0    5
1    7
2    9
dtype: int64

This evaluates A + B over all rows and returns the result as a Series.

eval() Expression Syntax

eval() accepts valid Python expressions. This includes:

For example:

df.eval('abs(A - B) < 2')

This computes the absolute difference between columns A and B and filters rows where difference is less than 2.

The flexible expression syntax allows complex logic and operations over DataFrames.

eval() Parameters

eval() takes the following parameters:

For example:

threshold = 0.5
df.eval('A > @threshold', inplace=True)

Here we evaluate A > threshold in place using local variable threshold.

eval() Use Cases

eval() is ideal for:

It provides a flexible way to evaluate any valid Python expression over DataFrames.

query() vs eval()

query() and eval() overlap in their capabilities for DataFrame filtering. But there are some differences:

query()

eval()

In general:

Examples

Here are some real-world examples demonstrating how to use query() and eval() for filtering.

Filtering Numeric Columns

Filter numeric columns using comparisons:

df = pd.DataFrame({'A': [1, 2, 3], 'B': [2, 4, 6]})

# query()
df.query('A > 1 & B > 2')

# eval()
df.eval('A > 1 and B > 2')

Both filter to rows where A is greater than 1 and B is greater than 2.

Filtering String Columns

Filter string columns using equality:

data = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie']})

# query()
data.query("Name == 'Bob'")

# eval()
data.eval("Name == 'Bob'")

Filtering with Functions

Apply functions like abs() when filtering:

df = pd.DataFrame({'A': [1, -2, 3]})

# abs(A) > 1
df.eval('abs(A) > 1')

Parameterized Filtering

Parameterize filters using variables:

min_value = 2
max_value = 5

df.query('A > @min_value and B < @max_value')

This allows encapsulating logic in variables.

Combining Multiple Conditions

Use Boolean logic to combine multiple filtering conditions:

df.query('A > 2 | B == 3')

This selects rows where A > 2 OR B equals 3.

Conclusion

In this guide, you learned how to leverage Pandas’ query() and eval() for powerful Boolean indexing and filtering. Key takeaways include:

Boolean indexing with query() and eval() will enable you to efficiently slice and dice your DataFrames to extract the data insights you need. The techniques discussed provide important skills for any Python Pandas user to master.