Pandas is a popular Python library used for data analysis and manipulation. One of Pandas’ most powerful features is the ability to subset and filter DataFrames using Boolean indexing. This allows you to easily select subsets of data that meet specific criteria.
Two useful methods for Boolean indexing in Pandas are DataFrame.query()
and DataFrame.eval()
. query()
allows you to filter DataFrames using an intuitive query syntax similar to SQL. eval()
enables you to evaluate Boolean expressions over DataFrames for filtering and subsetting.
In this comprehensive guide, you will learn:
- The basics of Boolean indexing in Pandas
- How to use
DataFrame.query()
for intuitive querying and filtering - How to use
DataFrame.eval()
for evaluating Boolean expressions - Differences between
query()
andeval()
- When to use each method based on use cases
- Real-world examples and applications
Understanding Boolean indexing and using query()
and eval()
efficiently can help make your Pandas workflows more productive, concise, and powerful. Let’s dive in!
Boolean Indexing Overview
Boolean indexing provides a flexible way to filter and subset Pandas DataFrames based on conditional logic. It involves:
- Defining selection criteria you want to filter by
- Creating Boolean Series representing selection criteria
- Using Boolean Series to filter original DataFrame
For example:
import pandas as pd
data = pd.DataFrame({'A': [1, 2, 3],
'B': [4, 5, 6]})
# Create boolean filter
filter = (data['A'] > 1) & (data['B'] == 5)
# Subset data based on filter
data[filter]
A B
1 2 5
Here we see:
- Filter criteria defined using comparison operators
- Boolean Series
filter
created representing criteria - Dataframe subsetted returning rows where
filter
is True
This demonstrates the core pattern of Boolean indexing in Pandas. The key is crafting the Boolean criteria to match your desired filters.
query() Method
Pandas query()
method allows you to filter DataFrames using intuitive query syntax:
data.query('A > 1 & B == 5')
This is equivalent to the previous Boolean indexing example.
Key properties of query()
:
- Accepts a string expression defining filter criteria
- Expression uses Python comparison operators
- Can reference DataFrame columns directly
- Similar syntax to SQL WHERE clauses
For example:
data.query('A >= 2 & C != "x"')
This filters to rows where:
- Column A is greater than or equal to 2
- Column C does not equal x
query()
syntax makes it very easy to generate filters. You can reference columns directly instead of having to encapsulate criteria in separate Boolean Series.
query() Expression Syntax
query()
expressions allow the following syntax:
Comparisons:
==
,!=
,>
,>=
,<
,<=
for equal, not equal, greater than, etc
Boolean logic:
|
for OR&
for AND~
for NOT
Grouping:
- Parentheses
()
can be used to group logic
For example:
data.query('A >= 2 & (B == 5 | C == 10)')
This selects rows where A is greater than or equal to 2 AND either B equals 5 OR C equals 10.
The query expression syntax provides a lot of flexibility to craft your filtering criteria.
query() Parameters
query()
accepts the following parameters:
expr
: The query string expressioninplace
: If True, filter DataFrame in place instead of returning copylocal_dict
: Dict of local variables to use in query
For example:
min_a = 2
max_b = 8
data.query('A > @min_a and B < @max_b', inplace=True)
Here we filter data in place using variables min_a
and max_b
.
The local_dict
parameter can also be used to pass in local variables.
query() Use Cases
query()
is ideal for:
- Ad hoc filtering of DataFrames
- Querying and filtering pipelines
- Integrating querying into functions
- Simple filtering criteria
It provides an intuitive way to generate filters using a string expression. The syntax is great for interactively exploring and querying data.
eval() Method
Pandas eval()
evaluates an expression and returns the result. For DataFrames, it evaluates over all values.
For example:
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df.eval('A + B')
0 5
1 7
2 9
dtype: int64
This evaluates A + B
over all rows and returns the result as a Series.
eval() Expression Syntax
eval()
accepts valid Python expressions. This includes:
- Arithmetic operators:
+
,-
,*
,/
- Comparison operators:
==
,!=
,>
,>=
, etc - Boolean operators:
&
,|
,~
- Math functions:
abs()
,sqrt()
, etc - DataFrame columns can be directly referenced
For example:
df.eval('abs(A - B) < 2')
This computes the absolute difference between columns A and B and filters rows where difference is less than 2.
The flexible expression syntax allows complex logic and operations over DataFrames.
eval() Parameters
eval()
takes the following parameters:
expr
: Expression string to evaluateinplace
: If True, evaluate in place instead of returning copylocal_dict
: Dict of local variables to use in expression
For example:
threshold = 0.5
df.eval('A > @threshold', inplace=True)
Here we evaluate A > threshold in place using local variable threshold
.
eval() Use Cases
eval()
is ideal for:
- Complex filtering logic
- Evaluating arithmetic expressions
- Applying math functions over columns
- Parameterizing filtering logic
- General purpose expression evaluation
It provides a flexible way to evaluate any valid Python expression over DataFrames.
query() vs eval()
query()
and eval()
overlap in their capabilities for DataFrame filtering. But there are some differences:
query()
- Designed specifically for filtering DataFrames
- SQL-like query syntax
- Intuitive for simple queries
- Limited to comparisons and Boolean operators
eval()
- Evaluates any valid Python expression
- Supports complex math and custom functions
- More flexible syntax
- Not as readable for simple queries
In general:
- Use
query()
for simple filtering based on column comparisons - Use
eval()
when you need complex logic, math, or custom functions
Examples
Here are some real-world examples demonstrating how to use query()
and eval()
for filtering.
Filtering Numeric Columns
Filter numeric columns using comparisons:
df = pd.DataFrame({'A': [1, 2, 3], 'B': [2, 4, 6]})
# query()
df.query('A > 1 & B > 2')
# eval()
df.eval('A > 1 and B > 2')
Both filter to rows where A is greater than 1 and B is greater than 2.
Filtering String Columns
Filter string columns using equality:
data = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie']})
# query()
data.query("Name == 'Bob'")
# eval()
data.eval("Name == 'Bob'")
Filtering with Functions
Apply functions like abs()
when filtering:
df = pd.DataFrame({'A': [1, -2, 3]})
# abs(A) > 1
df.eval('abs(A) > 1')
Parameterized Filtering
Parameterize filters using variables:
min_value = 2
max_value = 5
df.query('A > @min_value and B < @max_value')
This allows encapsulating logic in variables.
Combining Multiple Conditions
Use Boolean logic to combine multiple filtering conditions:
df.query('A > 2 | B == 3')
This selects rows where A > 2 OR B equals 3.
Conclusion
In this guide, you learned how to leverage Pandas’ query()
and eval()
for powerful Boolean indexing and filtering. Key takeaways include:
query()
provides an intuitive SQL-like syntax for filtering DataFrameseval()
evaluates any Python expression over DataFrames- Use
query()
for simple column comparisons and filters - Use
eval()
when you need complex logic, math, or functions - Both methods allow parameterization and combining conditions
- Apply these tools to clean, wrangle, and filter your data with ease
Boolean indexing with query()
and eval()
will enable you to efficiently slice and dice your DataFrames to extract the data insights you need. The techniques discussed provide important skills for any Python Pandas user to master.