Filtering data is an essential skill in Python programming. It allows you to extract specific subsets of data that meet certain criteria from a larger dataset. The filter() function is a built-in Python function that allows you to filter iterable objects like lists, tuples, sets, and dictionaries conveniently.
Mastering data filtering with filter() is invaluable for tasks like data analysis, data cleaning, working with databases, and more. This comprehensive guide will teach you how to use Python’s filter() function to filter data in lists, tuples, sets, dictionaries, and custom objects. You’ll learn how filter() works, its syntax, and how to construct the filtering criteria using lambda functions.
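By the end of this guide, you’ll be able to filter lists, tuples, sets, dictionaries, and custom objects with filter(), write filtering criteria as lambda functions, and combine filter() with functions like map() to process the results.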
How the filter() Function Works in Python
The filter() function in Python takes two parameters:
- The first parameter is a function that tests whether each element of an iterable passes a certain condition. This function returns either True or False (more precisely, any truthy or falsy value).
- The second parameter is the iterable object you want to filter, like a list, tuple, or set.
filter() applies the function to each element of the iterable. It returns an iterator that yields only those elements for which the function returned True.
Here is the basic syntax:
filtered_object = filter(function, iterable)
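The function argument can also be None. In that case filter() keeps the elements that are themselves truthy, which is a quick way to drop empty strings, zeros, and None values. A minimal sketch with made-up sample data:

```python
# Sample data mixing truthy and falsy values
values = [0, 1, "", "hello", None, [], [2, 3], False, 3.5]

# With None as the function, filter() keeps only the truthy elements
truthy_values = list(filter(None, values))
print(truthy_values)  # [1, 'hello', [2, 3], 3.5]
```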
Let’s understand this with a simple example. We want to filter odd numbers from a list of numbers:
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
def is_odd(num):
    return num % 2 != 0
odd_numbers = filter(is_odd, numbers)
print(list(odd_numbers))
Output:
[1, 3, 5, 7, 9]
Here’s how it works step by step:
- We defined an is_odd() function that returns True if the number is odd.
- We passed the is_odd function and the numbers list to filter().
- filter() applied the is_odd() function to each element in numbers and returned a filter object that yields only the odd numbers.
- We converted the filter object to a list using list() and printed it to see the filtered values.
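Because filter() returns an iterator rather than a list, elements are tested only as they are requested, and the filter object can be consumed only once. A small sketch reusing the numbers list and is_odd() function from above:

```python
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

def is_odd(num):
    return num % 2 != 0

odd_numbers = filter(is_odd, numbers)

# Elements are produced lazily, one at a time
first = next(odd_numbers)   # 1
rest = list(odd_numbers)    # [3, 5, 7, 9] (1 was already consumed)
again = list(odd_numbers)   # [] (the iterator is now exhausted)
print(first, rest, again)   # 1 [3, 5, 7, 9] []
```

If you need to iterate over the results more than once, convert them to a list first.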
This simple example shows how filter() selects elements based on a filtering criterion. Now let’s look at filter() in more detail.
How to Use filter() to Filter Python Data
The filter() function works with any iterable, including common Python data types like lists, tuples, sets, and dictionaries.
Filtering Lists
Lists are one of the most common Python data types. Here is an example of using filter() to keep only the positive numbers in a list:
numbers = [-2, -1, 0, 1, 2, 3]
def is_positive(num):
    return num > 0
positive_nums = filter(is_positive, numbers)
print(list(positive_nums))
Output:
[1, 2, 3]
We kept only the positive numbers by passing the is_positive() function, which returns True for numbers greater than 0. Both the negative numbers and 0 are dropped.
You can filter list data based on any condition, such as even numbers, prime numbers, or numeric strings.
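As a sketch of two of those conditions, the example below uses made-up sample lists. It passes an existing method, str.isdigit, directly as the filter function, and a lambda for the even-number test:

```python
values = ["42", "hello", "7", "3.14", "", "100"]

# str.isdigit() is True only for non-empty strings made entirely of digits,
# so "3.14" and "" are rejected
numeric_strings = list(filter(str.isdigit, values))
print(numeric_strings)  # ['42', '7', '100']

numbers = [3, 8, 15, 22, 41, 60]
even_numbers = list(filter(lambda n: n % 2 == 0, numbers))
print(even_numbers)  # [8, 22, 60]
```

Note that str.isdigit() also rejects signed values such as "-5", so a stricter or looser notion of "numeric" needs its own function.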
Filtering Tuples
Tuples are immutable sequences in Python. Filtering tuples works exactly like filtering lists; because filter() returns an iterator, convert the result back with tuple():
nums = (1, 2, 3, 4, 5, 6)
def is_even(num):
    return num % 2 == 0
even_nums = filter(is_even, nums)
print(tuple(even_nums))
Output:
(2, 4, 6)
Filtering Sets
Sets are unordered collections of unique elements in Python. Here’s how to filter a set:
chars = {'a', 'b', 'c', 'd', 'e', 'f'}
def is_vowel(char):
    return char in 'aeiou'
vowels = filter(is_vowel, chars)
print(set(vowels))
Output:
{'a', 'e'}
We filtered the set to contain only vowel characters. Because sets are unordered, the elements may be printed in a different order, such as {'e', 'a'}.
Filtering Dictionaries
To filter a dictionary, you need to filter its keys, values, or items.
Here’s an example to filter a dictionary to only contain items whose values are greater than 0:
nums = {1: -2, 2: 0, 3: 3, 4: -4}
def is_positive(kv):
    # kv is a (key, value) tuple; kv[1] is the value
    return kv[1] > 0
positive_dict = filter(is_positive, nums.items())
print(dict(positive_dict))
Output:
{3: 3}
We passed nums.items(), which provides the dictionary’s (key, value) pairs as tuples, so each pair can be filtered by its value. dict() then rebuilds a dictionary from the pairs that passed.
You can also filter dictionaries by keys or values separately.
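A minimal sketch of both approaches, using a made-up scores dictionary. Iterating over a dictionary yields its keys, so passing the dictionary itself filters by key, while passing values() filters the values:

```python
scores = {"alice": 82, "bob": 45, "carol": 91, "dave": 67}

# Filtering by key: iterating over a dict yields its keys
long_names = list(filter(lambda name: len(name) > 3, scores))
print(long_names)  # ['alice', 'carol', 'dave']

# Filtering by value: filter the values() view
passing_scores = list(filter(lambda score: score >= 50, scores.values()))
print(passing_scores)  # [82, 91, 67]

# Rebuilding a dict from the keys whose values pass the test
passing = {name: scores[name] for name in filter(lambda name: scores[name] >= 50, scores)}
print(passing)  # {'alice': 82, 'carol': 91, 'dave': 67}
```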
Constructing Filter Functions with Lambdas
Defining a separate named function for every simple condition can feel verbose. To keep the code concise, you can pass a Python lambda to filter() instead of a defined function.
Lambdas are simple anonymous functions that can contain only a single expression.
Here is an example of filtering with lambdas:
nums = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
odd_nums = filter(lambda x: x % 2 != 0, nums)
print(list(odd_nums))
The lambda function lambda x: x % 2 != 0 replaces the defined is_odd() function from the previous example.
Lambdas keep the code compact, but named functions are easier to read, test, and reuse when the filtering criteria are complex.
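For example, a multi-part rule reads better as a named function with a docstring than as one long lambda. The username rule below is a hypothetical illustration, not a standard:

```python
def is_valid_username(name):
    """Hypothetical rule: 3-12 characters, starts with a letter,
    and contains only letters, digits, or underscores."""
    return (
        3 <= len(name) <= 12          # checked first, so name[0] is safe below
        and name[0].isalpha()
        and all(ch.isalnum() or ch == "_" for ch in name)
    )

candidates = ["bob", "x", "alice_99", "9lives", "jo hn", "a_very_long_username"]
valid = list(filter(is_valid_username, candidates))
print(valid)  # ['bob', 'alice_99']
```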
Filtering Custom Python Objects
You can also filter custom classes in Python by filtering based on object attributes.
For example, suppose we have a Person class:
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age
person1 = Person('John', 20)
person2 = Person('Jill', 18)
person3 = Person('Jack', 32)
people = [person1, person2, person3]
We can filter Person objects based on age:
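under_21 = filter(lambda person: person.age < 21, people)
# Print the names; printing the objects themselves would only show default reprs
print([person.name for person in under_21])
Output:
['John', 'Jill']
This keeps only the Person objects whose age is less than 21.
You can filter objects of any class the same way, based on whichever attributes you need.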
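Conditions on several attributes can be combined in a single lambda. This sketch uses a hypothetical Product dataclass with made-up fields and data:

```python
from dataclasses import dataclass

@dataclass
class Product:
    name: str
    price: float
    in_stock: bool

products = [
    Product("Laptop", 999.0, True),
    Product("Mouse", 19.5, True),
    Product("Monitor", 249.0, False),
    Product("Keyboard", 49.0, True),
]

# Keep products that are in stock AND cost less than 100
affordable = list(filter(lambda p: p.in_stock and p.price < 100, products))
print([p.name for p in affordable])  # ['Mouse', 'Keyboard']
```

A dataclass is used here only because it generates a readable __repr__ and __init__; a regular class like Person works the same way.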
Combining filter() with map() and Other Functions
The filter() function can be combined with other functions like map() to process the filtered data further.
For example, to filter a list and square the filtered numbers:
nums = [1, 2, 3, 4, 5, 6, 7, 8]
filtered = filter(lambda x: x % 2 == 0, nums)
squared = map(lambda x: x ** 2, filtered)
print(list(squared))
Output:
[4, 16, 36, 64]
We first filtered the even numbers, then squared each one. This shows how filter() and map() can be combined.
You can chain filter() with other functions such as map(), sorted(), and reduce() (which must be imported from functools in Python 3) to get the desired filtered and processed data.
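A sketch of a slightly longer pipeline on made-up numbers: filter the evens, square them, sort them in descending order, and fold them into a total with functools.reduce():

```python
from functools import reduce

nums = [7, 2, 9, 4, 6, 1, 8]

evens = filter(lambda x: x % 2 == 0, nums)           # yields 2, 4, 6, 8 (lazy)
squares = map(lambda x: x ** 2, evens)               # yields 4, 16, 36, 64 (lazy)
ordered = sorted(squares, reverse=True)              # [64, 36, 16, 4]
total = reduce(lambda acc, x: acc + x, ordered, 0)   # 120
print(ordered, total)  # [64, 36, 16, 4] 120
```

For plain addition, sum(ordered) gives the same result more idiomatically; reduce() is shown here because it generalizes to other combining operations.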
Real-World Examples of Data Filtering with filter()
Let’s look at some real-world examples of using filter() for data filtering.
Filtering Database Records
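filter() can be used to filter rows fetched from a database based on conditions. For example, assuming an employees table whose third column holds each employee’s years of experience:
import psycopg2
conn = psycopg2.connect(dbname="mydb")
cur = conn.cursor()
cur.execute("SELECT * FROM employees")
rows = cur.fetchall()
# emp[2] is assumed to be the years-of-experience column
senior_employees = filter(lambda emp: emp[2] > 5, rows)
for emp in senior_employees:
    print(emp)
cur.close()
conn.close()
This keeps only the fetched rows where the years of experience exceed 5. For large tables, filtering in SQL with a WHERE clause is usually more efficient, because only the matching rows are transferred from the database.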
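Because psycopg2 needs a running PostgreSQL server, here is a self-contained sketch using Python’s built-in sqlite3 module with an in-memory database. The employees table, its columns, and its rows are hypothetical sample data. It compares filtering in Python with filtering in SQL:

```python
import sqlite3

# In-memory database with a hypothetical employees table and sample rows
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE employees (id INTEGER, name TEXT, years_experience INTEGER)")
cur.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "Ana", 3), (2, "Ben", 7), (3, "Chen", 12), (4, "Dee", 5)],
)

# Option 1: fetch all rows, then filter in Python (index 2 is years_experience)
cur.execute("SELECT id, name, years_experience FROM employees ORDER BY id")
rows = cur.fetchall()
senior_python = list(filter(lambda emp: emp[2] > 5, rows))

# Option 2: let the database filter with a parameterized WHERE clause
cur.execute(
    "SELECT id, name, years_experience FROM employees "
    "WHERE years_experience > ? ORDER BY id",
    (5,),
)
senior_sql = cur.fetchall()
conn.close()

print(senior_python)  # [(2, 'Ben', 7), (3, 'Chen', 12)]
print(senior_sql)     # [(2, 'Ben', 7), (3, 'Chen', 12)]
```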
Removing Stopwords from Text
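Stopwords are common words like ‘a’, ‘and’, ‘the’ that are often filtered out of text before analysis:
import nltk
from nltk.corpus import stopwords
nltk.download('stopwords')  # one-time download of the stopword corpus
text = "The quick brown fox jumps over the lazy dog"
stop_words = set(stopwords.words('english'))
words = text.split()
# Lowercase each word so "The" matches the stopword "the"
filtered_words = filter(lambda word: word.lower() not in stop_words, words)
print(list(filtered_words))
This filters out all the stopwords from the text. Without the lowercase comparison, the capitalized "The" would slip through, and printing the filter object directly would show only a <filter object at ...> placeholder instead of the words.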
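If you don’t want to depend on NLTK, the same idea works with a small hand-picked stopword set. The set below is an assumption chosen for illustration and is far shorter than NLTK’s English list:

```python
# Small hand-picked stopword set (an assumption; NLTK's list is much larger)
stop_words = {"a", "an", "and", "the", "over", "of", "to", "in"}

text = "The quick brown fox jumps over the lazy dog"

# Lowercase each word before the lookup so "The" matches "the"
filtered_words = list(filter(lambda word: word.lower() not in stop_words, text.split()))
print(filtered_words)  # ['quick', 'brown', 'fox', 'jumps', 'lazy', 'dog']
```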
Finding Prime Numbers
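The filter() function can pick out the prime numbers from a sequence of numbers:
nums = range(2, 25)  # start at 2, because 1 is not prime
primes = filter(lambda x: all(x % y != 0 for y in range(2, x)), nums)
print(list(primes))
Output:
[2, 3, 5, 7, 11, 13, 17, 19, 23]
Here, the lambda checks each number for primality by confirming that no integer from 2 up to x - 1 divides it evenly, and filter() keeps only the primes. Starting the range at 2 matters: for 1, range(2, 1) is empty and all() of an empty sequence is True, so 1 would be wrongly reported as prime.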
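The lambda above tests every candidate divisor below x, which gets slow for larger numbers. A sketch of a named is_prime() helper (a hypothetical name) that only tests divisors up to the square root of n; math.isqrt() requires Python 3.8 or later:

```python
import math

def is_prime(n):
    """Return True if n is prime, testing divisors only up to sqrt(n)."""
    if n < 2:
        return False  # 0, 1, and negative numbers are not prime
    for divisor in range(2, math.isqrt(n) + 1):
        if n % divisor == 0:
            return False
    return True

primes = list(filter(is_prime, range(1, 50)))
print(primes)  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
```

Checking up to the square root is enough because any factor larger than sqrt(n) must pair with one smaller than sqrt(n).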
These are just a few examples. filter() can be used in many other scenarios, like processing datasets and analyzing logs.
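As a sketch of the log-analysis case, the example below keeps only error lines. The log lines and their "LEVEL message" format are hypothetical; with a real log file you could pass the open file object to filter() directly, since iterating over a file yields its lines.

```python
# Hypothetical log lines in a "LEVEL message" format
log_lines = [
    "INFO Server started",
    "WARNING Disk usage at 85%",
    "ERROR Database connection lost",
    "INFO Request handled in 12ms",
    "ERROR Timeout while calling payment service",
]

# Keep only the lines whose level is ERROR
errors = list(filter(lambda line: line.startswith("ERROR"), log_lines))
for line in errors:
    print(line)
```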
Conclusion
The filter() function is an important built-in Python function for filtering iterable data. This guide covered how to use filter() to filter lists, tuples, sets, and dictionaries, as well as custom objects, based on different criteria.
Constructing the filter criteria with lambda functions allows for compact and flexible filtering. filter() can also be combined with other functions like map() to process the filtered data further.
We also discussed some real-world use cases of filter(), such as filtering database records, text analysis, and finding prime numbers. Mastering filter() will make your Python data processing and analysis code more concise and readable.
The key points from this guide are:
- filter() takes a function and an iterable as parameters.
- The function returns True or False to decide whether each element is kept.
- filter() returns an iterator; convert it to a list, tuple, or set if required.
- Use lambdas for compact filter criteria.
- filter() works with lists, tuples, sets, dicts, and custom classes.
- Combine it with map(), sorted(), and similar functions to build a processing pipeline.
- It is useful for filtering database records, text analysis, finding primes, and more.
I hope this guide provided a comprehensive overview of how to filter data in Python using filter(). You should now be able to comfortably use this function for a wide range of applications.