How to Filter Data in Python with the filter() Function

Filtering data is an essential skill in Python programming. It allows you to extract specific subsets of data that meet certain criteria from a larger dataset. The filter() function is a built-in Python function that allows you to filter iterable objects like lists, tuples, sets, and dictionaries conveniently.

Mastering data filtering with filter() is invaluable for tasks like data analysis, data cleaning, working with databases, and more. This comprehensive guide will teach you how to use Python’s filter() function to filter data in lists, tuples, sets, dictionaries, and custom objects. You’ll learn how filter() works, its syntax, and how to construct the filtering criteria using lambda functions.

This guide is organized as follows:

How the filter() Function Works in Python
How to Use filter() to Filter Python Data
Constructing Filter Functions with Lambdas
Filtering Custom Python Objects
Combining filter() with map() and Other Functions
Real-World Examples of Data Filtering with filter()
Conclusion

How the filter() Function Works in Python

The filter() function in Python takes in two parameters:

The first parameter is a function that tests whether each element of an iterable passes a certain condition. It should return True (or any truthy value) to keep an element and False (or any falsy value) to drop it. If you pass None instead of a function, filter() keeps the elements that are themselves truthy.
The second parameter is the iterable that you want to filter, such as a list, tuple, or set.

filter() returns an iterator that applies the function to each element lazily, as the result is consumed, and yields only those elements for which the function returned a truthy value. The original iterable is left unchanged.

Here is the basic syntax:

filtered_object = filter(function, iterable)

Let’s understand this with a simple example. We want to keep only the odd numbers from a list of numbers:

numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

def is_odd(num):
    return num % 2 != 0

odd_numbers = filter(is_odd, numbers)

print(list(odd_numbers))

Output:

[1, 3, 5, 7, 9]

Here’s how it works step-by-step:

We defined an is_odd() function that returns True if the number is odd.
We passed the is_odd function and the numbers list to filter().
filter() returned a filter object, which is a lazy iterator. As it is consumed, it applies is_odd() to each element in numbers and yields only the odd numbers.
We converted the filter object to a list using list() and printed it to see the filtered values.

This is a simple example to understand how filter() selects elements based on a filtering criterion. Now let’s look at filter() in more detail.
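One detail worth seeing in code before moving on is that the object returned by filter() is a single-use iterator. The following is a minimal sketch of that behavior, along with the None form of the first argument; the variable names are chosen only for illustration.

```python
numbers = [0, 1, 2, 3, 4, 5]

def is_odd(num):
    return num % 2 != 0

odd_numbers = filter(is_odd, numbers)

# Nothing has been tested yet; next() pulls the first match on demand.
first_odd = next(odd_numbers)       # 1

# Converting to a list consumes whatever is left in the iterator.
remaining = list(odd_numbers)       # [3, 5]

# The iterator is now exhausted, so a second pass yields nothing.
second_pass = list(odd_numbers)     # []

# Passing None as the function keeps every truthy element.
truthy = list(filter(None, [0, 1, "", "a", None, [], [0]]))

print(first_odd, remaining, second_pass, truthy)
# 1 [3, 5] [] [1, 'a', [0]]
```

If the filtered values are needed more than once, store them in a list or tuple rather than reusing the filter object.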

How to Use filter() to Filter Python Data

The filter() function can be used to filter many common Python data types like lists, tuples, sets and dictionaries.

Filtering Lists

Lists are one of the most common Python data types. Here is an example of using filter() to keep only the positive numbers in a list, which removes the negative numbers and zero:

numbers = [-2, -1, 0, 1, 2, 3]

def is_positive(num):
    return num > 0

positive_nums = filter(is_positive, numbers)

print(list(positive_nums))

Output:

[1, 2, 3]

We kept only the positive numbers by passing the is_positive() function, which returns True for numbers greater than 0. The negative numbers and 0 were filtered out.

You can filter list data based on any condition like even numbers, prime numbers, numeric strings etc.
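As one illustration of a different condition, the sketch below keeps only the numeric strings in a list. It passes the built-in method str.isdigit directly as the filter function; the sample values are made up for the example.

```python
values = ["42", "abc", "7", "", "3.14", "100"]

# str.isdigit is True only for non-empty strings made up entirely of digits,
# so "3.14" (contains a dot) and "" (empty) are rejected.
numeric_strings = list(filter(str.isdigit, values))

print(numeric_strings)  # ['42', '7', '100']
```

Any existing one-argument callable can be passed this way, without wrapping it in a new function.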

Filtering Tuples

Tuples are immutable sequences in Python. Filtering a tuple works just like filtering a list, except that you pass the result to tuple() to get a tuple back:

nums = (1, 2, 3, 4, 5, 6)

def is_even(num):
    return num % 2 == 0

even_nums = filter(is_even, nums)

print(tuple(even_nums))

Output:

(2, 4, 6)

Filtering Sets

Sets are unordered collections of unique elements in Python. Here’s how to filter a set:

chars = {'a', 'b', 'c', 'd', 'e', 'f'}

def is_vowel(char):
    return char in 'aeiou'

vowels = filter(is_vowel, chars)

print(set(vowels))

Output:

{'a', 'e'}

We filtered the set to contain only vowel characters. Because sets are unordered, the elements may print in a different order, such as {'e', 'a'}.

Filtering Dictionaries

To filter a dictionary, you need to filter its keys, values, or items.

Here’s an example to filter a dictionary to only contain items whose values are greater than 0:

nums = {1: -2, 2: 0, 3: 3, 4: -4}

def is_positive(kv):
    return kv[1] > 0

positive_dict = filter(is_positive, nums.items())

print(dict(positive_dict))

Output:

{3: 3}

We passed nums.items(), which yields (key, value) tuples, so the filter function can test the value at index 1. Passing the result to dict() rebuilds a dictionary from the pairs that remain.

You can also filter dictionaries by keys or values separately.
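A short sketch of both cases follows. The scores dictionary and the thresholds are hypothetical values used only to show the pattern.

```python
scores = {"alice": 82, "bob": 47, "carol": 91, "dave": 58}

# Filter by key: iterating over a dict yields its keys.
names_a_or_b = list(filter(lambda name: name.startswith(("a", "b")), scores))

# Filter by value: values() yields the values without their keys.
passing_scores = list(filter(lambda score: score >= 60, scores.values()))

# Rebuild a smaller dict from the keys that passed the test.
subset = {name: scores[name] for name in names_a_or_b}

print(names_a_or_b)    # ['alice', 'bob']
print(passing_scores)  # [82, 91]
print(subset)          # {'alice': 82, 'bob': 47}
```

Filtering values() loses the association with the keys, so filter items() when both halves of each pair are needed afterwards.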

Constructing Filter Functions with Lambdas

A filter function is often a one-line test that is used only once. In that case, a Python lambda can be passed to filter() directly instead of defining a named function, which keeps the code concise.

Lambdas are simple anonymous functions that can contain only a single expression.

Here is an example of filtering with lambdas:

nums = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

odd_nums = filter(lambda x: x%2 != 0, nums)

print(list(odd_nums))

The lambda function lambda x: x%2 != 0 replaces the defined is_odd() function from the previous example.

Using lambdas makes the code compact. But defined functions are better for complex filtering criteria.
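To illustrate the second point, here is a sketch of a criterion with several conditions written as a named function. The is_valid_username() function and its rules are hypothetical, chosen only to show a test that would be awkward to fit in a lambda.

```python
def is_valid_username(name):
    """Return True for 3-12 character alphanumeric names that start with a letter."""
    if not 3 <= len(name) <= 12:
        return False
    if not name[0].isalpha():
        return False
    return name.isalnum()

candidates = ["al", "alice99", "9lives", "bob_smith", "carol", "x" * 13]

valid = list(filter(is_valid_username, candidates))

print(valid)  # ['alice99', 'carol']
```

A named function can also carry a docstring and be tested on its own, which a lambda cannot do as easily.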

Filtering Custom Python Objects

You can also filter instances of custom classes in Python based on their attributes.

For example, we have a Person class:

class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

person1 = Person('John', 20)
person2 = Person('Jill', 18)
person3 = Person('Jack', 32)

people = [person1, person2, person3]

We can filter Person objects based on age:

under_21 = filter(lambda person: person.age < 21, people)

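print([person.name for person in under_21])

Output:

['John', 'Jill']

This keeps only the Person objects whose age is less than 21. We print the names because Person does not define __repr__(), so printing the objects directly would show only their default representations.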

You can filter any class objects similarly based on the attributes required.
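The sketch below shows one way to make such filters reusable. It redefines Person with a __repr__() method so that results print readably, and it uses a hypothetical helper, older_than(), that builds a filter function for any age threshold. Neither addition is required by filter() itself.

```python
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def __repr__(self):
        # Readable output when a list of Person objects is printed
        return f"Person({self.name!r}, {self.age})"

def older_than(min_age):
    """Hypothetical helper: return a filter function for the given threshold."""
    def predicate(person):
        return person.age > min_age
    return predicate

people = [Person("John", 20), Person("Jill", 18), Person("Jack", 32)]

over_19 = list(filter(older_than(19), people))

print(over_19)  # [Person('John', 20), Person('Jack', 32)]
```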

Combining filter() with map() and Other Functions

The filter() function can be combined with other functions like map() to process the filtered data further.

For example, to filter a list and square the filtered numbers:

nums = [1, 2, 3, 4, 5, 6, 7, 8]

filtered = filter(lambda x: x%2 == 0, nums)
squared = map(lambda x: x**2, filtered)

print(list(squared))

Output:

[4, 16, 36, 64]

We first filtered even numbers, then squared each number. This shows how filter() and map() can be combined.

You can chain filter() with other functions such as map(), sorted(), and functools.reduce() to get the desired filtered and processed data.
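A minimal sketch of such a chain is shown below. Note that reduce() is not a built-in in Python 3 and has to be imported from functools; the input list is arbitrary.

```python
from functools import reduce

nums = [8, 3, 6, 1, 4, 7, 2, 5]

evens = filter(lambda x: x % 2 == 0, nums)        # yields 8, 6, 4, 2
squares = map(lambda x: x ** 2, evens)            # yields 64, 36, 16, 4
ordered = sorted(squares)                         # [4, 16, 36, 64]
total = reduce(lambda acc, x: acc + x, ordered)   # 120

print(ordered, total)  # [4, 16, 36, 64] 120
```

filter() and map() stay lazy until sorted() consumes them, so no intermediate list is built for the first two steps.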

Real-World Examples of Data Filtering with filter()

Let’s look at some real-world examples of using filter() for data filtering.

Filtering Database Records

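filter() can be applied to rows that have already been fetched from a database. For large tables, filtering in the query itself with a WHERE clause is usually more efficient, but filter() is handy once the rows are in memory. For example: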

import psycopg2

conn = psycopg2.connect(dbname="mydb")
cur = conn.cursor()

cur.execute("SELECT * FROM employees")
rows = cur.fetchall()

# Assumes the third column (index 2) holds years of experience
senior_employees = filter(lambda emp: emp[2] > 5, rows)

for emp in senior_employees:
    print(emp)

This keeps the rows where the years of experience, assumed here to be the third column, is greater than 5.
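For a version that runs without a database server, the sketch below uses the standard library’s sqlite3 module with an in-memory database in place of psycopg2. The table layout and sample rows are invented for illustration. It also shows the equivalent WHERE clause for comparison.

```python
import sqlite3

# In-memory database with a made-up employees table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER, name TEXT, years INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "Ana", 7), (2, "Ben", 3), (3, "Cy", 10)],
)

# Option 1: fetch every row, then filter in Python.
rows = conn.execute(
    "SELECT id, name, years FROM employees ORDER BY id"
).fetchall()
senior_in_python = list(filter(lambda emp: emp[2] > 5, rows))

# Option 2: let the database do the filtering with a parameterized WHERE.
senior_in_sql = conn.execute(
    "SELECT id, name, years FROM employees WHERE years > ? ORDER BY id", (5,)
).fetchall()

conn.close()

print(senior_in_python)                   # [(1, 'Ana', 7), (3, 'Cy', 10)]
print(senior_in_python == senior_in_sql)  # True
```

Both options return the same rows here; the second avoids transferring rows that would be discarded anyway.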

Removing Stopwords from Text

Stopwords are common words like ‘a’, ‘and’, ‘the’ that are often filtered out of text before analysis:

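import nltk
from nltk.corpus import stopwords

# One-time download of the stopword list (skipped if already present)
nltk.download('stopwords')

text = "The quick brown fox jumps over the lazy dog"

stop_words = set(stopwords.words('english'))

words = text.split()

# NLTK's list is lowercase, so lowercase each word before the membership test
filtered_words = filter(lambda word: word.lower() not in stop_words, words)

# filter() returns an iterator; convert it to a list to print the words
print(list(filtered_words))

This filters out all the stopwords from the text, including the capitalized ‘The’.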
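The same idea can be tried without installing NLTK. The sketch below uses a small hand-written stopword set, which is an assumption made for illustration and far shorter than a real stopword list.

```python
# Hypothetical, hand-written stopword set used only for this example
STOP_WORDS = {"a", "an", "and", "over", "the"}

text = "The quick brown fox jumps over the lazy dog"
words = text.split()

# Lowercase each word for the test so that "The" matches "the"
filtered_words = list(filter(lambda word: word.lower() not in STOP_WORDS, words))

print(filtered_words)
# ['quick', 'brown', 'fox', 'jumps', 'lazy', 'dog']

print(" ".join(filtered_words))
# quick brown fox jumps lazy dog
```

Storing the stopwords in a set keeps each membership test fast, which matters when the text is long.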

Finding Prime Numbers

The filter() function can select the prime numbers from a sequence of numbers:

nums = range(1, 25)

primes = filter(lambda x: x > 1 and all(x % y != 0 for y in range(2, x)), nums)

print(list(primes))

Here, we constructed a lambda function that checks each number for primality by trial division, and filter() yields only the primes. The x > 1 check is needed because 1 is not prime, yet all() returns True for the empty range(2, 1).
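For larger ranges, testing every divisor below the number becomes slow. The sketch below is one possible refinement: a named is_prime() function that only tries divisors up to the integer square root. It assumes Python 3.8 or later for math.isqrt().

```python
from math import isqrt

def is_prime(n):
    """Trial division up to the integer square root of n."""
    if n < 2:
        return False
    return all(n % d != 0 for d in range(2, isqrt(n) + 1))

primes = list(filter(is_prime, range(1, 25)))

print(primes)  # [2, 3, 5, 7, 11, 13, 17, 19, 23]
```

Stopping at the square root is enough because any factor larger than it must be paired with one smaller than it.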

These are just a few examples. filter() can be used in many other scenarios like processing datasets, analyzing logs etc.
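As a small illustration of the log analysis case, the sketch below keeps only the error lines from a list of log entries. The log format (date, time, level, message) and the sample lines are invented for the example.

```python
# Made-up log lines in the form "<date> <time> <level> <message>"
log_lines = [
    "2024-01-05 10:00:01 INFO service started",
    "2024-01-05 10:00:07 ERROR database timeout",
    "2024-01-05 10:01:12 WARNING slow response",
    "2024-01-05 10:02:30 ERROR disk full",
]

def is_error(line):
    # Split into date, time, level, and the rest of the message
    parts = line.split(maxsplit=3)
    return len(parts) >= 3 and parts[2] == "ERROR"

errors = list(filter(is_error, log_lines))

for line in errors:
    print(line)
# 2024-01-05 10:00:07 ERROR database timeout
# 2024-01-05 10:02:30 ERROR disk full
```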

Conclusion

The filter() function is an important built-in function in Python for filtering iterable data. This guide covers how to use filter() to filter lists, tuples, sets, dictionaries as well as custom objects based on different criteria.

Constructing the filter criteria using lambda functions allows for compact and flexible filtering. filter() can be combined with other functions like map() to process the filtered data further.

Some real-world use cases of filter() were also discussed like filtering database records, text analysis and finding prime numbers. Mastering filter() will make your Python data processing and analysis work easier and faster.

The key points from this guide are:

filter() takes a function and an iterable as parameters
The function returns True for elements to keep and False for elements to drop
filter() returns a lazy iterator; convert it to a list, tuple, or set if required
Use lambdas for compact filter criteria and named functions for complex ones
filter() works with lists, tuples, sets, dicts, and instances of custom classes
Combine it with map(), sorted(), etc. to build a processing pipeline
Useful for database record filtering, text analysis, finding primes, etc.

I hope this guide provided a comprehensive overview of how to filter data in Python using filter(). You should now be able to comfortably use this function for a wide range of applications.