Sets are a powerful built-in Python data structure for storing unique elements, with fast membership tests and a rich collection of methods and operations. Mastering sets can help you write more efficient and elegant Python code to solve real-world problems. This comprehensive guide provides practical coding exercises and examples to help you gain proficiency in using sets and their properties in Python.
Introduction
A set is an unordered collection of unique elements in Python. Sets are mutable, meaning elements can be added or removed after creation, but the elements themselves must be hashable (for example numbers, strings, or tuples). Sets model the mathematical concept of a set.
Some key properties and advantages of Python sets:
- Sets contain only unique elements - no duplicates allowed
- Elements are unordered and have no index, so they cannot be accessed by position
- Checking membership of an item is fast: O(1) time on average, because sets are implemented as hash tables
- Supports common mathematical set operations like union, intersection, difference, etc.
- Converting a sequence to a set removes its duplicates in a single step
Let’s explore some practical real-world examples and exercises for utilizing sets to write cleaner and more Pythonic code.
Creating Sets
There are a few ways to initialize a set in Python:
```python
# Initialize an empty set
empty_set = set()

# Initialize a set with elements using a set literal
languages = {'Python', 'R', 'Java'}

# Convert another data structure like a list or tuple to a set
set_from_list = set(['Python', 'Java', 'Ruby'])
```
Note that sets do not allow duplicate elements. Creating a set from a list or tuple automatically removes any duplicates.
```python
duplicate_list = ['Python', 'Java', 'Python', 'Ruby']
set_from_list = set(duplicate_list)
print(set_from_list)
# Output (order may vary): {'Python', 'Java', 'Ruby'}
```
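Two pitfalls are worth checking when creating sets: empty braces create a dictionary, not a set, and every element must be hashable, so mutable values such as lists are rejected. A short sketch (the coordinate values are arbitrary sample data):

```python
# Empty braces create an empty dict, not an empty set
print(type({}))     # <class 'dict'>
print(type(set()))  # <class 'set'>

# Set elements must be hashable: tuples work as elements
coordinates = {(0, 0), (1, 2)}
print((1, 2) in coordinates)  # True

# Lists are mutable and therefore unhashable, so this raises TypeError
try:
    invalid = {[1, 2], [3, 4]}
except TypeError as exc:
    print("TypeError:", exc)
```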
Adding and Removing Elements
We can add a single element to a set using the add() method:
```python
languages.add('JavaScript')
print(languages)
# Output (order may vary): {'Python', 'R', 'Java', 'JavaScript'}
```
To add multiple elements at once from an iterable, use the update() method:
```python
new_languages = ['C++', 'Go', 'Rust']
languages.update(new_languages)
print(languages)
# Output (order may vary): {'R', 'Python', 'C++', 'Java', 'JavaScript', 'Go', 'Rust'}
```
Removing elements can be done via remove() or discard():
```python
languages.remove('R')  # Raises KeyError if the element is not present
print(languages)

languages.discard('Swift')  # Does not raise an error if the element is not present
print(languages)
```
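The choice between remove() and discard() matters when you are not sure an element is present. The sketch below (the tool names are arbitrary sample data) shows both behaviors side by side:

```python
tools = {'git', 'make', 'docker'}

# discard() silently ignores a missing element
tools.discard('podman')

# remove() raises KeyError for a missing element, so guard it
try:
    tools.remove('podman')
except KeyError:
    print("'podman' was not in the set")

tools.remove('make')  # present, so no error
print(sorted(tools))  # ['docker', 'git']
```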
Set Operations
Some common set operations available in Python include:
Union: returns a new set containing all elements from both sets
```python
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

print(A | B)  # Union operator |
# Output: {1, 2, 3, 4, 5, 6}

print(A.union(B))
# Output: {1, 2, 3, 4, 5, 6}
```
Intersection: returns a set containing only the elements common to both sets
```python
print(A & B)  # Intersection operator &
# Output: {3, 4}

print(A.intersection(B))
# Output: {3, 4}
```
Difference: returns the elements of the first set that are not in the second
```python
print(A - B)  # Difference operator -
# Output: {1, 2}

print(B - A)
# Output: {5, 6}

print(A.difference(B))
# Output: {1, 2}
```
Symmetric Difference: returns the elements that are in exactly one of the two sets, not both
```python
print(A ^ B)  # Symmetric difference operator ^
# Output: {1, 2, 5, 6}

print(A.symmetric_difference(B))
# Output: {1, 2, 5, 6}
```
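The operator and method forms are not fully interchangeable: the methods accept any iterable as their argument, while the operators require both operands to be sets. A minimal sketch using a plain list as the second operand:

```python
A = {1, 2, 3, 4}
evens = [2, 4, 6]  # a list, not a set

# Methods accept any iterable
print(A.union(evens))         # {1, 2, 3, 4, 6}
print(A.intersection(evens))  # {2, 4}

# Operators require sets on both sides
try:
    A | evens
except TypeError as exc:
    print("TypeError:", exc)

# Converting the list first makes the operator work
print(A | set(evens))         # {1, 2, 3, 4, 6}
```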
Set Methods in Action
Let’s see some practical examples utilizing these set properties and methods to solve real-world problems:
Removing Duplicates
Sets can efficiently remove duplicate elements from a sequence such as a list:
```python
duplicate_list = [1, 2, 3, 3, 4, 4, 4, 5, 6]
print(list(set(duplicate_list)))
# Output: [1, 2, 3, 4, 5, 6]  (sets do not guarantee this order)
```
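When the original order matters, dict.fromkeys() removes duplicates while keeping each element's first-seen position, because dictionaries preserve insertion order (guaranteed since Python 3.7). A short sketch with made-up data:

```python
words = ['banana', 'apple', 'banana', 'cherry', 'apple']

# set(): duplicates removed, but the ordering is arbitrary
unique_any_order = set(words)

# dict.fromkeys(): duplicates removed, first-seen order kept
unique_in_order = list(dict.fromkeys(words))
print(unique_in_order)  # ['banana', 'apple', 'cherry']
```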
Membership Testing
We can use sets to quickly check if an element is contained in a sequence:
```python
languages = ['Python', 'Java', 'Ruby', 'Python']
unique_langs = set(languages)

print('Python' in unique_langs)  # True
print('C++' in unique_langs)  # False
```
Membership testing on a set takes O(1) time on average, compared to O(n) for a list.
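Building the set itself takes O(n) time, so the conversion pays off when you run many lookups against the same collection. A minimal sketch (filter_blocked and the usernames are hypothetical) that converts a blocklist once and then filters a batch of users with constant-time checks:

```python
def filter_blocked(users, blocklist):
    """Return the users not in blocklist, preserving input order."""
    blocked = set(blocklist)  # one-time O(len(blocklist)) conversion
    # Each 'not in' check against the set is O(1) on average
    return [user for user in users if user not in blocked]


users = ['alice', 'bob', 'carol', 'dave']
print(filter_blocked(users, ['bob', 'dave']))  # ['alice', 'carol']
```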
Finding Common Elements
To find common elements across multiple sets or sequences:
```python
english_speakers = {'United States', 'Canada', 'UK', 'Ireland', 'Australia'}
spanish_speakers = {'Mexico', 'Colombia', 'Spain', 'Chile', 'Peru'}

# Find countries that appear in both sets
print(english_speakers & spanish_speakers)
# Output: set()  (no country is in both sets, so the intersection is empty)

# Find all English- or Spanish-speaking countries
print(english_speakers | spanish_speakers)
# Output (order may vary): {'UK', 'Chile', 'United States', 'Spain', 'Australia',
#  'Peru', 'Canada', 'Colombia', 'Mexico', 'Ireland'}
```
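The intersection() method accepts any number of iterables, which covers the case of more than two sets or sequences. A short sketch with hypothetical team skill sets; for a list of sets whose length is not known in advance, unpacking into set.intersection does the same job:

```python
team_a = {'Python', 'Go', 'Rust'}
team_b = {'Python', 'Rust', 'Java'}
team_c = ['Rust', 'Python', 'C++']  # a list works as a method argument

# Elements common to every collection
print(team_a.intersection(team_b, team_c))  # {'Python', 'Rust'} (order may vary)

# For an arbitrary number of sets, unpack them into set.intersection
all_teams = [team_a, team_b, set(team_c)]
print(set.intersection(*all_teams))
```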
Finding Unique Elements
To retrieve elements only present in one set or the other:
```python
set_a = {1, 2, 3, 4}
set_b = {2, 3, 4, 5, 6}

print(set_a ^ set_b)
# Output: {1, 5, 6}
```
The symmetric difference shows elements unique to each set.
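Put differently, the symmetric difference is the union of the two one-sided differences. A quick check using the same two sets:

```python
set_a = {1, 2, 3, 4}
set_b = {2, 3, 4, 5, 6}

# Symmetric difference equals (A - B) | (B - A)
via_operator = set_a ^ set_b
via_differences = (set_a - set_b) | (set_b - set_a)

print(via_operator == via_differences)  # True
print(set_a - set_b, set_b - set_a)     # {1} {5, 6}
```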
Maintaining Uniqueness
Adding new elements to a set ignores duplicates:
```python
items = {'apple', 'banana', 'orange'}

items.add('apple')       # already present, so nothing changes
items.add('strawberry')

print(items)
# Output (order may vary): {'banana', 'strawberry', 'orange', 'apple'}
```
This provides an easy way to maintain uniqueness as we build up a set.
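Because add() ignores repeats, a set also works as a "seen" tracker while iterating. A minimal sketch (the helper name find_duplicates is just for illustration) that reports which items occur more than once:

```python
def find_duplicates(items):
    """Return the set of items that appear more than once."""
    seen = set()
    duplicates = set()
    for item in items:
        if item in seen:
            duplicates.add(item)  # second or later occurrence
        else:
            seen.add(item)        # first occurrence
    return duplicates


print(find_duplicates(['apple', 'banana', 'apple', 'orange', 'banana']))
# {'apple', 'banana'} (order may vary)
```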
Set Comprehensions
Similar to list and dict comprehensions, we can also create sets using set comprehensions:
```python
# Set comprehension
languages = {language for language in ['Python', 'Java', 'Python', 'C++', 'Ruby']}
print(languages)
# Output (order may vary): {'Ruby', 'Java', 'Python', 'C++'}
```
This lets you quickly build a set by generating elements from any iterable.
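The comprehension above only copies the list, which set() alone would also do; comprehensions earn their keep when you transform or filter elements on the way in. A sketch with made-up input that normalizes case and drops non-alphabetic names:

```python
names = ['Python', 'java', 'PYTHON', 'C++', 'Ruby', 'ruby']

# Lowercasing collapses 'Python'/'PYTHON' and 'Ruby'/'ruby' into one element;
# the isalpha() filter drops 'C++'
normalized = {name.lower() for name in names if name.isalpha()}
print(sorted(normalized))  # ['java', 'python', 'ruby']
```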
Practical Exercise - Analyzing Text
Let’s combine these set concepts in a practical example. We will write a function to analyze a text document and return statistics about:
- The total number of words and the number of unique words
- The 10 most frequently occurring words
- Words only found in the first half of the document
- Words only found in the second half
We can utilize sets to help us efficiently extract this information:
```python
import re
from collections import Counter


def analyze_text(text):
    # Split text into words, convert to lowercase
    words = re.findall(r'\w+', text.lower())
    total_words = len(words)  # Count total words

    # Extract unique words
    unique_words = set(words)
    num_unique = len(unique_words)

    # Count word frequencies
    word_counts = Counter(words)
    top_10 = word_counts.most_common(10)

    # Split the word list in half
    middle_index = len(words) // 2
    first_half = set(words[:middle_index])
    second_half = set(words[middle_index:])

    # Words only in the first half
    first_half_unique = first_half - second_half
    # Words only in the second half
    second_half_unique = second_half - first_half

    return {
        'total_words': total_words,
        'num_unique': num_unique,
        'freq_words': top_10,
        'first_half_unique': first_half_unique,
        'second_half_unique': second_half_unique,
    }


text = """Python is an interpreted, high-level, general-purpose programming
language. Created by Guido van Rossum and first released in 1991, Python's
design philosophy emphasizes code readability with its notable use of
significant whitespace. Its language constructs and object-oriented
approach aim to help programmers write clear, logical code for small and
large-scale projects. Python is dynamically typed and garbage-collected.
It supports multiple programming paradigms, including structured,
object-oriented and functional programming. Python is often described as
a "batteries included" language due to its comprehensive standard library.
Guido van Rossum began working on Python in the late 1980s as a successor
to the ABC programming language and first released it in 1991 as Python 0.9.0.
Python 2.0 was released in 2000 and introduced new features such as list
comprehensions, cycle-detecting garbage collection, reference counting, and
Unicode support. Python 3.0 was released in 2008 and was a major revision of
the language that is not completely backward-compatible. Python 2 was
discontinued with version 2.7.18 in 2020.
Python consistently ranks as one of the most popular programming languages."""

results = analyze_text(text)
print(results)
```
This produces useful statistics about the text using sets and a Counter. The key takeaways are:
- Converting words to a set removes duplicates
- Set difference, which relies on fast membership checks, finds the words that appear in only one half
- Counters provide word frequency analysis
This is just one simple example of how sets can be used for text analytics. The methods can be extended further to gather other insights as well.
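As one such extension, intersection finds the vocabulary the two halves share. The helper below, shared_words, is a hypothetical addition that reuses the same tokenizing and splitting steps as analyze_text():

```python
import re


def shared_words(text):
    """Return the words that appear in both halves of text."""
    words = re.findall(r'\w+', text.lower())
    middle_index = len(words) // 2
    first_half = set(words[:middle_index])
    second_half = set(words[middle_index:])
    return first_half & second_half  # intersection: present in both halves


sample = "sets are fast and sets are simple"
print(sorted(shared_words(sample)))  # ['are', 'sets']
```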
Summary
Sets are a built-in Python data structure that lets you store unique elements. Sets are highly optimized for operations like membership testing, duplicate removal, and common set math operations like unions, intersections and differences.
Mastering Python sets lets you write faster, more Pythonic code to analyze data and solve complex coding problems. This guide covered practical examples and exercises illustrating common patterns and use cases for sets. Some key takeaways:
- Initialize sets from lists and tuples to remove duplicates
- Use set methods like add(), remove() and update() to modify sets
- Perform set operations like unions, intersections and differences
- Use sets for membership testing, finding common or distinct elements
- Maintain uniqueness by adding elements to sets
- Create sets with comprehensions, just like lists and dicts
- Apply sets to solve real-world problems like text analysis
With these fundamentals, you should feel more confident applying sets in your own Python programming. Sets empower you to write simpler code optimized for efficiency and performance.