Practical Exercises: Solving Problems Using Sets and Their Unique Element Properties in Python

Updated: at 04:23 AM

Sets are a powerful built-in data structure in Python that allow you to store unique elements and utilize useful methods and operations. Mastering sets can help you write more efficient and elegant Python code to solve real-world problems. This comprehensive guide will provide practical coding exercises and examples to help you gain proficiency in leveraging sets and their properties in Python.

Introduction

A set is an unordered collection of unique, hashable elements in Python. Sets are mutable, meaning elements can be added or removed after creation. They implement the core concepts of mathematical set theory, such as union, intersection, and difference.

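Some key properties and advantages of Python sets are that every element is unique, membership tests are fast, and the mathematical operations union, intersection, and difference are built in.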

Let’s explore some practical real-world examples and exercises for utilizing sets to write cleaner and more Pythonic code.

Creating Sets

There are a few ways to initialize a set in Python:

# Initialize an empty set
empty_set = set()

# Initialize set with elements
languages = {'Python', 'R', 'Java'}

# Convert another data structure like a list or tuple to a set
set_from_list = set(['Python','Java','Ruby'])

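Note that sets do not allow duplicate elements. Creating a set from a list or tuple automatically removes any duplicates. Because sets are unordered, the printed order of elements on your machine may differ from the outputs shown in this guide.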

duplicate_list = ['Python', 'Java', 'Python', 'Ruby']
set_from_list = set(duplicate_list)
print(set_from_list)

# Output: {'Python', 'Java', 'Ruby'}
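
Two pitfalls are worth checking when creating sets. The sketch below (variable names are illustrative, not from the original examples) shows that empty braces create a dict rather than a set, and that set elements must be hashable, so tuples work while lists raise a TypeError:

```python
# {} creates an empty dict, not an empty set
empty_braces = {}
empty_set = set()
print(type(empty_braces).__name__)  # dict
print(type(empty_set).__name__)     # set

# Set elements must be hashable: tuples are fine, duplicates collapse
points = {(0, 0), (1, 2), (0, 0)}
print(len(points))  # 2

# Lists are mutable and unhashable, so they cannot be set elements
try:
    invalid = {[1, 2], [3, 4]}
except TypeError as error:
    print(f"TypeError: {error}")
```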

Adding and Removing Elements

We can add a single element to a set using the add() method:

languages.add('JavaScript')

print(languages)
# Output: {'Python', 'R', 'Java', 'JavaScript'}

To add multiple elements at once from an iterable, use the update() method:

new_languages = ['C++', 'Go', 'Rust']
languages.update(new_languages)

print(languages)
# Output: {'R', 'Python', 'C++', 'Java', 'JavaScript', 'Go', 'Rust'}

Removing elements can be done via remove() or discard():

languages.remove('R') # Raises KeyError if element not present
print(languages)

languages.discard('Swift') # Doesn't raise error if element not present
print(languages)
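
The difference between remove() and discard() is easiest to see side by side. The following is a minimal sketch using its own small set; it also shows pop() and clear(), two other built-in removal methods not covered above:

```python
languages = {'Python', 'Java', 'Go'}

# remove() raises KeyError for a missing element, so guard it when unsure
try:
    languages.remove('Swift')
except KeyError:
    print("'Swift' was not in the set")

# discard() silently does nothing for a missing element
languages.discard('Swift')
print(len(languages))  # 3

# pop() removes and returns an arbitrary element
removed = languages.pop()
print(removed in languages)  # False

# clear() empties the set in place
languages.clear()
print(len(languages))  # 0
```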

Set Operations

Some common set operations available in Python include:

Union - Returns a new set containing all elements from both sets

A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

print(A | B) # Union operator |
# Output: {1, 2, 3, 4, 5, 6}

print(A.union(B))
# Output: {1, 2, 3, 4, 5, 6}

Intersection - Returns set containing only common elements

print(A & B) # Intersection operator &
# Output: {3, 4}

print(A.intersection(B))
# Output: {3, 4}

Difference - Returns a new set with the elements of the first set that are not in the second

print(A - B) # Difference operator -
# Output: {1, 2}

print(B - A)
# Output: {5, 6}

print(A.difference(B))
# Output: {1, 2}

Symmetric Difference - Returns elements only in one set or the other, not both

print(A ^ B) # Symmetric difference operator ^
# Output: {1, 2, 5, 6}

print(A.symmetric_difference(B))
# Output: {1, 2, 5, 6}
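
A few related details can be sketched with the same A and B. The named methods accept any iterable, while the operators require a set on both sides. Python also provides comparison operators for subset and superset checks, isdisjoint(), and in-place variants such as &=. The variable C is introduced here only for illustration:

```python
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

# Methods accept any iterable; operators require both operands to be sets
print(A.union([5, 6]) == {1, 2, 3, 4, 5, 6})  # True
try:
    A | [5, 6]
except TypeError:
    print("The | operator needs a set on both sides")

# Subset, superset, and disjoint checks
print({3, 4} <= A)           # True
print(A >= {1, 2})           # True
print(A.isdisjoint({7, 8}))  # True

# In-place intersection modifies C instead of creating a new set
C = A.copy()
C &= B
print(C == {3, 4})  # True
```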

Set Methods in Action

Let’s see some practical examples utilizing these set properties and methods to solve real-world problems:

Removing Duplicates

Sets can help remove duplicate elements from a sequence like a list efficiently:

duplicate_list = [1, 2, 3, 3, 4, 4, 4, 5, 6]

print(list(set(duplicate_list)))
# Output: [1, 2, 3, 4, 5, 6]
# The resulting order is not guaranteed; use sorted(set(duplicate_list)) when order matters
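
Converting to a set discards the original ordering. If the order of first appearance matters, one common alternative is dict.fromkeys(), which relies on dicts preserving insertion order (guaranteed since Python 3.7). A minimal sketch with an unsorted input:

```python
duplicate_list = [3, 1, 3, 2, 1, 5]

# dict keys are unique and keep insertion order
deduped_in_order = list(dict.fromkeys(duplicate_list))
print(deduped_in_order)  # [3, 1, 2, 5]

# sorted() over a set gives a deterministic, ascending order instead
deduped_sorted = sorted(set(duplicate_list))
print(deduped_sorted)  # [1, 2, 3, 5]
```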

Membership Testing

We can use sets to quickly check if an element is contained in a sequence:

languages = ['Python', 'Java', 'Ruby', 'Python']

unique_langs = set(languages)

print('Python' in unique_langs) # True
print('C++' in unique_langs) # False

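Membership testing in a set takes O(1) time on average, compared to O(n) for a list. Building the set costs O(n) once, so the conversion pays off when you run many lookups against the same data.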
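
To see the difference in practice, here is a rough timing sketch using the standard timeit module. The collection size and repeat count are arbitrary choices, and the exact numbers will vary by machine, so no expected timings are shown:

```python
import timeit

n = 100_000
items_list = list(range(n))
items_set = set(items_list)
target = n - 1  # worst case for the list: the last element

# Run each membership test 100 times and compare total time
list_time = timeit.timeit(lambda: target in items_list, number=100)
set_time = timeit.timeit(lambda: target in items_set, number=100)

print(f"list: {list_time:.6f}s, set: {set_time:.6f}s")
```

On typical hardware the set lookup should be much faster, since the list has to be scanned element by element.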

Finding Common Elements

To find common elements across multiple sets or sequences:

english_speakers = {'United States', 'Canada', 'UK', 'Ireland', 'Australia'}
spanish_speakers = {'Mexico', 'Colombia', 'Spain', 'Chile', 'Peru'}

# Find countries that appear in both sets
print(english_speakers & spanish_speakers)
# Output: set()  (the two sets above share no countries)

# Find all English or Spanish speaking countries
print(english_speakers | spanish_speakers)
# Output: {'UK', 'Chile', 'United States', 'Spain', 'Australia',
#          'Peru', 'Canada', 'Colombia', 'Mexico', 'Ireland'}
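
For more than two collections, set.intersection() and union() accept any number of arguments. The sketch below uses made-up skill lists (hypothetical data, not from the examples above) to find what all of them share:

```python
# Hypothetical skill lists for three candidates
skills = [
    ['Python', 'SQL', 'Git', 'Docker'],
    ['Python', 'Git', 'Java'],
    ['Go', 'Python', 'Git', 'SQL'],
]

# Convert each list to a set, then intersect them all at once
skill_sets = [set(s) for s in skills]
common = set.intersection(*skill_sets)
print(sorted(common))  # ['Git', 'Python']

# union() accepts plain iterables, so the lists can be passed directly
all_skills = set().union(*skills)
print(sorted(all_skills))  # ['Docker', 'Git', 'Go', 'Java', 'Python', 'SQL']
```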

Finding Unique Elements

To retrieve elements only present in one set or the other:

set_a = {1, 2, 3, 4}
set_b = {2, 3, 4, 5, 6}

print(set_a ^ set_b)
# Output: {1, 5, 6}

The symmetric difference shows elements unique to each set.
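
If you need to know which side each unique element came from, the two plain differences give that breakdown. This short sketch also checks that their union equals the symmetric difference (the names only_in_a and only_in_b are illustrative):

```python
set_a = {1, 2, 3, 4}
set_b = {2, 3, 4, 5, 6}

only_in_a = set_a - set_b
only_in_b = set_b - set_a
print(only_in_a)          # {1}
print(sorted(only_in_b))  # [5, 6]

# Both expressions below are equivalent to set_a ^ set_b
print((only_in_a | only_in_b) == (set_a ^ set_b))              # True
print(((set_a | set_b) - (set_a & set_b)) == (set_a ^ set_b))  # True
```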

Maintaining Uniqueness

Adding new elements to a set ignores duplicates:

items = {'apple', 'banana', 'orange'}

items.add('apple')
items.add('strawberry')

print(items)
# Output: {'banana', 'strawberry', 'orange', 'apple'}

This provides an easy way to maintain uniqueness as we build up a set.
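
The same idea is often used to track what has already been seen while scanning a sequence. As a minimal sketch, the hypothetical helper first_duplicate() below returns the first repeated item:

```python
def first_duplicate(items):
    """Return the first item that repeats, or None if all are unique."""
    seen = set()
    for item in items:
        if item in seen:  # O(1) average lookup
            return item
        seen.add(item)
    return None

print(first_duplicate(['apple', 'banana', 'apple', 'banana']))  # apple
print(first_duplicate(['apple', 'banana']))  # None
```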

Set Comprehensions

Similar to list and dict comprehensions, we can also create sets using set comprehensions:

# Set comprehension
languages = {language for language in ['Python','Java','Python','C++','Ruby']}

print(languages)
# Output: {'Ruby', 'Java', 'Python', 'C++'}

This allows quickly initializing a set by generating elements based on any iterable source.
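
Set comprehensions become more useful once they transform or filter the source elements. Here is a small sketch that normalizes a messy list of tags (the raw_tags data is invented for illustration):

```python
raw_tags = ['Python', ' python', 'JAVA', 'java ', '', 'Ruby']

# Strip whitespace, lowercase, and skip empty strings in one pass
normalized = {tag.strip().lower() for tag in raw_tags if tag.strip()}

print(sorted(normalized))  # ['java', 'python', 'ruby']
```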

Practical Exercise - Analyzing Text

Let’s put together these set concepts into a practical example. We will write a function to analyze a text document and return statistics about:

  1. The number of unique words
  2. The 10 most frequently occurring words
  3. Words only found in the first half of the document
  4. Words only found in the second half

We can utilize sets to help us efficiently extract this information:

import re
from collections import Counter

def analyze_text(text):
    """Return word statistics for text, using sets for the uniqueness checks."""
    # Split text into lowercase words; \w+ also splits on apostrophes and hyphens
    words = re.findall(r'\w+', text.lower())

    total_words = len(words)  # Count total words

    # Extract unique words
    unique_words = set(words)
    num_unique = len(unique_words)

    # Count word frequencies
    word_counts = Counter(words)
    top_10 = word_counts.most_common(10)

    # Split in half
    middle_index = len(words) // 2
    first_half = set(words[:middle_index])
    second_half = set(words[middle_index:])

    # Words only in first half
    first_half_unique = first_half - second_half

    # Words only in second half
    second_half_unique = second_half - first_half

    return {
        'total_words': total_words,
        'num_unique': num_unique,
        'freq_words': top_10,
        'first_half_unique': first_half_unique,
        'second_half_unique': second_half_unique,
    }

text = """Python is an interpreted, high-level, general-purpose programming
language. Created by Guido van Rossum and first released in 1991, Python's
design philosophy emphasizes code readability with its notable use of
significant whitespace. Its language constructs and object-oriented
approach aim to help programmers write clear, logical code for small and
large-scale projects. Python is dynamically typed and garbage-collected.
It supports multiple programming paradigms, including structured,
object-oriented and functional programming. Python is often described as
a "batteries included" language due to its comprehensive standard library.

Guido van Rossum began working on Python in the late 1980s as a successor
to the ABC programming language and first released it in 1991 as Python 0.9.0.
Python 2.0 was released in 2000 and introduced new features such as list
comprehensions, cycle-detecting garbage collection, reference counting, and
Unicode support. Python 3.0 was released in 2008 and was a major revision of
the language that is not completely backward-compatible. Python 2 was
discontinued with version 2.7.18 in 2020.

Python consistently ranks as one of the most popular programming languages."""

results = analyze_text(text)

print(results)

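This provides useful analytics on the text by combining a few techniques: set() extracts the unique words, Counter ranks word frequencies, and set difference isolates the words that appear in only one half of the document.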

This is just one simple example of how sets can be used for text analytics. The methods can be extended further to gather other insights as well.
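
As one possible extension, sets make it easy to measure how much vocabulary two texts share. The sketch below defines a hypothetical helper, vocabulary_overlap(), that returns the shared words and their Jaccard similarity (shared words divided by all distinct words). It assumes the same simple \w+ tokenization used above:

```python
import re

def vocabulary_overlap(text_a, text_b):
    """Return (shared_words, jaccard_similarity) for two texts."""
    words_a = set(re.findall(r'\w+', text_a.lower()))
    words_b = set(re.findall(r'\w+', text_b.lower()))

    shared = words_a & words_b
    combined = words_a | words_b

    # Avoid dividing by zero when both texts are empty
    jaccard = len(shared) / len(combined) if combined else 0.0
    return shared, jaccard

shared, score = vocabulary_overlap("Python is fun", "Python is fast")
print(sorted(shared))  # ['is', 'python']
print(score)           # 0.5
```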

Summary

Sets are a built-in Python data structure that stores unique elements. They are highly optimized for membership testing and duplicate removal, and they support set math operations such as union, intersection, and difference.

Mastering Python sets allows writing faster, more Pythonic code to analyze data and solve complex coding problems. This guide covered practical examples and exercises illustrating common patterns and use cases for sets, including duplicate removal, membership testing, set operations, and text analysis.

With these fundamentals, you should feel more confident applying sets in your own Python programming. Sets empower you to write simpler code optimized for efficiency and performance.