Skip to content

How to Check if Two Strings are Anagrams in Python

Updated: at 04:34 AM

An anagram is a word or phrase formed by rearranging the letters of another word or phrase. For example, “listen” and “silent” are anagrams because they contain the same letters. Anagram checking is a common technical interview question used to assess a candidate’s programming skills and problem-solving abilities in Python.

In this comprehensive guide, we will examine multiple methods to check if two strings are anagrams in Python. We will look at basic, intermediate, and advanced approaches, analyzing the performance and complexity tradeoffs of each technique. Practical code examples are provided to illustrate the key concepts and techniques.

Overview of the Anagram Checking Problem

Here is a formal definition of the anagram checking problem:

Given two strings, check if they are anagrams of each other. Return True if the strings contain the same characters regardless of order, and False otherwise.

For example:

is_anagram("silent", "listen") ➞ True

is_anagram("act", "cat") ➞ True

is_anagram("aaba", "aabb") ➞ False

This seems like a simple problem at first glance, but there are some edge cases and optimizations to consider that can impact your solution.

Requirements

Challenges

In the following sections, we will explore various techniques to account for these requirements and challenges while checking for anagrams in Python.

Basic Approaches

Here are two basic approaches to check for anagram strings in Python:

Sort and Compare

This algorithm sorts each string and compares the sorted versions:

def is_anagram_sort(str1, str2):
  str1 = str1.lower().replace(" ", "")
  str2 = str2.lower().replace(" ", "")

  return sorted(str1) == sorted(str2)

Character Count

Count character frequencies using a dictionary:

def is_anagram_count(str1, str2):
  str1 = str1.lower().replace(" ", "")
  str2 = str2.lower().replace(" ", "")

  if len(str1) != len(str2):
    return False

  char_count = {}

  for char in str1:
    char_count[char] = char_count.get(char, 0) + 1

  for char in str2:
    if char not in char_count:
      return False
    char_count[char] -= 1

  return all(count == 0 for count in char_count.values())

These basic approaches demonstrate the core logic required for anagram checking in Python. However, they do not handle duplicate characters efficiently and can be optimized further.

Intermediate Solutions

The intermediate solutions below improve upon the basic approaches:

Hash and Compare

Use Python hash values instead of sorting:

def is_anagram_hash(str1, str2):
  str1 = str1.lower().replace(" ", "")
  str2 = str2.lower().replace(" ", "")

  return hash(str1) == hash(str2)

Note: Hash collisions are possible so this is not 100% accurate. Sorting avoids collisions but takes longer.

Character Counts with DefaultDict

Use defaultdict instead of checking if keys exist:

from collections import defaultdict

def is_anagram_dd(str1, str2):
  str1, str2 = str1.lower(), str2.lower()
  str1, str2 = str1.replace(" ", ""), str2.replace(" ", "")

  if len(str1) != len(str2):
      return False

  dd = defaultdict(int)

  for char in str1:
      dd[char] += 1

  for char in str2:
      dd[char] -= 1
      if dd[char] < 0:
          return False

  return all(count == 0 for count in dd.values())

These intermediate solutions provide more optimized anagram checks in Python without too much extra coding. Next, we’ll explore some advanced approaches.

Advanced Techniques

Here are some more advanced solutions that leverage primes, sets, counters, and memoization to optimize anagram detection:

Prime Multiplication

Multiply prime numbers corresponding to character counts:

import math

PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101]

def is_anagram_prime(str1, str2):
  str1, str2 = str1.lower(), str2.lower()
  str1, str2 = str1.replace(" ", ""), str2.replace(" ", "")

  if len(str1) != len(str2):
    return False

  primes = [1] * len(PRIMES)

  for char in str1:
    index = ord(char) - ord('a')
    primes[index] *= PRIMES[index]

  for char in str2:
    index = ord(char) - ord('a')
    primes[index] /= PRIMES[index]

  for prime in primes:
    if prime != 1:
      return False

  return True

This avoids collisions and duplicate character inefficiencies.

Set Intersection

Convert strings to sets and check for intersection:

def is_anagram_set(str1, str2):
  return set(str1.lower().replace(" ", "")) == set(str2.lower().replace(" ", ""))

Set theory provides a simple way to handle duplicates.

Counter Comparison

Use Counter objects to store and compare char counts:

from collections import Counter

def is_anagram_counter(str1, str2):
  str1, str2 = str1.lower(), str2.lower()
  str1, str2 = str1.replace(" ", ""), str2.replace(" ", "")

  if len(str1) != len(str2):
    return False

  return Counter(str1) == Counter(str2)

Counter handles duplicates efficiently behind the scenes.

Memoization

Use memoization to cache results for repeated function calls:

from functools import lru_cache

@lru_cache(maxsize=1000)
def is_anagram_memo(str1, str2):
  # Existing is_anagram function call

This provides significant speedups for repeated function calls in production.

Comparative Analysis

Here is a summary of the time and space complexity for the various techniques:

ApproachTime ComplexitySpace Complexity
Sort and CompareO(nlogn)O(n)
Character CountO(n)O(n)
Hash and CompareO(n)O(1)
DefaultDict CountO(n)O(n)
Prime MultiplicationO(n)O(1)
Set IntersectionO(n)O(n)
Counter ComparisonO(n)O(n)
MemoizationO(n)O(n)

So in summary, the hash, prime, set, counter, and memoization solutions provide the best time and space complexity. Hashing is fastest but can have collisions. Primes avoid collisions at the cost of precomputations. Overall, counters provide the best balance of efficiency, accuracy, and simplicity.

Handling Edge Cases

Here are some edge cases to consider:

Properly handling edge cases improves the robustness and correctness of your solution.

Putting it All Together

Here is an example solution using the counter approach with edge case handling:

from collections import Counter

def is_anagram(str1, str2):
  if len(str1) != len(str2):
    return False

  str1, str2 = str1.lower(), str2.lower()
  str1, str2 = str1.replace(" ", ""), str2.replace(" ", "")

  return Counter(str1) == Counter(str2)

This provides a robust anagram checking function in just 5 lines of Python!

Testing the Solution

Here are some examples to test the anagram checking function:

import unittest

class TestAnagram(unittest.TestCase):

  def test_valid_anagrams(self):
    self.assertTrue(is_anagram('silent', 'listen'))
    self.assertTrue(is_anagram('Triangle', 'integral'))

  def test_invalid_anagrams(self):
    self.assertFalse(is_anagram('cat', 'dog'))
    self.assertFalse(is_anagram('State', 'tastes'))

  def test_edge_cases(self):
    self.assertFalse(is_anagram('', 'hello'))
    self.assertFalse(is_anagram('hello', 'helloo'))

if __name__ == '__main__':
  unittest.main()

Thorough testing improves correctness and prevents regressions.

Applications

Checking for anagram strings has various applications:

Anagram algorithms serve as useful string processing utilities for many programs.

Conclusion

In this comprehensive guide, we explored numerous techniques to check if two strings are anagrams in Python:

The counter-based solution provides the best combination of simplicity, efficiency, and accuracy for most use cases.

Properly handling edge cases and exceptions improves robustness. Unit testing validates correctness.

Anagram checking is a common interview coding challenge that assesses a candidate’s skills in algorithms, string manipulation, time/space complexity analysis, and test-driven development with Python.

Mastering these techniques will prepare you for technical coding interviews and level up your overall programming abilities!