Skip to content

Implementing a Trie Data Structure in Python

Updated: at 04:45 AM

A trie, also known as a prefix tree, is a useful data structure for storing strings and allowing efficient operations on them. In this comprehensive guide, we will learn how to implement a trie in Python, including key operations like insertion, search, and deletion.

Table of Contents

Open Table of Contents

Introduction

A trie is a tree-like data structure where each node represents a character of an alphabet. By structuring the nodes in a particular way, words and strings can be retrieved from the structure by traversing down a branch path of the tree.

Tries are very useful for storing strings efficiently while allowing quick lookups and inserts. The tradeoff is that they can consume more memory relative to other data structures.

Some key properties and components of a trie:

The Trie data structure has many practical applications in real world scenarios:

In this guide, we will implement the following essential operations of a trie in Python:

Implementing the Trie Node

The building blocks of our trie will be a TrieNode class representing each node:

class TrieNode:
    def __init__(self):
        self.children = {}
        self.end_of_word = False

Each node contains a children dictionary to map its next potential characters to child nodes.

A TrieNode instance also tracks whether it represents the end of a word with the end_of_word property. This will be useful for search and delete operations later.

Inserting Words into the Trie

To insert a word into our trie, we traverse through the trie nodes character-by-character, adding nodes if needed.

The add_word method handles inserting a word into the trie:

def add_word(self, word):
    current = self.root
    for char in word:
        if char not in current.children:
            current.children[char] = TrieNode()
        current = current.children[char]
    current.end_of_word = True

This method loops through each character in the word we want to insert. At each step, it checks if the next child node for that character exists in the current node’s children map.

If the child node does not exist yet, a new TrieNode is created for that character. This effectively expands the trie to allow insertion of the new word.

The current node then progresses to this new child node, and the loop continues with the next character.

Finally, after iterating through all characters, we mark the last node as an end of word by setting end_of_word = True. This will help us distinguish complete words later during searches.

Here is an example usage:

word_trie = Trie()

word_trie.add_word("car")
word_trie.add_word("card")
word_trie.add_word("chip")
word_trie.add_word("trie")
word_trie.add_word("try")

This will construct the following trie structure:

         root
        /  \
       c    t
      / \   \
     a   h   r
    / \   \   \
   r   i    y   i
  /   \        /
 d     p      e

We can see how each prefix creates new nodes, while branches converge at shared prefixes.

Checking for Word Existence

To check if a word exists in our trie, we traverse nodes similarly to how words are inserted.

The key difference is that we return False immediately if:

Here is an exists method implementing this:

def exists(self, word):
    current = self.root
    for char in word:
        if char not in current.children:
            return False
        current = current.children[char]

    return current.end_of_word

We traverse through the characters of the word to check, grabbing child nodes from the current node’s children map.

If a child node does not exist for the next character, then we know the word cannot exist in the trie either. We immediately return False in this case.

Otherwise, we eventually reach the end of the word. At this point, we must check specifically for the end_of_word flag to confirm a complete word was inserted. If it’s False, then it was just a prefix.

For example:

print(word_trie.exists("car")) # True
print(word_trie.exists("card")) # True
print(word_trie.exists("ca")) # False, just a prefix
print(word_trie.exists("cat")) # False, word does not exist

The search functionality allows quick lookups for existence of words and their prefixes in the trie.

Removing Words from the Trie

We can also remove words from the trie. This involves a few key steps:

  1. Traverse to the end node of the word to delete.
  2. Set the end node’s end_of_word flag to False.
  3. Remove any useless nodes left behind.

Here is a remove method to handle deletion:

def remove(self, word):
    current = self.root
    for char in word:
        if char not in current.children:
            raise ValueError("Word not found")
        current = current.children[char]

    current.end_of_word = False

    # Remove any useless nodes left behind
    if current.children == {}:
        parent = self.__find_parent(current)
        del parent.children[char]

Similar to search, we first traverse to the end node of the word, tracking the parent node as well.

Once at the end node, we set end_of_word = False to remove the complete word indicator.

Finally, we check if deleting this word led to any child nodes becoming useless, i.e. empty dictionaries. If so, we recursively delete these nodes as well by removing them from the parent’s children map.

This cleanup step is important for freeing unused memory in the trie.

For example, deleting “card” from our previous trie:

           root
          /   \
         c     t
        / \    \
       a   h    r
      /        \
     r          y
    /

The ‘d’ node is removed since it is no longer useful after deleting “card”.

Trie Analysis

The trie provides efficient insert, search, and delete operations compared to other data structures.

Time Complexity

Space Complexity

Some key tradeoffs:

Overall, tries excel at fast search and prefix lookup operations at the cost of higher memory usage.

Trie Applications

Some example applications that benefit from using a trie:

Autocomplete/Predictive Text

As the user types each character, we can traverse the trie matching prefixes to offer suggestions.

Spell Checker

Quickly check text against a dictionary trie to identify spelling mistakes.

IP Routing

Store routing table entries by prefixes for efficient longest-prefix matching.

Keyword Search

Allow searching for words without needing full word matching.

Storing Dictionaries

Efficiently store dictionaries of words in multiple languages.

Conclusion

In this guide, we implemented a trie data structure in Python including key operations like insert, search, and delete.

The prefix tree structure of a trie enables efficient operations on string datasets while optimizing for searches by prefixes.

Tries have many practical use cases and applications ranging from autocomplete to IP routing. The trie is an essential data structure for handling string data.

Here are some additional ways to expand on the basic trie implementation:

I hope this guide provided a comprehensive overview of implementing tries in Python! Let me know if you have any other questions.