A trie, also known as a prefix tree, is a useful data structure for storing strings and allowing efficient operations on them. In this comprehensive guide, we will learn how to implement a trie in Python, including key operations like insertion, search, and deletion.
Table of Contents
Open Table of Contents
Introduction
A trie is a tree-like data structure where each node represents a character of an alphabet. By structuring the nodes in a particular way, words and strings can be retrieved from the structure by traversing down a branch path of the tree.
Tries are very useful for storing strings efficiently while allowing quick lookups and inserts. The tradeoff is that they can consume more memory relative to other data structures.
Some key properties and components of a trie:
- The root node does not represent any character. It instead links to different nodes representing the first character of words.
- Each node in the trie can have multiple child nodes, representing valid characters of words.
- Nodes deeper in the trie represent prefixes of words. The end of a word is marked by a special node property.
- Traversing down a path of connected nodes in the trie spells out a prefix or word.
The Trie data structure has many practical applications in real world scenarios:
- Autocomplete or predictive text
- Spell checker
- IP routing (matching routes by prefixes)
- Keyword search
- Storing dictionaries
In this guide, we will implement the following essential operations of a trie in Python:
- Insert - Adding words to the trie
- Search - Checking if a word exists in the trie
- Delete - Removing words from the trie
Implementing the Trie Node
The building blocks of our trie will be a TrieNode
class representing each node:
class TrieNode:
def __init__(self):
self.children = {}
self.end_of_word = False
Each node contains a children
dictionary to map its next potential characters to child nodes.
A TrieNode
instance also tracks whether it represents the end of a word with the end_of_word
property. This will be useful for search and delete operations later.
Inserting Words into the Trie
To insert a word into our trie, we traverse through the trie nodes character-by-character, adding nodes if needed.
The add_word
method handles inserting a word into the trie:
def add_word(self, word):
current = self.root
for char in word:
if char not in current.children:
current.children[char] = TrieNode()
current = current.children[char]
current.end_of_word = True
This method loops through each character in the word we want to insert. At each step, it checks if the next child node for that character exists in the current node’s children
map.
If the child node does not exist yet, a new TrieNode
is created for that character. This effectively expands the trie to allow insertion of the new word.
The current node then progresses to this new child node, and the loop continues with the next character.
Finally, after iterating through all characters, we mark the last node as an end of word by setting end_of_word = True
. This will help us distinguish complete words later during searches.
Here is an example usage:
word_trie = Trie()
word_trie.add_word("car")
word_trie.add_word("card")
word_trie.add_word("chip")
word_trie.add_word("trie")
word_trie.add_word("try")
This will construct the following trie structure:
root
/ \
c t
/ \ \
a h r
/ \ \ \
r i y i
/ \ /
d p e
We can see how each prefix creates new nodes, while branches converge at shared prefixes.
Checking for Word Existence
To check if a word exists in our trie, we traverse nodes similarly to how words are inserted.
The key difference is that we return False
immediately if:
- A child node for the next character does not exist.
- We reach the end of word without the
end_of_word
flag set.
Here is an exists
method implementing this:
def exists(self, word):
current = self.root
for char in word:
if char not in current.children:
return False
current = current.children[char]
return current.end_of_word
We traverse through the characters of the word to check, grabbing child nodes from the current node’s children
map.
If a child node does not exist for the next character, then we know the word cannot exist in the trie either. We immediately return False
in this case.
Otherwise, we eventually reach the end of the word. At this point, we must check specifically for the end_of_word
flag to confirm a complete word was inserted. If it’s False, then it was just a prefix.
For example:
print(word_trie.exists("car")) # True
print(word_trie.exists("card")) # True
print(word_trie.exists("ca")) # False, just a prefix
print(word_trie.exists("cat")) # False, word does not exist
The search functionality allows quick lookups for existence of words and their prefixes in the trie.
Removing Words from the Trie
We can also remove words from the trie. This involves a few key steps:
- Traverse to the end node of the word to delete.
- Set the end node’s
end_of_word
flag to False. - Remove any useless nodes left behind.
Here is a remove
method to handle deletion:
def remove(self, word):
current = self.root
for char in word:
if char not in current.children:
raise ValueError("Word not found")
current = current.children[char]
current.end_of_word = False
# Remove any useless nodes left behind
if current.children == {}:
parent = self.__find_parent(current)
del parent.children[char]
Similar to search, we first traverse to the end node of the word, tracking the parent node as well.
Once at the end node, we set end_of_word = False
to remove the complete word indicator.
Finally, we check if deleting this word led to any child nodes becoming useless, i.e. empty dictionaries. If so, we recursively delete these nodes as well by removing them from the parent’s children
map.
This cleanup step is important for freeing unused memory in the trie.
For example, deleting “card” from our previous trie:
root
/ \
c t
/ \ \
a h r
/ \
r y
/
The ‘d’ node is removed since it is no longer useful after deleting “card”.
Trie Analysis
The trie provides efficient insert, search, and delete operations compared to other data structures.
Time Complexity
-
Insert, search, and delete are all O(L) where L is the length of the word operated on. This is much faster than O(Log N) for binary search trees.
-
The operations do not depend on the total number of keys N in the trie.
Space Complexity
-
In the worst case, we would have N distinct words that share no common prefixes. This would use O(N * L) space to store the entire trie.
-
Typically, there are reused prefixes across words so trie space usage is less than O(N * L) on average.
Some key tradeoffs:
-
Tries require more memory than binary search trees since each node stores pointers to child nodes.
-
Lookup time is faster compared to balanced binary search trees or hash tables.
-
Prefix lookup is efficient compared to search trees.
-
Tries can be slower for insert and delete due to traversing multiple nodes and cleanup. Hash tables have O(1) insert and delete.
Overall, tries excel at fast search and prefix lookup operations at the cost of higher memory usage.
Trie Applications
Some example applications that benefit from using a trie:
Autocomplete/Predictive Text
As the user types each character, we can traverse the trie matching prefixes to offer suggestions.
Spell Checker
Quickly check text against a dictionary trie to identify spelling mistakes.
IP Routing
Store routing table entries by prefixes for efficient longest-prefix matching.
Keyword Search
Allow searching for words without needing full word matching.
Storing Dictionaries
Efficiently store dictionaries of words in multiple languages.
Conclusion
In this guide, we implemented a trie data structure in Python including key operations like insert, search, and delete.
The prefix tree structure of a trie enables efficient operations on string datasets while optimizing for searches by prefixes.
Tries have many practical use cases and applications ranging from autocomplete to IP routing. The trie is an essential data structure for handling string data.
Here are some additional ways to expand on the basic trie implementation:
- Handle wildcard searches efficiently
- Store values or metadata at trie nodes
- Implement compression schemes like LZW to reduce memory usage
- Support concurrent operations using threading locks
I hope this guide provided a comprehensive overview of implementing tries in Python! Let me know if you have any other questions.