Depth-first search (DFS) is an important algorithm for traversing tree or graph data structures. In technical interviews, candidates are often asked to implement a DFS algorithm to traverse a binary tree. This algorithm explores branches of a tree or graph by going as deep as possible down each branch before backtracking and moving on to the next branch. In this comprehensive guide, we will cover the DFS algorithm in detail with Python example code and step-by-step explanations to traverse a binary tree data structure.
Table of Contents
Open Table of Contents
Overview of Depth-First Search Algorithm
The depth-first search algorithm starts at the root node of a tree or graph and explores as far as possible along each branch before backtracking. So it goes deep first before going wide.
The DFS algorithm follows these main steps:
-
Start by placing any arbitrary node of the graph in a stack.
-
Take the top item of the stack and add it to the visited list.
-
Create a list of the adjacent nodes. Add the ones which aren’t in the visited list to the top of the stack.
-
Keep repeating steps 2 and 3 until the stack is empty.
The above pseudocode shows the general logic of a recursive DFS implementation. The key points are:
-
Use a stack to keep track of the next node to visit. New nodes are added to the top.
-
Mark nodes as “visited” when traversing to avoid cycles.
-
Recursively call the DFS function on each adjacent unvisited node.
Some key properties of depth-first search are:
-
It traverses a graph branch completely before moving to next branch.
-
DFS will not get stuck in infinite loops on graphs with cycles. The visited set avoids revisiting nodes.
-
All vertices may not be reachable from a given starting point, so DFS is often run multiple times from different starting nodes to fully traverse a graph.
-
DFS tends to be memory efficient as it doesn’t have to store the full graph explicitly like breadth-first search. Only the path needs to be stored in the call stack.
Now let’s look at implementing DFS in Python specifically to traverse a binary tree data structure using code examples.
Binary Tree Data Structure in Python
A binary tree is a tree data structure where each node has at most two child nodes referred to as the left child and right child. In Python, we can create a Node class to represent each node in the tree:
class Node:
def __init__(self, val=None):
self.val = val
self.left = None
self.right = None
Here the Node
class has a val
to store node value and left
and right
attributes to represent left and right child nodes.
We can create a sample binary tree like:
root = Node('A')
root.left = Node('B')
root.right = Node('C')
root.left.left = Node('D')
This forms the sample binary tree structure:
A
/ \
B C
/
D
Note that in the diagrams, nodes are shown as ovals with the node value inside. Edges point from parent to child nodes.
Now that we have our binary tree data structure in Python, next we can implement the depth-first search algorithm to traverse it.
Recursive Depth-First Search in Python
The recursive DFS approach is elegant and simple to implement in Python. We will define a traverse()
function that takes a root node as input and recursively calls itself on the left and right child nodes:
def traverse(node):
if node is None:
return
print(node.val)
traverse(node.left)
traverse(node.right)
Breaking this down:
-
Base case is if the node is
None
, then return back. -
Otherwise, print the node value for pre-order traversal.
-
Recursively call
traverse()
on the left and right children.
To kick off the DFS, we need to initially call it on the root node:
root = Node('A')
# Set up tree structure ...
traverse(root)
The full program would be:
class Node:
def __init__(self,val=None):
self.val = val
self.left = None
self.right = None
def traverse(node):
if node is None:
return
print(node.val)
traverse(node.left)
traverse(node.right)
root = Node('A')
root.left = Node('B')
root.right = Node('C')
root.left.left = Node('D')
traverse(root)
Output:
A
B
D
C
This recursively traverses the tree depth-first in pre-order (parent node visited before children).
The key advantage of the recursive approach is the code is very clean and simple. The call stack handles the node visits implicitly.
However, for very deep trees the recursive stack can get large, leading to stack overflow issues. So next we’ll look at an iterative version.
Iterative Depth-First Search in Python
We can implement the DFS algorithm iteratively using a stack. The logic will be:
-
Initialize stack with root node.
-
While stack is not empty:
-
Pop a node from stack and print it.
-
Push the right child of node to stack.
-
Push the left child of node to stack.
-
This pushes nodes in the order we want to visit them.
Here is the Python code:
class Node:
# Node class definition
def traverse(root):
if root is None:
return
stack = []
stack.append(root)
while stack:
node = stack.pop()
print(node.val)
if node.right:
stack.append(node.right)
if node.left:
stack.append(node.left)
Key points:
-
Check for empty root.
-
Initialize a stack with the root node.
-
Loop while stack is not empty.
-
Pop a node and print it.
-
Push the left and right children into the stack.
-
Next iteration will pop and visit the recently pushed nodes.
This iteratively traverses the tree depth-first, but in reverse pre-order (children visited before parent) since we push children before visiting the parent node.
The iterative DFS avoids call stack limits for deep trees and can be faster for large graphs. But the code is more complex than the recursive DFS.
Comparing Depth-First Search Variants
There are a few common variants of the depth-first search algorithm:
-
Pre-order DFS: Process node first, then recursively visit children. The recursive code earlier shows pre-order traversal.
-
Post-order DFS: Recursively visit children first, then process node. To implement, move the print(node.val) call after the recursive calls.
-
Reverse post-order DFS: Iterative stack-based DFS processes children before parents, resulting in reverse post-order traversal.
The order determines when each node is processed relative to its children. Different orders are useful in various situations:
-
Pre-order: Useful for recursive processing of nodes such as cloning, copying, or serialization.
-
Post-order: Computes or outputs something after visiting children so requires full subtree info. Useful for expression evaluation or critical path analysis.
-
Reverse post-order: Processes children nodes first so can recurse on children in the natural order. Used in applications like compilers.
So in summary, recursive DFS provides a clean implementation, while iterative DFS avoids call stack limits. And several traversal orders are possible depending on the application’s requirements.
Applications of Depth-First Search
Some common applications of the DFS algorithm include:
-
Traversing tree or graph structures: DFS is often used to visit all nodes/vertices in a tree or connected graph, as seen in the examples above.
-
Topological sorting: DFS can output vertices of a directed acyclic graph in topological order.
-
Finding connected components: DFS can identify all nodes reachable from a starting node, so multiple DFS runs can label connected components of a graph.
-
Path finding: DFS can be adapted to find a path between two nodes by tracking predecessors during traversal.
-
Cycle detection: A variant of DFS can detect cycles in a graph by checking for back edges to already visited nodes.
-
Maze exploration: DFS can fully explore all routes in a maze represented as a graph. The first solution found is guaranteed to be optimal.
-
Exhaustively enumerating solutions: DFS can systematically explore all possible solutions/permutations to a problem by traversing a tree of options.
DFS is thus a fundamental algorithm for working with tree and graph data structures with applications in compilers, AI, web crawlers, video games, and more.
Depth-First Search Python Implementation Summary
To summarize, here are the key points covered about implementing depth-first search in Python:
-
DFS starts at root node and explores as far as possible along each branch before backtracking.
-
Use a stack to track nodes to visit next. Push deeper nodes first to defer wider traversal.
-
Mark visited nodes to avoid cycles.
-
Recursive DFS is simple but can hit call stack limits.
-
Iterative DFS uses a stack and loop which avoids recursion overhead.
-
Different orders like pre-order, post-order, reverse post-order are possible. Order depends on application needs.
-
DFS is used to traverse trees/graphs, find paths and cycles, topological sorting, solve mazes, and in many other applications.
So in coding interviews, be sure to have a strong understanding of how to implement recursive and iterative depth-first search in Python. Both approaches have tradeoffs to consider for traversing binary trees and other data structures.