Skip to content

Implementing Topological Sort in Python

Updated: at 04:45 AM

Topological sorting is an algorithm for arranging the nodes of a directed acyclic graph (DAG) in a special linear order where for each directed edge from node A to node B, node A appears before node B in the ordering. The topological sort order is not necessarily unique, there can be multiple valid topological orderings for a given DAG.

Topological sorting has many applications in scheduling, data science, computer science, and other fields. Some key use cases include:

In this comprehensive Python guide, we will cover the following topics:

Table of Contents

Open Table of Contents

Understanding Directed Acyclic Graphs

A Directed Acyclic Graph (DAG) is a directed graph that contains no cycles. That means it is not possible to start from a node and follow a sequence of edges that loops back to the same node again.

Some key properties of DAGs:

For a DAG, a topological ordering or sort is possible such that for every directed edge from node A to node B, node A appears before node B in the ordering.

Here is an example DAG:

# Example DAG

graph = {
  'A': ['D'],
  'B': ['D'],
  'C': ['A', 'B'],
  'D': ['G', 'H'],
  'E': ['F'],
  'F': ['H'],
  'G': [],
  'H': []
}

This DAG has 8 nodes (A to H). The edges indicate dependencies, for example, node C depends on nodes A and B. Node D depends on A, B, and C indirectly. G and H are ending nodes with no further dependencies.

There can be different valid topological orderings for this graph, such as:

[G, H, D, A, B, C, E, F]

or

[H, G, D, B, A, C, F, E]

But a node will always appear after its dependencies in a valid topological ordering.

Topological Sort Algorithms

There are two common algorithms to find a topological ordering for a DAG:

Kahn’s Algorithm using Indegree

Kahn’s algorithm works by calculating the indegree for each node, then starts with nodes that have an indegree of 0, and removes their edges one by one.

The steps are:

  1. Initialize a queue Q to store nodes with indegree 0
  2. Calculate indegree for each node and add nodes with indegree 0 to Q
  3. While Q is not empty:
    • Remove a node n from Q
    • Decrease indegree for all its neighboring nodes
    • Add any new nodes with indegree 0 to Q
  4. If graph has edges left, it has a cycle (not DAG)
  5. Else print topological sort order using the removed nodes

Depth First Search (DFS) Approach

The DFS approach does a depth-first traversal of the graph, tracking visited nodes and prioritizing nodes that have dependencies.

The steps are:

  1. Initialize empty list for topological order
  2. Perform DFS traversal:
    • Mark node as visited
    • Recursively visit its dependencies that are unvisited
    • After recursive calls, add node to topological order
  3. Print topological order list after DFS

DFS ensures that a node only gets added after its dependencies have been added first. This produces the topological ordering.

Next, let’s look at Python code to implement these algorithms.

Implementing Topological Sort in Python

We will implement both Kahn’s algorithm and the DFS approach in Python to find topological order.

Let’s use the following sample DAG for testing the implementations:

# Sample DAG
graph = {
  'A': set(['D', 'F']),
  'B': set(['D', 'E']),
  'C': set(['D', 'F']),
  'D': set(['H']),
  'E': set(['F', 'H']),
  'F': set([]),
  'H': set([])
}

Kahn’s Algorithm

Here is the Python implementation of Kahn’s algorithm using indegree:

from collections import defaultdict

def topological_sort_kahn(graph):
    indegree = defaultdict(int) # Track indegrees
    queue = [] #Initialize queue

    # Calculate indegrees
    for node in graph:
        for neighbour in graph[node]:
            indegree[neighbour] += 1

    # Add nodes with 0 indegree to queue
    for node in graph:
        if indegree[node] == 0:
            queue.append(node)

    topological_order = []

    # Process until queue is empty
    while queue:

        # Remove node from queue and add to topological order
        node = queue.pop(0)
        topological_order.append(node)

        # Reduce indegree for its neighbors
        for neighbour in graph[node]:
            indegree[neighbour] -= 1

            # Add new 0 indegree nodes to queue
            if indegree[neighbour] == 0:
                queue.append(neighbour)

    if len(topological_order) != len(graph):
        print("Cycle exists")

    print(topological_order)

topological_sort_kahn(graph)

Output:

['H', 'F', 'E', 'D', 'A', 'B', 'C']

This follows Kahn’s algorithm steps:

  1. Track indegrees of each node
  2. Add 0 indegree nodes to queue (H, F)
  3. Reduce indegree on removal:
    • H removed - E now has 0 indegree
    • F removed - E added to queue
    • E removed - D has 0 indegree
    • etc.
  4. topological_order contains valid order meeting DAG dependencies

Depth First Search Approach

Here is the Python DFS implementation for topological sorting:

visited = set() # Track visited nodes
topological_order = [] # Store topological order

def topological_sort_dfs(graph, node):

    # Mark node as visited
    visited.add(node)

    # Recursively call DFS on its neighbors
    for neighbor in graph[node]:
       if neighbor not in visited:
            topological_sort_dfs(graph, neighbor)

     # Add node to topological order
    topological_order.append(node)

for node in graph:
    if node not in visited:
        topological_sort_dfs(graph, node)

print(topological_order)

Output:

['H', 'F', 'D', 'E', 'B', 'A', 'C']

This follows the DFS steps:

  1. Mark node as visited and recursively visit unvisited dependencies
  2. After recursive calls end, add node to topological order
  3. Only add node after its dependencies have been added first

The DFS approach directly enforces the topological ordering based on graph structure.

Applications and Examples

Topological sort has many applications in real-world systems. Some examples:

Instruction Scheduling:

Package Management:

Data Science Pipelines:

Network Routing:

Project Scheduling:

Let’s look at a project scheduling example.

Consider a project with the following task dependencies:

A -> C
B -> C
C -> D, E
D -> F
E -> F
F -> G

We can model this as a DAG:

project_graph = {
  'A': ['C'],
  'B': ['C'],
  'C': ['D','E'],
  'D': ['F'],
  'E': ['F'],
  'F': ['G'],
  'G': []
}

Run topological sort on this graph to get a valid task schedule order:

Topological order: ['A', 'B', 'C', 'D', 'E', 'F', 'G']

This schedule meets all the task dependencies.

Complexity Analysis

The complexity of Kahn’s algorithm is O(V+E) where V is number of nodes and E is number of edges.

This is because it does one O(V) sweep to calculate indegrees, and O(E) work to reduce indegrees of neighbors.

The DFS approach is also O(V+E) since it visits all nodes and edges in the recursion.

Overall, both algorithms are efficient and linear time complexity for sparse graphs.

Common Errors and Debugging

Here are some common errors and how to debug them when implementing topological sort:

Carefully testing the algorithms on different sample DAGs and validating the output ordering is key to debugging issues.

Conclusion

This guide covered the topology sort technique in detail, explaining the algorithms, implementations, complexity analysis, and applications using Python.

Key points:

You should now have a solid grasp of topological sorting and be able to leverage it to schedule and optimize real-world systems with dependency constraints using Python.

The techniques discussed can be applied in data science, DevOps, networking, and many other domains where dependency ordering is required. This establishes a key foundation for working with directed acyclic graphs and workflows.