Detecting Cycles in a Linked List in Python

Linked lists are common data structures used in many algorithms and applications. They consist of nodes that contain data and a reference (or pointer) to the next node in the list. Unlike arrays, linked lists do not store data contiguously in memory, allowing for efficient insertion and removal of nodes.

A key question that arises with linked lists is whether they contain cycles, where a node’s next reference points back to an earlier node, creating a loop in the list. Detecting cycles is important for several reasons:

It can indicate a bug in code that constructs linked lists. Cycles can lead to infinite loops and memory leaks.
Certain algorithms require acyclic linked lists to function properly. Identifying cycles allows you to correctly handle them.
Knowing if a cycle exists allows you to determine the start of the cycle. This node’s reference mistakenly points backward instead of forward.

In this comprehensive Python programming guide, you will learn approaches to detect cycles in a linked list. The step-by-step instructions provided include example code and explanations of the key concepts. By the end, you will be able to implement efficient solutions to this common technical interview question using Python.

Open Table of Contents

Prerequisites
Approach 1: Hash Table
- Hash Table Approach Pros and Cons
Approach 2: Two Pointers
- Two Pointer Approach Pros and Cons
Approach 3: Set Node Values
- Set Values Approach Pros and Cons
Approach 4: Recursive Depth-First Search
- Recursive DFS Pros and Cons
Comparing Approaches
Applications
Conclusion

Prerequisites

To fully understand this guide, you should have:

Knowledge of Python (variables, conditional logic, loops, functions, etc.)
Understanding of object-oriented programming concepts like classes and objects
Basic knowledge of linked list data structures and node connections

The examples will use a simple Node class to represent each element in the linked list:

class Node:
  def __init__(self, data):
    self.data = data
    self.next = None

The data attribute stores the node’s data, while next points to the following node.

Approach 1: Hash Table

The hash table approach keeps track of nodes seen while traversing the linked list. If a node is seen more than once, a cycle must exist. The steps are:

Initialize a hash table or set to store seen nodes.
Traverse the linked list from the head node.
For each node, check if it is in the hash table.
If not, add it and move to the next node.
If it already exists in the hash table, a cycle is present.

Here is Python code implementing this technique:

# Initialize hash table
seen = set()

current = head
while current:
  # Check if node exists
  if current in seen:
    return True # Cycle detected

  # Add to hash table
  seen.add(current)

  # Move to next node
  current = current.next

return False # No cycle detected

The time complexity is O(N) since each node is visited at most once. The space complexity is O(N) for the hash table storing N nodes.

Hash Table Approach Pros and Cons

Pros:

Simple and intuitive solution
Constant time insertions and lookups in hash table
Linear time and space complexities

Cons:

Requires extra storage for the hash table proportional to nodes in the list

Approach 2: Two Pointers

The two pointer technique checks for a cycle by using 2 pointers traversing the list at different speeds. If the linked list has a cycle, the pointers will eventually meet inside the cycle.

The steps for this approach are:

Initialize two pointers - slow and fast
Increment slow by 1 node, fast by 2 nodes
If fast encounters a null node, no cycle exists
If slow and fast meet, a cycle is present
To find the start node, increment a pointer from head and another from meeting node until they collide

Here is the Python implementation:

def has_cycle(head):

  # Initialize slow and fast pointers
  slow, fast = head, head

  while fast and fast.next:

    # Increment slow by 1, fast by 2
    slow = slow.next
    fast = fast.next.next

    # Check if pointers meet
    if slow == fast:
      return True

  # fast reached null, so no cycle
  return False

# If cycle found, find start node by incrementing
# pointer p1 from head, and p2 from meeting point
def start_node(head):

  # Check for cycle
  if not has_cycle(head):
    return None

  # Initialize p1 from head, p2 from meeting point
  p1 = head
  p2 = head

  # Increment p1 by 1, p2 by 1 until they collide
  while p1 != p2:
    p1 = p1.next
    p2 = p2.next

  # Return start node
  return p1

The time complexity is O(N) to detect a cycle and O(N) to find the start node. Only two pointers are used, so the space complexity is O(1).

Two Pointer Approach Pros and Cons

Pros:

Faster than hash table approach - no extra data structure
Constant O(1) space usage
Can find start of cycle node

Cons:

More complex logic than hash table approach
Requires meeting point to find start node, adding extra steps

Approach 3: Set Node Values

This technique modifies node values as the list is traversed. The steps are:

Initialize variable value to 0
Traverse list and set node values to value
Increment value after setting each node
If a node with value already set is found, a loop exists

Here is Python code for this algorithm:

value = 0

current = head
while current:

  # Cycle exists if node value
  # already changed
  if current.data == value:
    return True

  # Set node value
  current.data = value

  # Increment for next node
  value += 1

  current = current.next

return False # No cycle

By changing node values, we can detect if a node has been visited before. The time complexity is O(N) to visit each node once. The space used is O(1).

Set Values Approach Pros and Cons

Pros:

In-place algorithm, no extra memory needed
Simple concept
Linear time complexity

Cons:

Modifies the existing linked list
May not work if data stored cannot be modified
Slower than two pointer approach

Approach 4: Recursive Depth-First Search

This recursive algorithm traverses the graph depth-first, keeping track of visited nodes. If a node is visited more than once, there is a cycle.

Here are the steps:

Create a visited set to track visited nodes
Define recursive DFS function accepting current node
Mark node visited and recursively call DFS on adjacent unvisited nodes
If node was already visited, return True for cycle
Return False if all nodes are visited once

And here is a Python implementation:

visited = set()

def dfs(node):

  # Visited before, cycle exists
  if node in visited:
    return True

  # Add to visited
  visited.add(node)

  # Recursively visit adjacent nodes
  for n in node.nexts:
    if dfs(n):
      return True

  # No cycle detected
  return False

# Start at head node
return dfs(head)

The time complexity is O(N) since each node is visited once. The space used for the visited set is O(N).

Recursive DFS Pros and Cons

Pros:

Intuitive recursion-based cycle detection
Able to detect cycle in any graph, not just linked lists

Cons:

Extra recursive function call overhead
Recursive stack space usage
Slower than iterative approaches

Comparing Approaches

Approach	Time Complexity	Space Complexity	Finds Cycle Start	In-place
Hash Table	O(N)	O(N)	No	No
Two Pointers	O(N)	O(1)	Yes	Yes
Set Values	O(N)	O(1)	No	Yes
Recursive DFS	O(N)	O(N)	No	Yes

Hash table is simple but requires extra storage.
Two pointers is fast and finds cycle start with minimal overhead.
Set values directly modifies the list without other storage.
Recursive DFS is flexible for any graph but has recursion costs.

So in summary, the two pointer approach provides the best combination of low overhead, good performance, and ability to find the cycle start node.

Applications

Detecting linked list cycles has many useful applications:

Debugging - Finding cycles can reveal bugs in linked list code and prevent infinite loops or crashes.
System Design - Certain data structures like thread pools, file systems, and kernels use linked lists and require cycle detection.
Web Crawling - Crawler programs that traverse website link structures can leverage cycle detection to avoid getting stuck in loops.
Garbage Collection - Cycles prevent memory from being reclaimed. Cycle detection is useful for garbage collectors.
Network Routing - Routing protocols use complex graph structures susceptible to cycles, so identifying them improves performance and reliability.

Conclusion

This guide covered 4 techniques to determine if a cycle exists in a linked list using Python:

Hash table - Simple approach tracking nodes in a hash table during traversal.
Two pointers - Fast algorithm with constant space by iterating 2 pointers at different speeds.
Set values - Modify node values during traversal to detect visited nodes.
Recursive DFS - Leverage depth-first search and visited set to identify cycles.

The two pointer approach provides an optimal combination of speed, low overhead, and ability to find the node where the cycle originates. Linked list cycle detection is an important technique for writing robust code and improving applications like debugging, garbage collection, web crawling, and more.

By mastering these algorithms in Python and understanding their trade-offs, you will be prepared to efficiently handle this common technical interview question and tackle linked list cycle scenarios in your own programs.