Merging overlapping intervals is a common coding interview question that assesses a candidate’s skills in algorithms and data structures. This programming task involves taking a collection of intervals, identifying any intervals that overlap, and merging them into a single continuous interval.
In Python, intervals can be represented as tuples or lists containing a start and end value. The key steps are to sort the intervals by start time, iterate through the sorted intervals, compare overlaps, and combine overlapping intervals. While conceptually simple, implementing an optimal solution that runs efficiently on large inputs requires knowledge of algorithms like sorting, binary search, and computational geometry.
This comprehensive guide will walk through techniques to merge intervals in Python. We will cover the problem statement, illustrate with examples, present multiple solutions using built-in functions and algorithms, analyze time and space complexities, include runnable example code snippets with explanations, and discuss test cases. Following this tutorial will provide Python programmers the knowledge to handle interval merging questions confidently in coding interviews and applications dealing with schedules, calendars, reservations, etc.
Table of Contents
Open Table of Contents
Overview of the Problem
The interval merging problem can be formally stated as:
Given a collection of intervals, merge all overlapping intervals to produce a set of mutually exclusive intervals.
For example:
Input: [[1,4], [2,5], [7,9]]
Output: [[1,5], [7,9]]
# Since [1,4] and [2,5] overlap, they are merged into [1,5]
The key aspects are:
- Intervals are represented as tuples or lists containing a start and end value.
- Intervals can be closed like [1,5] or half-open like [1,5). We will use closed intervals.
- The merged intervals should be mutually exclusive, meaning no overlaps.
- The relative order of the intervals does not matter, only overlaps.
This problem has applications in calendar scheduling, data visualization, and CPU task scheduling. Before diving into solutions, let’s look at example test cases to clarify the requirements.
Example Test Cases
Input: [[1,4], [2,6], [8,10], [15,18]]
Output: [[1,6], [8,10], [15,18]]
Input: [[1,4], [2,3]]
Output: [[1,4]]
Input: [[1,10], [2,6], [3,5], [7,9]]
Output: [[1,10]]
Input: [[2,3], [4,5], [6,7], [8,9], [1,10]]
Output: [[1,10]]
Key Observations
- The intervals may not be sorted initially.
- Duplicate intervals should be removed.
- Tiny overlaps like [1,2] and [2,3] should be merged into [1,3].
- Large overlaps like [2,6] and [1,10] should be merged into [1,10].
- The relative order of the intervals does not matter.
These test cases illustrate the core requirements - handle unsorted intervals, merge all types of overlaps, and return mutually exclusive intervals. This forms the basis for developing correct and efficient solutions, which we will explore next.
Solution 1: Using Built-in Functions
Python’s built-in sorted()
, min()
, and max()
functions can directly solve this problem concisely:
def merge_intervals(intervals):
sorted_intervals = sorted(intervals, key=lambda x: x[0]) # Sort by start time
merged = []
for interval in sorted_intervals:
if not merged or merged[-1][1] < interval[0]: # No overlap
merged.append(interval)
else: # Overlap, merge intervals
merged[-1][1] = max(merged[-1][1], interval[1])
return merged
How it Works:
-
Sort the intervals by start time using
sorted()
and a lambda key function. -
Iterate through the sorted intervals.
-
Compare each interval with the last interval in the
merged
list:- If no overlap, directly append the interval to
merged
. - If overlap, update the end time of last interval in
merged
to the max of the end times.
- If no overlap, directly append the interval to
-
Return the final merged intervals list.
This leverages Python’s simple yet powerful built-in functions to solve interval merging in a clean, readable manner without explicitly writing sorting or searching logic.
Time Complexity: O(n log n)
Sorting takes O(n log n) time. Overall complexity is dominated by the sorting step.
Space Complexity: O(n) The merged list contains at most n intervals, so space used is linear O(n).
Solution 2: Sort and Scan
We can also implement the sort-and-scan solution manually without built-in functions:
def merge_intervals(intervals):
intervals.sort(key=lambda x: x[0]) # Custom sort on start time
merged = []
for interval in intervals:
if not merged or merged[-1][1] < interval[0]:
merged.append(interval)
else:
merged[-1][1] = max(merged[-1][1], interval[1])
return merged
This follows the same overall approach - sort on start time, scan linearly to find/merge overlaps. The key differences are:
- Implementing a custom
sort()
with a key function instead ofsorted()
. - Manually comparing the previous end time instead of
merged[-1][1]
.
The time and space complexity remains unchanged at O(n log n) and O(n) respectively. This illustrates writing a sorting algorithm manually without built-in functions.
Solution 3: Sweep Line Algorithm
The sweep line algorithm is a common technique used in computational geometry problems like interval scheduling. It utilizes the following steps:
- Sort intervals by start time.
- Initialize sweep line position to the start of first interval.
- Iterate through intervals:
- If current interval start > sweep line position, no overlap. Move sweep line position to start of current interval.
- If overlap, merge intervals by updating end to max end of overlapping intervals.
- Return merged intervals.
def merge_intervals(intervals):
intervals.sort(key=lambda x : x[0])
merged = []
sweep_line = intervals[0][0]
for interval in intervals:
start = interval[0]
end = interval[1]
if start > sweep_line: # No overlap
merged.append(interval)
sweep_line = start
else: # Overlap, merge intervals
merged[-1][1] = max(merged[-1][1], end)
return merged
This demonstrates the sweep line algorithm which avoids explicitly tracking overlap logic. The runtime remains O(n log n) due to sort, but the scanning steps are simplified.
Solution 4: Using a Binary Search Tree
A self-balancing Binary Search Tree (BST) can also efficiently solve the interval merging problem in O(n log n) time.
The steps are:
- Insert each interval sorted by start time into BST.
- Traverse BST in order and if current node overlaps with previous, merge them.
- Return merged intervals from BST.
from bst import BSTNode, BST
# BST Node class
class IntervalNode(BSTNode):
def __init__(self, interval, left=None, right=None):
self.interval = interval
super().__init__(interval[0], left, right)
# BST class
class IntervalBST(BST):
def insert(self, interval):
self.root = self._insert(self.root, IntervalNode(interval))
def traverse(self):
intervals = []
self._inorder(self.root, intervals)
return intervals
def _inorder(self, root, intervals):
if root:
self._inorder(root.left, intervals)
intervals.append(root.interval)
self._inorder(root.right, intervals)
def merge_overlaps(self):
return self._merge(self.traverse())
def _merge(self, intervals):
merged = []
for interval in intervals:
# Merge overlap logic
return merged
def merge_intervals(intervals):
bt = IntervalBST()
for interval in intervals:
bt.insert(interval)
return bt.merge_overlaps()
This leverages the BST structure to efficiently sort, search, and merge intervals in one traversal. The self-balancing property also guarantees O(log n) operations. For coding interviews, this shows expertise in complex data structures like BSTs.
Solution 5: Using a Segment Tree
A segment tree is a specialized tree data structure used for interval related queries like merging, searching, etc. It follows these steps:
- Build the segment tree from the input intervals.
- Use tree intervals to efficiently merge overlapping intervals.
- Return merged intervals from tree.
class SegmentNode:
def __init__(self, start, end):
self.start = start
self.end = end
self.left = None
self.right = None
# Build tree
def build(intervals):
# Recursively build tree
def build_tree(start, end):
if start > end:
return None
mid = start + (end - start) // 2
left = build_tree(start, mid)
right = build_tree(mid + 1, end)
return SegmentNode(start, end, left, right)
return build_tree(0, len(intervals))
# Merge overlaps
def merge(root):
# Recursive merge
def merge_recur(root):
if not root:
return
merge_recur(root.left)
merge_recur(root.right)
# Merge logic if root overlaps with previous
return root
def merge_intervals(intervals):
tree = build(intervals)
merged = merge(tree)
return merged
Segment trees provide O(log n) query times. This is an optimal and efficient data structure for interval related problems. In interviews, this displays strong data structure knowledge.
Solution Analysis
Solution | Time Complexity | Space Complexity |
---|---|---|
Built-in Functions | O(n log n) | O(n) |
Sort and Scan | O(n log n) | O(n) |
Sweep Line | O(n log n) | O(n) |
Binary Search Tree | O(n log n) | O(n) |
Segment Tree | O(n log n) | O(n) |
- Sorting algorithms or self-balancing BST/SegTrees give O(n log n) time complexity.
- Storing n intervals requires O(n) additional space.
- Sweep line provides a clean scanning approach avoiding explicit overlap checks.
- BST and Segment tree solutions display mastery over complex data structures.
- Built-in functions provide the most concise readable solution.
So in summary, for most cases, the built-in or sort-and-scan solutions provide the best balance of simplicity and efficiency. BST or Segment tree solutions are only needed for advanced use cases dealing with many real-time interval queries.
Tips and Tricks
Here are some additional tips for solving interval merge problems efficiently:
- Carefully handle edge cases like empty lists or null/None values.
- Avoid multiple sorts - sort once upfront if possible.
- Prefer to sort on the start interval since merge depends on start time.
- Use lambda key functions instead of wrapper classes for custom sorts.
- Store intervals in simple data structures like list or tuples instead of objects.
- Write helper functions for repeated tasks like overlap check, merge logic, etc.
- Interval trees provide faster merging but have slower construction.
- Space can be optimized by merging in-place instead of storing merged list.
Summary
This guide covered a variety of techniques to merge overlapping intervals in Python:
- Leverage built-in functions like sorted(), min(), max() for a simple readable solution.
- Implement a sort-and-scan algorithm manually without built-ins.
- Use sweep line algorithm to simplify overlap checks during scan.
- Employ BSTs or segment trees for efficient O(log n) interval querying.
- Analyzed time and space complexity tradeoffs of different solutions.
The key ideas are to sort upfront by start interval, scan linearly to find/merge overlaps, and return a final set of mutually exclusive merged intervals.
Interval manipulation problems frequently appear in coding interviews and calendar/scheduling applications. I hope this detailed walkthrough provides a deeper understanding of intervals in Python along with the confidence to tackle related algorithm questions. Please leave any questions in the comments!