Heap Sort is a comparison based sorting algorithm that makes use of a binary heap data structure to sort elements. It works by first organizing the unsorted list into a special heap data structure called a max-heap, and then repeatedly extracting the maximum element and putting it into the sorted list while maintaining the heap property on the remaining elements.
Heap Sort has an optimal worst-case time complexity of O(n log n), making it an efficient sorting algorithm. In this comprehensive guide, we will cover the following topics related to implementing Heap Sort in Python:
Table of Contents
Open Table of Contents
Overview of Heap Sort
Heap Sort works by first organizing the data into a special tree structure called a binary heap. In a max-heap, the largest element is always at the root, and the subtrees below it also satisfy the max-heap property.
Once the max-heap is built, the largest element can be repeatedly extracted from the root and placed into the sorted list. After each extraction, the heap structure is re-established for the remaining elements, allowing further extractions in decreasing order.
The steps involved in Heap Sort are:
- Build a max-heap from the unsorted input list.
- The largest item is stored at the root of the heap. Replace it with the last item of the heap followed by reducing the size of heap by 1.
- Repeat step 2 while heap size is greater than 1.
This gives us the elements in descending order in the sorted list.
Below is a visual representation of how Heap Sort works:
Image Source: Wikimedia Commons
Max-Heap Data Structure
Heap Sort relies on the heap data structure to organize elements. A max-heap is a complete binary tree where the value in each internal node is greater than or equal to the values in its child nodes.
Some important properties of a max-heap:
- The tree is completely filled on all levels except the lowest, which is filled from left to right.
- The maximum element is always at the root.
- The max-heap property should be true for all subtrees as well.
Python provides the heapq
module that contains functions like heapify()
, heappop()
, etc. to build and manipulate heap data structures. However, we will implement max-heaps manually using lists.
Building a Max-Heap
To build a max-heap from an unordered list:
-
We can start by inserting elements to an empty heap one by one.
-
Or we can perform
max-heapify
starting from the lowest non-leaf nodes all the way to the root.
The second approach is more efficient with O(n) time complexity compared to O(n log n) for the first approach.
Max-Heapify
The max_heapify()
function lets us create the heap property in a subtree. It assumes the binary trees rooted at the left and right child nodes are already max-heaps.
It runs in O(log n) time. We call it on each node starting from the last internal node down to the root to build the heap in O(n) time.
def max_heapify(A, i):
left = 2 * i + 1
right = 2 * i + 2
largest = i
if left < len(A) and A[left] > A[i]:
largest = left
if right < len(A) and A[right] > A[largest]:
largest = right
if largest != i:
A[i], A[largest] = A[largest], A[i]
max_heapify(A, largest)
This function recursively swaps nodes if they violate the heap property. In this way, we can convert any unordered list into a max-heap efficiently.
Heap Sort Algorithm
Now that we have understood heaps and how to build them, let’s look at the high-level steps of the Heap Sort algorithm:
Pseudocode
heapSort(A):
BuildMaxHeap(A)
for i = length(A) down to 2:
swap A[1] with A[i]
heapify(A, 1, i - 1)
return A
BuildMaxHeap()
builds a max-heap from the input list A.- We repeatedly swap the root (maximum element) A[1] with the last element A[i] and heapify the reduced heap from 1 to i-1.
- This extracts the max element and maintains the heap property.
- The sorted list is built in decreasing order.
Python Implementation
Here is how we can implement the Heap Sort algorithm in Python:
def heap_sort(arr):
n = len(arr)
# Build max heap
for i in range(n//2, -1, -1):
max_heapify(arr, i)
for i in range(n-1, 0, -1):
# Swap
arr[i], arr[0] = arr[0], arr[i]
# Heapify root element
max_heapify(arr, 0, i)
return arr
-
We first build the max-heap by calling
max_heapify()
on all internal nodes starting from the last level downwards. -
Then we repeatedly swap the root with the last element and call
max_heapify()
on the reduced heap from 0 to i-1. -
This extracts elements in decreasing order while maintaining the heap structure.
-
Finally, we return the sorted array.
Below is the full code implementation with the max_heapify()
helper method:
def max_heapify(arr, i, size):
# Using size instead of len(arr) allows operations on slices of array
left = 2 * i + 1
right = 2 * i + 2
largest = i
if left < size and arr[left] > arr[i]:
largest = left
if right < size and arr[right] > arr[largest]:
largest = right
if largest != i:
arr[i], arr[largest] = arr[largest], arr[i]
max_heapify(arr, largest, size)
def heap_sort(arr):
n = len(arr)
for i in range(n//2, -1, -1):
max_heapify(arr, i, n)
for i in range(n-1, 0, -1):
arr[i], arr[0] = arr[0], arr[i]
max_heapify(arr, 0, i)
return arr
Time and Space Complexity
-
Time Complexity: O(nlogn)
- Building the max heap takes O(n) time.
- The heapify operation takes O(logn) time.
- As we perform n-1 heapify operations, the overall time complexity is n*(logn) = O(nlogn)
-
Space Complexity: O(1)
- Heap Sort is an in-place sorting algorithm, so the space complexity is constant O(1).
Heap Sort has optimal O(nlogn) time complexity like Merge Sort but requires only constant auxiliary space.
Comparison with Other Sorting Algorithms
Here is how Heap Sort compares to other common sorting algorithms:
Algorithm | Time Complexity | Space Complexity | Stable | In-place |
---|---|---|---|---|
Heap Sort | O(n log n) | O(1) | No | Yes |
Quick Sort | O(n log n) | O(log n) | No | Yes |
Merge Sort | O(n log n) | O(n) | Yes | No |
Insertion Sort | O(n^2) | O(1) | Yes | Yes |
Selection Sort | O(n^2) | O(1) | No | Yes |
Bubble Sort | O(n^2) | O(1) | Yes | Yes |
- Heap Sort is faster than quadratic sorting algorithms like Insertion, Selection and Bubble Sort.
- It has the same time complexity as optimized Quick Sort and Merge Sort.
- It requires only constant O(1) space unlike Merge Sort.
- But it is not stable like Insertion and Bubble Sort.
So Heap Sort provides a good balance of speed, low space requirement, and ease of implementation.
Stability
An important property of sorting algorithms is stability. A stable sorting algorithm maintains the relative order of elements with equal keys in the sorted output.
Heap Sort does not maintain stability because of the swapping of elements during the heapify operation. So it is considered unstable.
For example, consider the input array:
['a', 'b', 'b', 'a', 'c']
A stable sort will always keep the relative order of ‘a’ and first ‘b’ same in the sorted output. An unstable algorithm like Heap Sort does not guarantee this.
Common Mistakes and Solutions
Here are some common mistakes to avoid when implementing Heap Sort in Python:
-
Forgetting to pass the
size
parameter inmax_heapify()
. This leads to IndexError as the left and right child indices can go out of bounds of array slices. -
Neglecting to build the max-heap first using
max_heapify()
before extracting elements. The sorted output will be incorrect without this initial heap structure. -
Accessing incorrect indices when swapping elements or heapifying. Carefully double check array indexes and slices.
-
Not heapifying all the way down to first element after swapping. The max-heap property will be lost.
-
Using
min_heapify()
instead ofmax_heapify()
by mistake. This implements Min Heap Sort instead of Max Heap Sort. -
Returning the list instead of sorted copy for in-place implementation. The input list will be modified.
Careful debugging and adding print statements can help identify issues causes by such subtle errors.
Applications of Heap Sort
Some real-world applications and use cases of Heap Sort include:
-
Sorting large data streams: Heap Sort can be used to efficiently sort data streams that cannot fit entirely in memory. Elements can be read one by one, inserted in heap and sorted.
-
Priority queues: Heap data structure itself is commonly used to implement priority queues which require frequent insertion and extraction based on priority. Operations like scheduling tasks based on priority are enabled using heaps.
-
Order statistics: Finding kth largest/smallest element in an unsorted list can be found efficiently using a max/min heap. Useful for analytics applications.
-
Graph algorithms: Heaps are used in algorithms like Dijkstra’s Shortest Path and Prim’s Minimal Spanning Tree for extracting minimum weight edge efficiently.
-
External sorting: When sorting huge files that cannot fit in memory, Heap Sort is used along with merge sort to perform the operation efficiently.
When to Use Heap Sort
Heap Sort is most suitable in situations when:
- Stability of sorting algorithm is not required
- Limited extra space is available
- Data is accessed sequentially and sorted data needs to be generated one by one
- Priority queue features are needed in the application
- Finding extreme values (min or max) quickly is required
The in-place O(1) space complexity and O(n log n) time make Heap Sort a good choice for sorting large datasets efficiently.
Real-World Examples
Here are some real-world examples that use Heap Sort algorithm:
-
Task scheduling - Operating systems often schedule processes based on priority ordering implemented using a priority queue (heap data structure). Heap Sort helps perform this efficiently.
-
Order statistics - Analytics platforms need to find highest/lowest value data points (max/min order statistics). Maintaining max/min heap helps quickly find these.
-
Memory optimization - For apps like cloud storage or servers sorting large files, Heap Sort minimizes memory usage while still achieving optimal speed.
-
Search algorithms - Heap Sort combined with binary search can improve algorithms like best-first search used in artificial intelligence.
-
Graph algorithms - Algorithms that use priority queues like Dijkstra’s algorithm for shortest path rely on heaps as an underlying data structure.
-
External sorting - Database systems need to handle massive datasets and implement efficient external sorting using a hybrid of Heap Sort and Merge Sort.
By providing an optimal balance of speed, memory usage and ease of implementation, Heap Sort proves to be a valuable algorithm for many large-scale, real world sorting applications.
Conclusion
In this guide, we covered a comprehensive overview of Heap Sort in Python - its principles, algorithm, implementation, complexity analysis, comparisons to other sorts, common mistakes, applications, use cases and real-world examples.
Heap Sort leverages the heap data structure to provide optimal O(nlogn) sorting with minimal O(1) memory overhead. While not stable, it provides a great option when stability is not required and low space complexity is desired.
With its wide applicability for sorting large streams of data and implementing priority queues, Heap Sort is a valuable algorithm that every Python programmer should understand.