Skip to content

Mastering Heap Sort in Python: A Comprehensive Guide

Updated: at 02:24 AM

Heap Sort is a comparison based sorting algorithm that makes use of a binary heap data structure to sort elements. It works by first organizing the unsorted list into a special heap data structure called a max-heap, and then repeatedly extracting the maximum element and putting it into the sorted list while maintaining the heap property on the remaining elements.

Heap Sort has an optimal worst-case time complexity of O(n log n), making it an efficient sorting algorithm. In this comprehensive guide, we will cover the following topics related to implementing Heap Sort in Python:

Table of Contents

Open Table of Contents

Overview of Heap Sort

Heap Sort works by first organizing the data into a special tree structure called a binary heap. In a max-heap, the largest element is always at the root, and the subtrees below it also satisfy the max-heap property.

Once the max-heap is built, the largest element can be repeatedly extracted from the root and placed into the sorted list. After each extraction, the heap structure is re-established for the remaining elements, allowing further extractions in decreasing order.

The steps involved in Heap Sort are:

  1. Build a max-heap from the unsorted input list.
  2. The largest item is stored at the root of the heap. Replace it with the last item of the heap followed by reducing the size of heap by 1.
  3. Repeat step 2 while heap size is greater than 1.

This gives us the elements in descending order in the sorted list.

Below is a visual representation of how Heap Sort works:

Heap Sort Algorithm Visualization

Image Source: Wikimedia Commons

Max-Heap Data Structure

Heap Sort relies on the heap data structure to organize elements. A max-heap is a complete binary tree where the value in each internal node is greater than or equal to the values in its child nodes.

Some important properties of a max-heap:

Python provides the heapq module that contains functions like heapify(), heappop(), etc. to build and manipulate heap data structures. However, we will implement max-heaps manually using lists.

Building a Max-Heap

To build a max-heap from an unordered list:

  1. We can start by inserting elements to an empty heap one by one.

  2. Or we can perform max-heapify starting from the lowest non-leaf nodes all the way to the root.

The second approach is more efficient with O(n) time complexity compared to O(n log n) for the first approach.

Max-Heapify

The max_heapify() function lets us create the heap property in a subtree. It assumes the binary trees rooted at the left and right child nodes are already max-heaps.

It runs in O(log n) time. We call it on each node starting from the last internal node down to the root to build the heap in O(n) time.

def max_heapify(A, i):
    left = 2 * i + 1
    right = 2 * i + 2
    largest = i

    if left < len(A) and A[left] > A[i]:
        largest = left

    if right < len(A) and A[right] > A[largest]:
        largest = right

    if largest != i:
        A[i], A[largest] = A[largest], A[i]
        max_heapify(A, largest)

This function recursively swaps nodes if they violate the heap property. In this way, we can convert any unordered list into a max-heap efficiently.

Heap Sort Algorithm

Now that we have understood heaps and how to build them, let’s look at the high-level steps of the Heap Sort algorithm:

Pseudocode

heapSort(A):
  BuildMaxHeap(A)
  for i = length(A) down to 2:
    swap A[1] with A[i]
    heapify(A, 1, i - 1)
  return A

Python Implementation

Here is how we can implement the Heap Sort algorithm in Python:

def heap_sort(arr):
  n = len(arr)

  # Build max heap
  for i in range(n//2, -1, -1):
    max_heapify(arr, i)

  for i in range(n-1, 0, -1):
    # Swap
    arr[i], arr[0] = arr[0], arr[i]

    # Heapify root element
    max_heapify(arr, 0, i)

  return arr

Below is the full code implementation with the max_heapify() helper method:

def max_heapify(arr, i, size):
  # Using size instead of len(arr) allows operations on slices of array
  left = 2 * i + 1
  right = 2 * i + 2

  largest = i
  if left < size and arr[left] > arr[i]:
    largest = left

  if right < size and arr[right] > arr[largest]:
    largest = right

  if largest != i:
    arr[i], arr[largest] = arr[largest], arr[i]
    max_heapify(arr, largest, size)

def heap_sort(arr):
  n = len(arr)

  for i in range(n//2, -1, -1):
    max_heapify(arr, i, n)

  for i in range(n-1, 0, -1):
    arr[i], arr[0] = arr[0], arr[i]
    max_heapify(arr, 0, i)

  return arr

Time and Space Complexity

Heap Sort has optimal O(nlogn) time complexity like Merge Sort but requires only constant auxiliary space.

Comparison with Other Sorting Algorithms

Here is how Heap Sort compares to other common sorting algorithms:

AlgorithmTime ComplexitySpace ComplexityStableIn-place
Heap SortO(n log n)O(1)NoYes
Quick SortO(n log n)O(log n)NoYes
Merge SortO(n log n)O(n)YesNo
Insertion SortO(n^2)O(1)YesYes
Selection SortO(n^2)O(1)NoYes
Bubble SortO(n^2)O(1)YesYes

So Heap Sort provides a good balance of speed, low space requirement, and ease of implementation.

Stability

An important property of sorting algorithms is stability. A stable sorting algorithm maintains the relative order of elements with equal keys in the sorted output.

Heap Sort does not maintain stability because of the swapping of elements during the heapify operation. So it is considered unstable.

For example, consider the input array:

['a', 'b', 'b', 'a', 'c']

A stable sort will always keep the relative order of ‘a’ and first ‘b’ same in the sorted output. An unstable algorithm like Heap Sort does not guarantee this.

Common Mistakes and Solutions

Here are some common mistakes to avoid when implementing Heap Sort in Python:

Careful debugging and adding print statements can help identify issues causes by such subtle errors.

Applications of Heap Sort

Some real-world applications and use cases of Heap Sort include:

When to Use Heap Sort

Heap Sort is most suitable in situations when:

The in-place O(1) space complexity and O(n log n) time make Heap Sort a good choice for sorting large datasets efficiently.

Real-World Examples

Here are some real-world examples that use Heap Sort algorithm:

By providing an optimal balance of speed, memory usage and ease of implementation, Heap Sort proves to be a valuable algorithm for many large-scale, real world sorting applications.

Conclusion

In this guide, we covered a comprehensive overview of Heap Sort in Python - its principles, algorithm, implementation, complexity analysis, comparisons to other sorts, common mistakes, applications, use cases and real-world examples.

Heap Sort leverages the heap data structure to provide optimal O(nlogn) sorting with minimal O(1) memory overhead. While not stable, it provides a great option when stability is not required and low space complexity is desired.

With its wide applicability for sorting large streams of data and implementing priority queues, Heap Sort is a valuable algorithm that every Python programmer should understand.