Top K problems - Sort, Heap, and QuickSelect
12850

Overview

Top K problem asks to find the Kth largest/smallest element or the top K largest/smallest items from an unsorted array. It's one of the questions that you may encounter during an interview. Fortunately, the solutions to it is quite standard, which are mainly: 1) sort; 2) using a heap; 3) Quick Select.

Sorting first and then returning the Kth item definitely work, yet it yields O(N^2) or O(NlogN) time complexity, depending on which sorting algorithm you will choose. However, with the help of heap or QuickSelect, better time complexties could be achieved. This post will try to help you nail this kind of questions by introducing all of the solutions one by one.

image

Let’s take Leetcode 215. Kth Largest Element in an Array as an example.

Find the kth largest element in an unsorted array. Note that it is the kth largest element in the sorted order, not the kth distinct element.
Example 1:
Input: [3,2,3,1,2,4,5,5,6] and k = 4 Output: 4
Note: 
You may assume k is always valid, 1 ≤ k ≤ array's length.

Approaches

Note: The following approaches may need pre-requisite knowledge on heap and Quick Sort. Feel free to roughly browse relative materials if you're not familiar with them :p

I found a great intro to Quick Sort and the partition by @RevathyGovind : How quick sort works and the problems that can be solved using partition logic.

Sort - O(NlogN)

Sorting algorithms could be applied first before returning the Kth item. Time complexities for the following code would be O(NlogN).

class Solution:
    def findKthLargest(self, nums: List[int], k: int) -> int:
        return sorted(nums, reverse=True)[k-1]

Heap - O(NlogK)

We can implement a size-K Min-heap, and heap[0] would be the answer. More specifically:

  • We start with an empty heap, and push items into it before the size equals to K.
  • Then for the remaining items:
    • we ignore those which are smaller than heap[0], because they won’t contribute to the top K largest elements;
    • when the new item is larger than heap[0], we would switch it with heap[0] first, and call heapify(0) so that the heap structure could be maintained.

The height for a heap of size K is O(logK), therefore the total time complexity would be O(NlogK) since we need to push O(N) times.

Demo:

class Solution:
    def findKthLargest(self, nums: List[int], k: int) -> int:
        
        def heapify(i):
            """Given that the children are heaps, this function would
            convert a tree starting with this node to a heap

            Args:
                i (int): the index for the node to be heapified
            
            Returns: void
            """
            nonlocal heap
            if i >= len(heap):
                return

            child1 = 2 * i + 1
            child2 = 2 * i + 2
            max_i = i
            if child1 < len(heap) and heap[child1] < heap[max_i]:
                max_i = child1
            if child2 < len(heap) and heap[child2] < heap[max_i]:
                max_i = child2
            if max_i != i:
                heap[i],heap[max_i] = heap[max_i],heap[i]
                heapify(max_i)

        def push(num):
            """This function pushed new node to a heap.
            
            Args:
                num (int): The new element to be pushed.
            """
            nonlocal heap
            # append it to the heap first
            heap += [num]
            cur = len(heap) - 1
            par = (cur-1)//2
            # push it to the right place by switching it
            # with its parent
            while par >= 0 and heap[par] > heap[cur]:
                heap[cur],heap[par] = heap[par],heap[cur]
                cur = par
                par = (cur-1)//2

        heap = []

        for i in range(len(nums)):
            if len(heap) < k : # before size == K
                push(nums[i]) 
            elif nums[i] > heap[0]: # after size == K
                heap[0] = nums[i]
                heapify(0)
        return heap[0]
        

Quick Select - O(N)

Quickselect takes advantage of the partition function from Quick Sort. An explanation for partition in Leetcode:

One chooses a pivot and defines its position in a sorted array in a linear time using so-called partition algorithm.

However, after we place the pivot in the right place by calling partition(left, right), we won't recursively call both partition(left, pivot - 1) and partition(pivot + 1, right), because we don't need to! Instead, we discuss the relation between K and pivot:

  • If K < pivot, then the Kth largest item must be in (left, pivot - 1).
  • Else if K > pivot, then the Kth largest item must be in (pivot+1,right).
  • Else, we know K == pivot, we get the answer!

If we recursively run both halves like in Quick Sort, it would take O(NlogN). However, this approach keeps throwing away one half (in average case) of the array, which leads to O(N) time complexity in the average case, though it might reach O(N^2) in the worst case due to the instability of quicksort. (Wikipedia)

class Solution:
   def findKthLargest(self, nums: List[int], k: int) -> int:
       
       left, right = 0, len(nums)-1
       def partition(l,r):
           # the partition function from Quick Sort
           # https://en.wikipedia.org/wiki/Quicksort#Lomuto_partition_scheme
           pivot = l
           while l < r:
               while l <= r and nums[l] >= nums[pivot]:
                   l += 1
               while l <= r and nums[r] <= nums[pivot]:
                   r -= 1
               if l < r:
                   nums[l],nums[r] = nums[r],nums[l]
           nums[r],nums[pivot] = nums[pivot],nums[r]
           return r
       while True:
           # we only focus on one half
           # where the Kth element is
           p = partition(left,right)
           if p == k-1:
               return nums[p]
           if p > k-1:
               right = p-1
           else:
               left = p+1
       
       

Here are some practices for the Top K problem. Feel free to give it a shot!


Comments (8)