If k = 1 the linear-time solution is quite simple. One could keep the frequency of elements appearance in a hash map and update the maximum element at each step.

When k > 1 we need a data structure that has a fast access to the elements ordered by their frequencies. The idea here is to use the heap which is also known as priority queue.

Approach 1: Heap

The first step is to build a hash map element -> its frequency. In Java we could use data structure HashMap but have to fill it manually. Python provides us both a dictionary structure for the hash map and a method Counter in the collections library to build the hash map we need.
This step takes time where N is number of elements in the list.

The second step is to build a heap. The time complexity of adding an element in a heap is and we do it N times that means time complexity for this step.

The last step to build an output list has
time complexity.

In Python there is a method nlargest in heapq library (check here the source code) which has the same time complexity and combines two last steps in one line.


Complexity Analysis

  • Time complexity : . The complexity of Counter method is . To build a heap and output list takes . Hence the overall complexity of the algorithm is .

  • Space complexity : to store the hash map.

Side Notes

Following the complexity analysis, the approach is optimal for small k. In the case of large k, one could revert the procedure by excluding the less frequent elements from the output.

Analysis written by @liaison and @andvary