Hello, I was asked this question in an interview of a trading firm.
Say you have an array of 'n' numbers and you have to find the kth largest element in the array. The traditional methods for doing this are;
- Sort the array and then return the kth largest from back - This will be (n-k-1)th element from the front. Time complexity O(nLogn)
- Take a min-Heap of size k - Time complexity O(nLogk)
- quick-select variant - Average Time complexity O(k), worst time complexity O(n^2)
The interviewer asked me that if we have to write a parallel version of this algorithm, then out of the above three methods which one will we chose?
I suggested that in a parallel version we will:
- divide the existing data into sequential chunks. Each chunk is accessed by one thread only
- sort each chunk using quicksort/merge sort
- From each sorted chunk, we will start iterating from the back and compare the elements till we find the kth largest element from the back. we can use something like a min-heap to do so.
Any suggestions on how to do this better in a parallel version.