Find the k-th Smallest Element in the Union of Two Sorted Arrays

January 27, 2011 in binary search

Given two sorted arrays A, B of size m and n respectively. Find the k-th smallest element in the union of A and B. You can assume that there are no duplicate elements.


Thanks to an anonymous reader who posted this question.

I would have to admit that this problem is pretty tricky to solve. Like most difficult problems, it requires some pretty clever observations to solve in a neat way.

The trivial way, O(m+n):
Merge both arrays and the k-th smallest element could be accessed directly. Merging would require extra space of O(m+n). The linear run time is pretty good, but could we improve it even further?

A better way, O(k):
There is an improvement from the above method, thanks to readers who suggested this. (See comments below by Martin for an implementation). Using two pointers, you can traverse both arrays without actually merging them, thus without the extra space. Both pointers are initialized to point to head of A and B respectively, and the pointer that has the larger smaller (thanks to a reader for this correction) of the two is incremented one step. The k-th smallest is obtained by traversing a total of k steps. This algorithm is very similar to finding intersection of two sorted arrays.

The best solution, but non-trivial, O(lg m + lg n):
Although the above solution is an improvement both in run time and space complexity, it only works well for small values of k, and thus is still in linear run time. Could we improve the run time further?

The above logarithmic complexity gives us one important hint. Binary search is a great example of achieving logarithmic complexity by halving its search space in each iteration. Therefore, to achieve the complexity of O(lg m + lg n), we must halved the search space of A and B in each iteration.

We try to approach this tricky problem by comparing middle elements of A and B, which we identify as Ai and Bj. If Ai is between Bj and Bj-1, we have just found the i+j+1 smallest element. Why? Therefore, if we choose i and j such that i+j = k-1, we are able to find the k-th smallest element. This is an important invariant that we must maintain for the correctness of this algorithm.

Summarizing the above,

Maintaining the invariant
i + j = k – 1,

If Bj-1 < Ai < Bj, then Ai must be the k-th smallest,
or else if Ai-1 < Bj < Ai, then Bj must be the k-th smallest.

If one of the above conditions are satisfied, we are done. If not, we will use i and j as the pivot index to subdivide the arrays. But how? Which portion should we discard? How about Ai and Bj itself?

We make an observation that when Ai < Bj, then it must be true that Ai < Bj-1. On the other hand, if Bj < Ai, then Bj < Ai-1. Why?

Using the above relationship, it becomes clear that when Ai < Bj, Ai and its lower portion could never be the k-th smallest element. So do Bj and its upper portion. Therefore, we could conveniently discard Ai with its lower portion and Bj with its upper portion.

If you are still not convince why the above argument is true, try drawing blocks representing elements in A and B. Try visualize inserting blocks of A up to Ai in front of Bj-1. You could easily see that no elements in the inserted blocks would ever be the k-th smallest. For the latter, you might want to keep the invariant i + j = k – 1 in mind to reason why Bj and its upper portion could never be the k-th smallest.

On the other hand, the case for Ai > Bj is just the other way around. Easy.

Below is the code and I have inserted lots of assertion (highly recommended programming style by the way) to help you understand the code. Note that the below code is an example of tail recursion, so you could technically convert it to an iterative method in a straightforward manner. However, I would leave it as it is, since this is how I derive the solution and it seemed more natural to be expressed in a recursive manner.

Another side note is regarding the choices of i and j. The below code would subdivide both arrays using its array sizes as weights. The reason is it might be able to guess the k-th element quicker (as long as the A and B is not differed in an extreme way; ie, all elements in A are smaller than B). If you are wondering, yes, you could choose i to be A’s middle. In theory, you could choose any values for i and j as long as the invariant i+j = k-1 is satisfied.

VN:F [1.9.22_1171]
Rating: 4.7/5 (120 votes cast)
Find the k-th Smallest Element in the Union of Two Sorted Arrays, 4.7 out of 5 based on 120 ratings

105 responses to Find the k-th Smallest Element in the Union of Two Sorted Arrays

  1. A much simpler algorithm that is O(k) is this:

    int findKthSMallest(int[] A, int[] B, int k) {
    int a_offset = 0, b_offset = 0;
    if (A.length + B.length < k) return -1;

    while (true) {
    if (a_offset < A.length) {
    while (b_offset == B.length ||
    A[a_offset] <= B[b_offset]) {
    a_offset++;
    if (a_offset + b_offset == k) return A[a_offset];
    }
    }
    if (b_offset < B.length) {
    while (a_offset == A.length ||
    A[a_offset] >= B[b_offset]) {
    b_offset++;
    }
    if (a_offset + b_offset == k) return B[b_offset];
    }
    }
    }

    (btw, would be nice to have <pre/> support in comments).

    VA:F [1.9.22_1171]
    +1
    • Won’t work for this case
      {1,2,3} {4,5}, 3

      VA:F [1.9.22_1171]
      +1
    • your solution is very inspiring but there are minor bugs in your code:

      1, when return the kth smallest element, we need to return A[a_offset-1] and B[b_offset-1] instead of A[a_offset] or B[b_offset]. Since when a_offset is 3 then A[3] is actually the 4th element in array A.

      2, In the two sub loops, after a_offset++/b_offset++, if a_offset /b_offset are equal to A.length/B.length, we need to break this while loop. Otherwise, when this while condition is checked again, A[a_offset]/B[b_offser] will be out of range.

      I fixed these two minor bugs and post my code (in c#) as follows:

      VA:F [1.9.22_1171]
      +1
  2. Hi 1337coder,
    You are a genius.
    1.
    Two reverse sorted arrays A and B have been given.
    such that size of A is m and size of B is n
    You need to find the k th largest sum (a+b) where a is taken from A and b is taken from B. such that k < m*n
    It can be done in O(n*n) but i believe it can be done in a very efficient way. I heard someone say there is a paper from SODA. It's possible to solve it in O(N), however, it requires very complex algorithm. I am wondering if you can come up with an NlogN algorithm first then try O(N). These two problems have been very hot interview problems from Google, however, not many people can solve it very efficiently yet. I believe you can do it after I read many posts on your site.

    2.
    Given a N*N Matrix.
    All rows are sorted, and all columns are sorted.
    Find the Kth Largest element of the matrix.

    VA:F [1.9.22_1171]
    +1
    • #include
      using std::cin;;
      using std::cout;

      int main()
      {

      int A[]={10,8,5,3,2};

      int B[]={9,7,6,4,1};

      bool atraverse = true;

      int sumToBeatAIndex=0;

      int sumToBeatBIndex=1;

      int j=1;

      int k=0;

      int i;

      int sumtobeat;

      int n=5;

      int tempBIndex;

      int tempAIndex;

      cout<<"("<<B[0]<<","<<A[0]<<") ";

      sumtobeat=A[0]+B[1];

      for(i=1;isumtobeat)

      cout<<"("<<B[k]<<","<<A[j]<<") ";

      else

      {

      sumtobeat=B[k]+A[j];

      tempBIndex=k;

      tempAIndex=j;

      j=sumToBeatAIndex;

      k=sumToBeatBIndex;

      sumToBeatAIndex=tempAIndex;

      sumToBeatBIndex=tempBIndex;

      cout<<"("<<B[k]<<","<<A[j]<<") ";

      atraverse=!(atraverse);

      }

      if(atraverse)

      j++;

      else

      k++;

      }

      cout<<"\n";

      return 0;
      }

      VA:F [1.9.22_1171]
      +1
  3. I just want to point out that the trivial way does not require that much of space. You don't need to merge two arrays. Just traverse through them is enough.

    VA:F [1.9.22_1171]
    -1
  4. @simon:
    1) Using a heap-based solution you could get O(k lg min(m, n)) run time complexity. To figure out the O(n) solution seemed pretty difficult. It seemed like someone already solved it here: http://www.ocf.berkeley.edu/~wwu/cgi-bin/yabb/YaBB.cgi?board=riddles_cs;action=display;num=1132204952;start=25

    I haven't read his post, so I couldn't provide the validity of his solution yet.

    2) I remembered seeing this as an exercise problem in CLRS. Will look into it when I have time.

    VA:F [1.9.22_1171]
    0
  5. @Martin and Raynor:
    Yup both of you are right. I have updated my post, thanks!

    VA:F [1.9.22_1171]
    0
  6. @1337 — u are seriously a bilody genius or someone who has spent hours at algos. at work or at school and can tie all the algo concepts really well (Which is really encourage-able)

    Now-a-days at most software jobs where you just code (mostly read from a trivial DB and do some business logic and display results) you lose this touch.

    Thanks for all the great posts.

    VA:F [1.9.22_1171]
    0
  7. @Anonymous:
    Definitely the latter. I love algos and though I admit it didn't get apply at work very often, it does come into play once in awhile.

    Have you read the book "Programming pearls"? It gives some really good real world examples of how efficient algos can help even in some simple business logic.

    VA:F [1.9.22_1171]
    0
  8. The following algorithm can achieve O(logK) complexity. The idea is to first select k items from both arrays using their proportional length, then adjust the number of items from each array using binary search. The follwoing C# code illustrates the idea:

    private static int FindKthInUnion(int[] a, int[] b, int k)
    {
    if (k == 0) return Min(a[0], b[0]);
    int la = Min(a.Length – 1, k);
    int lb = Min(b.Length – 1, k);

    int i = (k * a.Length) / (a.Length + b.Length);
    int j = k – i – 1;

    for (; ; )
    {
    if (a[i] < b[j])
    {
    if (i == la || a[i + 1] > b[j])
    {
    return b[j];
    }
    else
    {
    i += (la – i + 1) / 2;
    j = k – i – 1;
    }
    }
    else
    {
    if (j == lb || b[j + 1] > a[i])
    {
    return a[i];
    }
    else
    {
    j += (lb – j + 1) / 2;
    i = k – i – 1;
    }
    }
    }
    }

    VA:F [1.9.22_1171]
    0
  9. Hi 1337coder,

    Can you please tel me why "We make an observation that when Ai < Bj, then it must be true that Ai < Bj-1 (you mentioned Bj-1 <Ai < Bj, right?). On the other hand, if Bj < Ai, then Bj < Ai-1." ?

    Thank you very much

    VA:F [1.9.22_1171]
    +2
  10. @Anonymous:
    If Bj-1 < Ai < Bj, then we are already done, no need to continue further.

    "We make an observation that when Ai < Bj, then it must be true that Ai < Bj-1" –> This observation is due to the above condition Bj-1 < Ai < Bj NOT being satisfied.

    VA:F [1.9.22_1171]
    +1
  11. I found this code sample to be easier to understand:
    http://www.cs.cmu.edu/afs/cs.cmu.edu/academic/class/15451-s01/recitations/rec03/rec03.ps

    VA:F [1.9.22_1171]
    -1
    • I can see some limitation in this code. It won’t work if first array is of size 100, second is of size 10, and k = 50. It probably works only when k <= min (first array size, second array size)

      VA:F [1.9.22_1171]
      0
      • I think we can augment the code to skip the steps of comparison, if we knew that the index we have calculated is out of bounds in the smaller array

        VN:F [1.9.22_1171]
        0
  12. Hi 1337code, "Using a heap-based solution you could get O(k lg min(m, n)) run time complexity. " for simon's question 1. I feel we need a O(k lg max(m, n)), since after pop a max in the heap, we need to fill two neighbour items to the heap, we can get a max(m,n) heap for some m*n matrix like this one

    3*n matrix

    3*(n+n 2*n n
    3*(2n-1) 2*(n-1) n-1
    .
    .
    .
    3*(n) 2 1

    so the largest will be in the first column, after we pop the max in the heap , we need to push the next one in the first column and the one in the same row but in second column. So we get to the end of the first column we have a O(n) space heap

    VA:F [1.9.22_1171]
    0
  13. Regarding simon's problems. I went over the discussion at the Berkly forum and didn't find any solution that I was convinced both works and is O(n). I posted this to Stack Overflow, hoping the discussion there will turn out to be more fruitful:

    http://stackoverflow.com/questions/5000512/find-the-top-k-values-in-a-sorted-n-x-n-matrix

    VA:F [1.9.22_1171]
    0
  14. int findKthSmallestInoneset(int s1[], int s2[], int m, int n, int k){
    int mid, low, high, pos;

    if(k > (m + n))
    cout << "There does not exist the k-th element" < s2[n - 1])
    return s1[m - 1];
    else
    return s2[n - 1];
    }

    if(k == 1){
    if(s1[0] > s2[0])
    return s2[0];
    else
    return s1[0];
    }

    if(k > m)
    high = m – 1;
    else
    high = k – 1;
    low = 0;

    while(low (n – 1)){
    low = mid + 1;
    continue;
    }

    if(pos == (n – 1)){
    if(s1[mid] >= s2[pos])
    return s1[mid];
    low = mid + 1;
    }
    else{
    if(s1[mid] >= s2[pos] && s1[mid] = s2[pos + 1])
    high = mid – 1;

    if (s1[mid] < s2[pos])
    low = mid + 1;
    }
    }//while
    return 0;
    }

    VA:F [1.9.22_1171]
    0
  15. There is an O(lg(min{m, n, k}) method. :-) FYI.
    http://tanliboy.wordpress.com/2011/05/01/some-interesting-google-interview-problems/

    VA:F [1.9.22_1171]
    0
  16. I was looking for a solution to a very similar problem, the k-th largest element of 2 vectors sorted in nondecreasing order. I tried to adapt the solution to use with vectors, but it just did’t work:

    I just call the “select” function with the 2 vectors and fit them in 2 arrays. I think the algorithm is incorrect.

    I did the same with the solution above, from talinboy:

    It works perfectly and with O(lg(min{m, n, k}) !!

    I don’t know where is the error in your algorithm. It might be i’m also using vectors with duplicate elements.

    I think the algorithm from tanliboy is the quickest and most general implementation i’ve seen in internet that works fine.

    P.S.: two examples that doesn’t work with 1337 code:

    VA:F [1.9.22_1171]
    0
    • David, please read my post carefully:
      “You can assume that there are no duplicate elements.”
      My solution assumes no duplicate elements. If you really understand how my solution works, you should be able to adapt it to work with duplicate elements. tanliboy’s solution is another possible efficient solution, which uses the idea of binary search.

      VN:F [1.9.22_1171]
      0
  17. #include

    int findKthsmallest(int a[],int m,int b[],int n,int k)
    {
    int i=0,j=0,ti=0,tj=0,I=0,J=0,M=m,N=n;
    while(1)
    {
    ti = (int)((double)m/(m+n) * (k-1));
    tj = (k-1)-ti;
    i = I+ti;
    j= J+tj;
    //printf(” i=%d j=%d\n”,i,j);
    if(j>0 && j<N && ib[j-1] && a[i]0 && i<M && ja[i-1] && b[j]<a[i])
    return b[j];
    if(j==0 && i<M && a[i]<b[j])
    return a[i];
    if(i==0 && j<N && b[j]b[j-1])
    return a[i];
    if(i==M && b[j]>a[i-1])
    return b[j];
    if(i<M && j<N)
    {
    if(a[i]=M)
    {
    k=k-tj-1;
    n=n-tj-1;
    J=j+1;
    }
    else
    {
    k=k-ti-1;
    m=m-ti-1;
    I=i+1;
    }
    }
    }

    int main()
    {
    int a[]={1,2,3};
    int b[]={4};
    int m=3,n=1,k=3;
    printf(“%d”,findKthsmallest(a,m,b,n,k));
    return 0;
    }

    VA:F [1.9.22_1171]
    0
  18. Hi 1337, there is one bug for your solution. How is it going if k = m+n? I think we need special handling for this edge case.

    VA:F [1.9.22_1171]
    0
    • There is another bug when m == 1 or n == 1, in this case, we need special handling to adjust i or j, here is the code snippet I have modified:

      int findKthElement(int A[], int m, int B[], int n, int k)
      {
      if ( m == 0 && n == 0 || k == 0 || k > m + n)
      {
      return -1;
      }
      else if( m == 0)
      {
      return B[k - 1];
      }
      else if (n == 0)
      {
      return A[k - 1];
      }
      else if(k == m + n)
      {
      return std::max(A[m - 1], B[n -1]);
      }
      else if( k == 1)
      {
      return std::min(A[0], B[0]);
      }

      int i = (int)(double(m)/(m+n)*(k – 1) + 0.5);
      int j = k-1-i;
      if (i == m)
      {
      i–;
      j++;
      }
      else if (j == n)
      {
      i++;
      j–;
      }

      // A[i-1], B[j-1]
      int A_i_1 = ( i == 0)? INT_MIN : A[i - 1];
      int B_j_1 = ( j == 0)? INT_MIN : B[j - 1];

      if (A[i] >= B_j_1 && A[i] <= B[j])
      {
      return A[i];
      }
      else if (A[i] = A_i_1 && B[j] <= A[i])
      {
      return B[j];
      }
      else if (B[j] <= A[i])
      {
      return findKthElement(A, i, B + j + 1, n – j – 1, k – j – 1);
      }
      return -1;
      }

      double findMedianSortedArrays(int A[], int m, int B[], int n) {

      if (m + n & 1)
      {
      return findKthElement(A, m, B, n, (m + n + 1)/2);
      }
      else
      {
      return (findKthElement(A, m, B, n, (m + n)/2) + findKthElement(A, m, B, n, (m + n)/2 + 1))*1.0/2;
      }
      return -1;
      }

      VA:F [1.9.22_1171]
      0
  19. Can we not do in log k?

    have 2 pointers.. i and j for array 1 and array 2..

    while (i+j != k) {
    if (array1[i] > array2[j]) {
    i = i + [(k-(i+j))/2]
    } else {
    j = j + [(k-(i+j))/2]
    }
    }

    return min of (array[i] , array [j])

    validate this one..

    VA:F [1.9.22_1171]
    0
    • Siva, initially n=100000, m=10;
      suppose k=1000;
      i=0;j=0;
      arrray1[]={ 1,2,3,4….}
      array2[]={100901,1212}’
      Now, as array1[i=0] j is set to 10000. Seg fault

      VA:F [1.9.22_1171]
      0
      • you are correct.. I missed to have bound check on both i and j.. if we do that.. we can do this solve this in log k rite?

        VA:F [1.9.22_1171]
        0
  20. Again , fantastic work , a tried to get a solution to this problem on a lot of places on the net but all of them were either wrong or lacked explanation , your code works fine and the explanation is great !
    just one thing , you can include these tests :
    m = min ( k , m ) ;
    n = min ( k , n ) ;
    a slight improvement .
    and if you use
    if ( ai >= bj_1 && ai <= bj ) //greater than equal to , in place of greater than
    the code will work for duplicate values as well ( provided values of i and j are checked )

    Thank You sooo much :) :)

    VA:F [1.9.22_1171]
    0
  21. was wondering why you initialized i this way, why not i from 0?

    int i = (int)((double)m / (m+n) * (k-1));
    int j = (k-1) – i;”

    VA:F [1.9.22_1171]
    0
  22. I just wrote another version with the same log complexity,
    and verified with a few inputs via your coding panel:)

    Any comments?

    // Type your C++ code and click the “Run Code” button!
    // Your code output will be shown on the left.
    // Click on the “Show input” button to start entering input data.
    #include
    using namespace std;

    int find_k_small(int a[], int m, int b[], int n, int k, int &kmin)
    {
    if (k m+n)
    return -1;

    int i = (m * (k-1))/(m+n);
    int j = k – 2 – i;

    while (1) {
    if ((j == n-1 || a[i] < b[j+1]) &&
    (i == m-1 || b[j] b[j]) ? a[i]: b[j];
    return 1;
    } else if (j = b[j+1]) {
    i /= 2;
    j = k -2 -i;
    } else {
    j /= 2;
    i = k -2 -j;
    }
    }
    return 0;
    }

    int main() {
    // Start typing your code here…
    int kmin = 0;
    int a[] = {1,3, 9, 11, 13, 15, 23};
    int b[] = {2, 4, 6, 16, 18, 20};
    int ret = find_k_small(a, sizeof(a)/sizeof(int),
    b, sizeof(b)/sizeof(int),
    9, kmin);
    cout <<"ret="<<ret<<",kmin="<<kmin<<endl;
    return 0;
    }

    VA:F [1.9.22_1171]
    0
  23. Are both arrays sorted in increasing order or decreasing order….
    if they are sorted in increasing order ,How is this claim true??

    We make an observation that when Ai < Bj, then it must be true that Ai < Bj-1. On the other hand, if Bj < Ai, then Bj < Ai-1.

    i may sound stupid but i dont get this claim

    VA:F [1.9.22_1171]
    0
  24. The i and j are not the middle of the array, then how can you prove that the algorithm will be O(lgm + lgn) or less? Thanks.

    VA:F [1.9.22_1171]
    0
  25. Hi,
    Can some one explain me why ‘i’ value is calculated as below

    int i = (int)((double)m / (m+n) * (k-1));

    Thanks,
    siv

    VA:F [1.9.22_1171]
    0
    • that’s for to stay inside of the array. if m < (k-1) /2 i will be out of array bounds

      VA:F [1.9.22_1171]
      0
  26. why set the ratio i/j= m/n , I prefer i=j, in other words, i=j= (k-1)/2
    Since the next recursion k-> k-i or k->k-j if i=j then we can optimize for the worst case. And yes I know both ratios work the same in average cases.

    Does i/j=m/n has a special objective?

    VA:F [1.9.22_1171]
    0
    • that’s for to stay inside of the array. if m < (k-1) /2 i will be out of array bounds

      VA:F [1.9.22_1171]
      0
  27. The post says

    I don’t agree. If A is [1,2,3], B is [4,5,6], then this statement is not true.

    VA:F [1.9.22_1171]
    0
  28. I solved this using a similar approach in O(lgn), see here for an explanation.

    VA:F [1.9.22_1171]
    0
  29. Hi 1337c0d3r.
    I am having trouble understanding your code.
    1) You mention that the k is found if Bj-1 < Ai < Bj AND the invariant i+j+1==k holds true.
    But I don’t see how you check for the invariant in your code.
    You just return if for the current i,j the condition Bj-1 < Ai < Bj (or its reverse for B) holds true.How do you check for the invariant?
    2) Also your code does not do what you describe in the code.If Ai<Bj you do a recursive call and you use j as the size of the array.So you just don't neglect a portion as you say.
    3)For A={1,2,6} and B={3,4,5} and k = 2, the code ends up with division by 0 when calculating i since in the first entry of the method i=0 and j=1 and so you dont' find k and you hit the Ai<Bj and you call the method recursevily with k-i-1 which is 2-0-1=1 so in the calculation of i you divide by zero.
    This is a very interesting problem and perhaps I didn't get your solution, so if you please took the time to read my comments and help me understand this I would be greatful. Thank you

    VA:F [1.9.22_1171]
    +1
    • 1) Invariant is checked when initializing j.
      j = (k-1) -i
      so it is obvious that i+j = k-1
      2) Make a trace of the code. Size of the array B with its upper part discarded is exactly j
      3) No, no it doesn’t

      VN:F [1.9.22_1171]
      0
  30. Narrow bounding method.
    Assume we have two arrays A (A.size() == n) and B (B.size() == m) and k – target value.
    In case of n<=0 or m= 1.
    Get median element of A (A[n/2]) and find position in B, where we can insert such element, let position be i, such that:
    - i==0 and B[i] > A[n/2];
    - i==m and B[i-1] < A[n/2];
    - 0 < i < m and B[i-1] < A[n/2] k, then we start procedure with A[0..n/2-1], B[0..i-1]
    3. i+n/2+1 < k, then we start procedure with A[n/2+1..n-1], B[i+1..m-1], k -= i+n/2+1

    try to code:

    Is it correct?

    VA:F [1.9.22_1171]
    0
    • Strange code formatting:

      /// Assume k start from 1
      int findKth (int *A, int n, int *B, int m, int k)
      {
      /// trivial border case (don’t forget k started from 1)
      if (n <= 0) return B[k-1];
      if (m k)
      {
      return findKth (A, n/2, B, i, k);
      }
      return findKth (&A[n/2+1], n – n/2 -1, &B[i], m – i, k – (i + n/2 + 1));
      }

      VA:F [1.9.22_1171]
      0
  31. Hi Genius!

    What if arrays are unsorted? Do you have a good solution for that?

    Thanks!

    VN:F [1.9.22_1171]
    0
  32. vk said on May 12, 2012

    i found martin’s explanation slightly off …what if there are duplicates
    thus i implemented the same concept( using 2 ptrs )which actually worked for me and it handles duplication as well…cheers

    int kthsmallest(int *A,int *B,int k)
    {
    int i=0,j=0;
    k=k-1;
    while(1)
    {
    if(A[i]B[j])
    { if(i+j==k)return B[j];
    j++;
    }
    else //(A[i]==B[j])
    {if(i+j==k)return A[i];
    i++;j++;k++;
    }
    }
    }

    VA:F [1.9.22_1171]
    0
  33. Why “We make an observation that when Ai < Bj, then it must be true that Ai < Bj-1. On the other hand, if Bj < Ai, then Bj < Ai-1" ?

    Ai can be between Bj and Bj-1, right?

    VN:F [1.9.22_1171]
    0
  34. actually, we can initially bound m and n to be smaller than k.
    m=min(m,k);
    n= min(n,k);
    then the time complexity is O(log(k))

    VN:F [1.9.22_1171]
    0
  35. Hai said on July 3, 2012

    Seems original code can not handle the case if A and B doesn’t overlap, e.g:
    A= {2 , 4} B={ 7, 9, 11, 13}, will has assertion fail.

    VA:F [1.9.22_1171]
    0
  36. int first=0,second=0,k,min;
    scanf(“%d”,&k);
    while((k–) > 0)
    {
    if (b[second]<a[first])
    { min=b[second];second++;}
    else{
    min=a[first];
    first++;}}

    this is working code

    VN:F [1.9.22_1171]
    0
  37. Thanks :)
    Median of 2 sorted array is special case of this one where k = (m+n)/2

    VA:F [1.9.22_1171]
    0
  38. by comparing middle elements of A and B, which we identify as Ai and Bj. If Ai is between Bj and Bj-1, we have just found the i+j+1 smallest element.

    I think instead of Bj-1 it should be Bj+1

    VN:F [1.9.22_1171]
    0
  39. can any one please tell me why pivot i is choosen as m/(m+n)*(k-1) … i didnt get the logic for that…

    VA:F [1.9.22_1171]
    +2
  40. hey I don’t understand why the element is i+j+1 th smallest.

    Ai has i-1 elements ahead of it in the original array and j-1 elements in the second array . Thus , it is i+j-1 th element I think.

    VA:F [1.9.22_1171]
    0
    • I actually have the same question. Have you figured out?

      VN:F [1.9.22_1171]
      0
    • Here indexing is done 0 based. So Ai has i elements before it and Bj_1 has j elements before it (+1 including it ,so j+1 elemtns less than equal to it). So it is i+j+1

      VA:F [1.9.22_1171]
      +1
  41. @1337c0d3r, what if the arrays are constant arrays of the same value? I think

    if (Bj_1 < Ai && Ai < Bj)
    return Ai;
    else if (Ai_1 < Bj && Bj < Ai)
    return Bj;

    can be


    if (Bj_1 <= Ai && Ai <= Bj)
    return Ai;
    else if (Ai_1 <= Bj && Bj <= Ai)
    return Bj;

    VN:F [1.9.22_1171]
    0
  42. You really should provide the reference where you get these solutions. This article is almost identical to the solution provided in Algorithms for Interviews book, exercise 1.18. It lays the three solutions in the same sequence and with similar explanations.

    For the sake of fairness, don’t pretend these are your bright ideas.

    However I congratulate you on the quality of the narrative and code examples.

    VA:F [1.9.22_1171]
    0
  43. log(m+n) solution without recursive:

    VN:F [1.9.22_1171]
    0
    • here it is:

      public int findKthLargest(int A[], int B[], int k) {
      int n = A.length;
      int m = B.length;

      if (k m + n)
      return -1;
      int al = 0, ar = n – 1;
      int bl = 0, br = m – 1;

      while (true) {
      n = ar – al + 1;
      m = br – bl + 1;
      int i = Math.max(0, (int) ((1.0 * n * (k – 1)) / (m + n)));
      int j = k – 1 – i;
      i += al;
      j += bl;
      if (al > ar)
      return B[j];

      if (bl > br)
      return A[i];

      int ai = (i > ar) ? Integer.MAX_VALUE : A[i];
      int ai_1 = (i br) ? Integer.MAX_VALUE : B[j];
      int bj_1 = (j = bj_1 && ai = ai_1 && bj <= ai) {
      return bj;
      }

      if (ai < bj) {
      k -= i – al + 1;
      al = i + 1;
      br = j;
      } else {
      k -= j – bl + 1;
      ar = i;
      bl = j + 1;
      }
      }
      }

      VN:F [1.9.22_1171]
      0
  44. The author gave a very good point in providing the relationship between i,j and k: i+j = k-1. And I believe the best algorithm should have the time complexity of O(log(min(m,n,k)). Simple proof:
    Suppose m>=n,
    case 1: k=m, since m+n>=k, we can flip to find the m+n-k’th largest element. and m+n-k<=n. So same as case 1.
    case 3: n<k<m,
    case 3a: if n = O(k), no problem.
    case 3b: if n = o(k), we can use the bisection method for the small array. And it's not hard to find the k's smallest element in O(n).

    VN:F [1.9.22_1171]
    0
  45. This is the C# code with time complexity O(min(m, n, k)). All possible cases were covered.

    VN:F [1.9.22_1171]
    0
  46. Really nice post. But I do have 1 questions.
    I am not convienced that 3rd solution O(lg m + lg n) is *always* better than 2nd solution O(k).
    I do agree that O(log m) is better than O(m), but I do not agree that O(log m) is *always* better than O(k).
    If k = 3, m = 100000, n = 100000
    Using the 2nd solution, you would only need to traverse 3 steps to get the answer
    Using the 3rd solution, in the worst case, you would need to traverse log(100000) + log(100000).
    It is obvious that 3 < log(100000) + log(100000).
    In this case, it is obvious that 2nd solution would run faster.
    Am I correct?

    VN:F [1.9.22_1171]
    0
    • My algorithm always ensures the time complexity is O(lg(min(m,n,k))).
      Assuming m>=n, so there are 3 possibilities:
      1. k<=n,
      we only consider A[0..k-1] and B[0..k-1]. So it is O(lg(k))

      2. n<km
      since k smallest is the same as m+n+1-k largest, and value of m+n+1-k is always <= n, Set m+n+1-k = p. We only need to consider the interval A[m-p, m] and B[n-p, n]. And the time complexity is O(lg(p)), i.e., O(lg(n)).

      VN:F [1.9.22_1171]
      0
  47. c++ code. It works fine in visual studio, but failed on leetcode because leetcode doesn’t support time class.

    VN:F [1.9.22_1171]
    0
  48. Can we extend this to find the median of two sorted Arrays(with unique numbers) ?

    VN:F [1.9.22_1171]
    0
  49. i implemented the O(log(min(m,n,k)) algorithm,here is the code below , go to tanliboy ‘s post for detail.

    VA:F [1.9.22_1171]
    +2
  50. Hi 1337c0d3r,

    I have a question on the recursive call you make at the end.

    if (Ai < Bj)
    // exclude Ai and below portion
    // exclude Bj and above portion
    return findKthSmallest(A+i+1, m-i-1, B, j, k-i-1);

    The first arguement we pass for this function is a array (int[] A). Then, How do you pass A+i+1 is the recursive call. ?
    I understand, you only trying to pass the part of array A from a position beyond i. But A+i+1 refers to an element and not the array.

    Thank You in advance.

    VA:F [1.9.22_1171]
    0
  51. The best solution is O(lgK), not O(lgM+lgN). Usually lgM + lgN may be much bigger than lgK.
    And O(lgk) solution is not hard. but a good and robust implementation is non-trivial.

    VA:F [1.9.22_1171]
    0
  52. Can you please clarify how you came up with the formula for i?
    int i = (int)((double)m / (m+n) * (k-1));

    VN:F [1.9.22_1171]
    0
  53. very nicely explain, thank you.

    VA:F [1.9.22_1171]
    0
  54. VN:F [1.9.22_1171]
    0
  55. Here is a simpler O(log(k)) solution:

    VN:F [1.9.22_1171]
    0
    • Don’t know why two lines are missing. Working code:

      VN:F [1.9.22_1171]
      0
    • Still missing two lines. Plain text as below:

      public double findKthInSortedArrays(int A[], int B[], int k) {
      if (A == null) A = new int[0];
      if (B == null) B = new int[0];
      int nA = A.length;
      int nB = B.length;
      if (nA == 0 && nB == 0) return Integer.MIN_VALUE;

      int l = k – Math.min(k, nB);
      int r = Math.min(k, nA);

      while(l <= r) {
      int i = l + (r – l) / 2;
      int j = k – i;

      int a_i = (i 0) ? A[i-1] : Integer.MIN_VALUE;
      int b_j = (j 0) ? B[j-1] : Integer.MIN_VALUE;

      if (a_i >= b_j_prev && b_j >= a_i_prev) {
      return Math.max(a_i_prev, b_j_prev);
      }

      if (a_i < b_j_prev) {
      l = i + 1;
      } else {
      r = i – 1;
      }
      }

      return Integer.MIN_VALUE;
      }

      VN:F [1.9.22_1171]
      0
    • So sorry for the messed up code. Seems that content between ” are omitted automatically.

      VN:F [1.9.22_1171]
      0
  56. Hi! Firstly, it’s a great algorithm, well done. And now, I could use some help. I need to modify this algorithm to work with duplicate elements with keeping complexity O(log(m)+log(n)). For example if A={2,4,5,8} and B={1,2,5,9,10} and I want to find 4 smallest element, my answer should be 5 (union should be {1,2,4,5,8,9,10}). I would be grateful for all help.

    VA:F [1.9.22_1171]
    0
  57. “We make an observation that when Ai < Bj, then it must be true that Ai < Bj-1. On the other hand, if Bj < Ai, then Bj < Ai-1. Why?"

    Shouldn't it be '<=' ?

    "We make an observation that when Ai <= Bj, then it must be true that Ai < Bj-1. On the other hand, if Bj < Ai, then Bj <= Ai-1. Why?"

    Thank you!

    VA:F [1.9.22_1171]
    0
  58. Martin’s idea is great, but the algo has bugS,
    first,
    if (a_offset + b_offset == k) return A[a_offset]; should be
    if (a_offset + b_offset == k) return A[a_offset -1];

    second,
    after a_offset++, need check if outofboundary,
    try {1},{2}, k=1 and see 1st bug,
    try {1},{2}, k=2 and see 2nd bug,

    VA:F [1.9.22_1171]
    0
  59. This iterative code seems to be working for me with O (log k) complexity. Can someone verify?

    VA:F [1.9.22_1171]
    0
  60. Have you tried this test case?

    A[] = {1,2,3,4,5};
    B[] = {6,7};
    and k = 7;

    VN:F [1.9.22_1171]
    0
  61. We make an observation that when Ai < Bj, then it must be true that Ai < Bj-1. On the other hand, if Bj < Ai, then Bj < Ai-1. W
    this doesnt work for this case:

    array A {1,2,3,7,12,15}
    array B {0,4,5,6,9}

    in this case middle of A is 7 and B is 5. and the condition that Bj < Ai then Bj< Ai-1 should be true doesn't work.

    Am I right or wrong?

    VA:F [1.9.22_1171]
    0
  62. I do not even know how I ended up here, but I thought this post was great.
    I do not know who you are but certainly you’re going to a famous blogger
    if you aren’t already ;) Cheers!

    VA:F [1.9.22_1171]
    0
  63. not so neat

    VN:F [1.9.22_1171]
    0
  64. i can’t find 3rd smallest element using code with this test case

    findKthSmallest([1, 3, 4, 7, 9], 5, [2, 3, 4, 7], 4, 3)

    because there is no condestion about if Ai==Bj

    Thanks

    VA:F [1.9.22_1171]
    0
  65. I have a better solution. Run tim is O(lgk). O(lgk) is better than O(lgm+lgn) since k<m*n

    VA:F [1.9.22_1171]
    +1
  66. I just checked ZhigangZhao’s solution. Then I found mine previous solution have out of bound exception.

    Here’s my new solution:

    Test:

    ZhigangZhao said its run time is O(log(min(m,n,k)) . I don’t think so. I think its run time is O(logk).

    VA:F [1.9.22_1171]
    +1
  67. Just want to add a comment to further explain how to slit those two arrays.
    When Ai doesn’t fall between Bj-1 and Bj, it means that Ai is not the i+j+1th smallest element. There might be an element k (0 <= k <= j-1) that satisfy Bk-1 < Ai < Bk, so Ai is actually the i+k+1th smallest element, which is smaller than i+j+1. This meaning, the element we are looking for must be greater than the current i if its' in array A. Since we also have the invariant i+j=k-1, if we increase i in array A, we have to decrease j in array B, that's why we discard A0 to Ai and also Bj to Bn.

    VN:F [1.9.22_1171]
    0
  68. In the given line above you need to correct the conditions, if Ai < Bj then Ai < B(j+1) and next expression as feel

    Errorenous Line:
    We make an observation that when Ai < Bj, then it must be true that Ai < Bj-1. On the other hand, if Bj < Ai, then Bj < Ai-1. Why?

    VA:F [1.9.22_1171]
    0
  69. Here’s my solution:

    VA:F [1.9.22_1171]
    0
  70. Hi ,
    I did not go through all the comments above but if any of you has already mentioned my solution, pls let me know if it would work .

    According to the problem , we know m, n and k so we can guess from this that k< (m+n) then we can find out if k< (m+n)/2 and keep dividing by 2 again to check if k<(m+n)/4 .

    Suppose if, k (m+n)/4 then we get an interval

    (m+n)/4 <k (m/4)+(n/4) <k < (m/2)+(n/2)

    so now we can set the starting pointer of A to position(m/4) and search till (m/2) and similarly for B.

    The time complexity wld be much better that O(k) , according to my calculations, i believe it would be O(log m +log n) similar to the last method mentioned in the explanation.

    VA:F [1.9.22_1171]
    0
    • Sorry for the error,

      I meant

      (m+n)/4 <k < (m+n)/2
      =(m/4)+(n/4) <k < (m/2)+(n/2)

      VA:F [1.9.22_1171]
      0
  71. Do we consider in the (log m + log n) solution that the K is less than both m and n? if it is so then we can have a solution with (log k) complexity

    VA:F [1.9.22_1171]
    0
  72. Hi 1337c0d3r, your solution seems to be right. Thank you for such insightful explanation. I was wondering if there is O(log(K)) , that works for all cases. No limitation like length of 1st array should be less than the 2nd or some other limitation.

    Help is appreciated. Thanks.

    VA:F [1.9.22_1171]
    0
  73. package kthsmallest;
    import java.lang.Math;

    /**
    *
    * @author xz2210
    */
    public class Kthsmallest {

    /**
    * @param args the command line arguments
    */
    public static void main(String[] args) {
    // TODO code application logic here
    int[] A={1, 2, 5, 10, 12, 14, 19, 282, 290};
    int[] B={3, 4, 6, 9, 281, 289};
    int k0 = 13;
    Kthsmallest obj = new Kthsmallest();
    System.out.println(obj.Kthsmallest(A, 0, Math.min(A.length-1, k0-1), B, 0, Math.min(B.length-1, k0-1), k0));
    }

    public int Kthsmallest(int[] X, int xlb, int xub, int[] Y, int ylb, int yub, int k){
    if(xub + yub <k-2) return 0;
    if(xub + yub == k-2) return Math.max(Y[yub], X[xub]);
    if(X.length==0){
    return Y[k-1];
    }
    if(Y.length==0){
    return X[k-1];
    }

    int xpiv = (int)Math.ceil((xlb+xub)/2);
    xpiv = Math.min(xpiv, k-2);
    xpiv = Math.max(xpiv, k-1-Y.length);
    int ypiv = k – xpiv – 2;

    // Constri
    // 0=<k-xpiv-2<=Y.length -1
    // xpiv=k-1-Y.length
    // k-1-Y.length Y.length>=1

    if (X[xpiv]Y[ypiv])
    return Y[ypiv];
    else
    return Kthsmallest(X, xpiv, xub, Y, ylb, ypiv, k);
    }

    //if (X[xpiv]>Y[ypiv])
    else{
    if(X[xpiv]<Y[ypiv+1])
    return X[xpiv];
    else
    return Kthsmallest(X, xlb, xpiv, Y, ypiv, yub, k);
    }
    }
    }

    VN:F [1.9.22_1171]
    0
  74. VN:F [1.9.22_1171]
    0
  75. VN:F [1.9.22_1171]
    0
  76. VN:F [1.9.22_1171]
    0
  77. package kthsmallest;
    import java.lang.Math;

    public class Kthsmallest {

    /**
    * @param args the command line arguments
    */
    public static void main(String[] args) {
    // TODO code application logic here
    int[] A={1, 2, 5, 10, 12, 14, 19, 282, 290};
    int[] B={3, 4, 6, 9, 281, 289};
    int k0 = 13;
    Kthsmallest obj = new Kthsmallest();
    System.out.println(obj.Kthsmallest(A, 0, Math.min(A.length-1, k0-1), B, 0, Math.min(B.length-1, k0-1), k0));
    }

    //refine the search range of X, Y for the kth smallest number
    //xlb: lower bound of X
    //xub: upper bound of X
    //ylb: lower bound of Y
    //yub: upper bound of Y
    public int Kthsmallest(int[] X, int xlb, int xub, int[] Y, int ylb, int yub, int k){
    //If the total # of elements in X, Y is less than k, invalid
    if(xub + yub <k-2) return 0;
    //If the total # of elements match with k, return the bigger number of X's or Y's last element
    if(xub + yub == k-2) return Math.max(Y[yub], X[xub]);
    //Otherwise, if X is empty, return Y[k-1]
    if(X.length==0){
    return Y[k-1];
    }
    if(Y.length==0){
    return X[k-1];
    }

    //Pick the new pivot in X, Y to refine the search range based on binary search concept
    int xpiv = (int)Math.ceil((xlb+xub)/2);
    //Constraints for ypiv calculated based on xpiv
    //0=<k-xpiv-2<=Y.length -1
    //xpiv=k-1-Y.length
    //so, in general, k-1-Y.length Y.length>=1
    //refine xpiv based on constraints above
    xpiv = Math.min(xpiv, k-2);
    xpiv = Math.max(xpiv, k-1-Y.length);
    int ypiv = k – xpiv – 2;

    if (X[xpiv] Y[ypiv])
    return Y[ypiv];
    else
    return Kthsmallest(X, xpiv, xub, Y, ylb, ypiv, k);
    }
    //if (X[xpiv]>Y[ypiv])
    else{
    if(X[xpiv]<Y[ypiv+1])
    return X[xpiv];
    else
    return Kthsmallest(X, xlb, xpiv, Y, ypiv, yub, k);
    }
    }
    }

    VN:F [1.9.22_1171]
    0
  78. VN:F [1.9.22_1171]
    0
  79. I solved it with 3 lines only … The algorithm works fine and with the best complexity as you mentioned, the problem might be in the stack size as it is a recursive algorithm.
    I am assuming that:
    1. k is always bounded between 0 and size of first array + size of second array
    2. arrays are full with data, i.e. no null array or empty array passed.

    VN:F [1.9.22_1171]
    0
  80. This can be made even more efficient as you only need to look into first k elements of either array instead of m and n, so the complexity would be (O(lg k + lg k))

    VA:F [1.9.22_1171]
    0

Leave a reply

Your email address will not be published. Required fields are marked *

You may use the <code> tag to embed your code.