Median of Two Sorted Arrays
March 28, 2011 in Uncategorized
There are two sorted arrays A and B of size m and n respectively. Find the median of the two sorted arrays. The overall run time complexity should be O(log (m+n)).
Note:
Please head over to this MIT handout for a much better solution, although their solution does not deal with some special cases, which is easy to fix. Please consult Sophie’s solution which fixes these special cases easily. Although my solution below works, it is too complicated.
Online Judge
This problem is available at Online Judge. Head over there and it will judge your solution. Currently only able to compile C++ code. If you are using other languages, you can still verify your solution by looking at the judge’s test cases and its expected output.
Solution:
If you search this problem on Google, you will find tons of hits. However, most of them deal with the special case where m == n, and even so their code are filled with bugs. The CLRS book has this problem as exercise in section 9.3-8, however it also assumes the case where m == n. The only reliable solution I found on the web which deals with the generic case also seemed incorrect, as their definition of the median is the single middle element (although their approach of using binary search is pretty neat). According to the definition of the median, if (m + n) is even, then the median should be the mean of the two middle numbers.
If you read my previous post: Find the k-th Smallest Element in the Union of Two Sorted Arrays, you know that this problem is somewhat similar. In fact, the problem of finding the median of two sorted arrays when (m + n) is odd can be thought of solving the special case where k=(m+n)/2. Although we can still apply the finding k-th smallest algorithm twice to find the two middle numbers when (m + n) is even, it is no more a desirable solution due to inefficiency.
You might ask: Why not adapt the previous solution to this problem? After all, the previous algorithm solves a more general case. Well, I’ve tried that and I didn’t consider the previous solution is easily adaptable to this problem. The main reason is because when (m + n) is even, the two middle elements might be located in the same array. This complicates the algorithm and many special cases have to be dealt in a case by case basis.
Similar to finding the k-th smallest, the divide and conquer method is a natural approach to this problem. First, we choose Ai and Bj (the middle elements of A and B) where i and j are defined as m/2 and n/2. We made an observation that if Ai <= Bj, then the median must be somewhere between Ai and Bj (inclusive). Therefore, we could dispose a total of i elements from left of Ai and a total of n-j-1 elements to the right of Bj. Please take extra caution not to dispose Ai or Bj, as we might need two middle values to calculate the median (it might also be possible that the two middle values are both in the same array). The case where Ai > Bj is similar.
Two sorted arrays A and B. i is chosen as m/2 and j is chosen as n/2. Ai and Bj are middle elements of A and B. If Ai < Bj, then the median must be between Ai and Bj (inclusive). Similarly with the opposite.
The main idea illustrated above is mostly right, however there is one more important invariant we have to maintain. It is entirely possible that the number of elements being disposed from each array is different. Look at the example above: If Ai <= Bj, two elements to the left of Ai and three elements to the right of Bj are being disposed. Notice that this is no longer a valid sub-problem, as both sub-array’s median is no longer the original median.
Therefore, an important invariant we have to maintain is:
The number of elements being disposed from each array must be the same.
This could be easily achieved by choosing the number of elements to dispose from each array to be (Warning: The below condition fails to handle an edge case, for more details see the EDIT section below):
k = min(i, n-j-1) when Ai <= Bj. <--- 1(a) k = min(m-i-1, j) when Ai > Bj. <--- 1(b)
Figuring out how to subdivide the problem is actually the easy part. The hard part is figuring out the base case. (ie, when should we stop subdividing?)
It is obvious that when m=1 or n=1, you must treat it as a special base case, or else it would end up in an infinite loop. The hard part is reasoning why m=2 or n=2 requires special case handling as well. (Hint: The two middle elements might be in the same array.)
Finally, implementing the above idea turns out to be an extremely tricky coding exercise. Before looking at the solution below, try to challenge yourself by coding the algorithm.
If you have a more elegant code to this problem, I would love to hear from you!
EDIT:
Thanks to Algorist for being the first person who points out a bug. (For more details, read his comment). The bug is caused by some edge cases that are not handled in the base case.
Shortly after I fixed that bug, I discovered another edge case myself which my previous code failed to handle.
An example of one of the edge cases is:
A = { 1, 2, 4, 8, 9, 10 }
B = { 3, 5, 6, 7 }The above conditions ( 1(a), 1(b) ) fails to handle the above edge case, which returns 5 as the median while the correct answer should be 5.5.
The reason is because the number 5 is discarded in the first iteration, while it should be considered in the final evaluation step of the median. To resolve this edge case, we have to be careful not to discard the neighbor element when its size is even. Here are the corrected conditions ( 2(a), 2(b), 2(c), 2(d) ) for k which resolves this edge case.
k = min(i-1, n-j-1) when Ai <= Bj and m is even. <--- 2(a) k = min(i, n-j-1) when Ai <= Bj and m is odd. <--- 2(b) k = min(m-i-1, j-1) when Ai > Bj and n is even. <--- 2(c) k = min(m-i-1, j) when Ai > Bj and n is odd. <--- 2(d)
Below is the bug-free code after going through a lengthy rigorous testing of all possible edge cases. (Not for the faint of heart!)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 | double findMedianBaseCase(int med, int C[], int n) { if (n == 1) return (med+C[0])/2.0; if (n % 2 == 0) { int a = C[n/2 - 1], b = C[n/2]; if (med <= a) return a; else if (med <= b) return med; else /* med > b */ return b; } else { int a = C[n/2 - 1], b = C[n/2], c = C[n/2 + 1]; if (med <= a) return (a+b) / 2.0; else if (med <= c) return (med+b) / 2.0; else /* med > c */ return (b+c) / 2.0; } } double findMedianBaseCase2(int med1, int med2, int C[], int n) { if (n % 2 == 0) { int a = (((n/2-2) >= 0) ? C[n/2 - 2] : INT_MIN); int b = C[n/2 - 1], c = C[n/2]; int d = (((n/2 + 1) <= n-1) ? C[n/2 + 1] : INT_MAX); if (med2 <= b) return (b+max(med2,a)) / 2.0; else if (med1 <= b) return (b+min(med2,c)) / 2.0; else if (med1 >= c) return (c+min(med1,d)) / 2.0; else if (med2 >= c) return (c+max(med1,b)) / 2.0; else /* a < med1 <= med2 < b */ return (med1+med2) / 2.0; } else { int a = C[n/2 - 1], b = C[n/2], c = C[n/2 + 1]; if (med1 >= b) return min(med1, c); else if (med2 <= b) return max(med2, a); else /* med1 < b < med2 */ return b; } } double findMedianSingleArray(int A[], int n) { assert(n > 0); return ((n%2 == 1) ? A[n/2] : (A[n/2-1]+A[n/2])/2.0); } double findMedianSortedArrays(int A[], int m, int B[], int n) { assert(m+n >= 1); if (m == 0) return findMedianSingleArray(B, n); else if (n == 0) return findMedianSingleArray(A, m); else if (m == 1) return findMedianBaseCase(A[0], B, n); else if (n == 1) return findMedianBaseCase(B[0], A, m); else if (m == 2) return findMedianBaseCase2(A[0], A[1], B, n); else if (n == 2) return findMedianBaseCase2(B[0], B[1], A, m); int i = m/2, j = n/2, k; if (A[i] <= B[j]) { k = ((m%2 == 0) ? min(i-1, n-j-1) : min(i, n-j-1)); assert(k > 0); return findMedianSortedArrays(A+k, m-k, B, n-k); } else { k = ((n%2 == 0) ? min(m-i-1, j-1) : min(m-i-1, j)); assert(k > 0); return findMedianSortedArrays(A, m-k, B+k, n-k); } } |
EDIT2:
A reader buried.shopno had managed to code the solution more elegantly! I especially like how medianOfThree and medianOfFour were implemented. For more details, read his comment below. Great job!
Further thoughts:
A reader nimin98 suggested that the base case can be handled by simply doing a direct merge. In other words, we have to merge the short array (containing either one or two elements) with the longer array (pick the four elements near the middle. Deciding which four is another tricky business because of multiple special cases). nimin98′s code has few bugs in the handling of base case.
In general, The above approaches (including mine) to handle the base case are not recommended due to tricky implementation. How about Binary Search? We can use binary search to find the correct position to insert elements from the shorter array into the longer array, thus completing the merge (You don’t have to *actually* insert it, recording its index should be suffice).
Median of Two Sorted Arrays,
Sachin said on March 28, 2011
something is missing in code above, in the snippet below:
int i = m/2, j = n/2;
if (A[i] 0);
return findMedian(A+k, m-k, B, n-k);
} else {
int k = min(m-i-1, j);
assert(k > 0);
return findMedian(A, m-k, B+k, n-k);
}
the condition if (A[i] 0); is incomplete and also, ‘k’ is not defined in the call return findMedian(A+k, m-k, B, n-k);
Sachin said on March 28, 2011
also, else if statement in
// Prerequisite: a must be less than or equal to b
double medianOfThree(int a, int b, int val) {
assert(a <= b);
if (val <= a)
return a;
else if (val b */
return b;
}
is incomplete
1337c0d3r said on March 28, 2011
Thanks for pointing that out. It’s fixed now.
Sachin said on March 28, 2011
In
int i = m/2, j = n/2;
if (A[i] 0);
return findMedian(A+k, m-k, B, n-k);
k is undefined and if statement is incomplete
David said on March 28, 2011
Very cool analysis! It is actually disappointing to see so many buggy solutions online, while this is a question I often see being asked in an interview.
Good job.
SK said on March 29, 2011
Two more question to add to your queue :
1) Given two sorted positive integer arrays A[n] and B[n] , we define a set S = {(a,b) | a \in A and b \in B}. The value of such a pair is defined as Val(a,b) = a + b. Now we want to get the n pairs from S with largest values in O(n) time.
2) Consider there is an array with duplicates and you are given two numbers as input and u have to return the minimum distance between the two in the array with minimum complexity.
Algorist said on March 29, 2011
There are some bugs in this the program posted…….
For e.g. It doesn’t handles base case properly.
Arr1 = 2 elements (p, q)
Arr2 = even no. of elements(1,9,11,16,24,28,32,46)
Now p & q may occur anywhere in the list…..
For e.g.
p,q,1,9,11,16,24,28,32,46
1,9,p,q,11,16,24,28,32,46
1,9,p,11,q,16,24,28,32,46
1,9,11,p,q,16,24,28,32,46
1,9,11,p,16,q,24,28,32,46
1,9,11,16,p,q,24,28,32,46
1,9,11,16,p,24,q,28,32,46
1,9,11,16,24,p,q,28,32,46
1,9,11,16,24,p,28,q,32,46
1,9,11,16,24,28,p,q,32,46
1,9,11,16,24,28,32,46,p,q
Please check the logic that you have implemented works for this case. I have placed p & q knowingly around 11,16,24,28 in final array..
1337c0d3r said on March 29, 2011
You are absolutely correct. Congrats to Algorist for being the first reader who found the bug! I will give credit to you and correct my code in my post shortly. Thanks!
Algorist said on March 29, 2011
Also, one thing that i haven’t understood is
k = min(i, n-j-1) when Ai Bj.
May be very tired that’s why not able to get the logic behind.. Please if anybody could explain this..
Algorist said on March 29, 2011
Hey why did you deleted my comments??
1337c0d3r said on March 29, 2011
I would never delete my reader’s comments. The reason you are not seeing your comments appearing immediately is because it requires my manual approval. I will remove this restriction now, as the spam filter is working pretty good up to this point. Thanks for yoru patience.
vardhan said on March 31, 2011
hey buddy,
I have few questions to send out to you. is that fine to post them directly here or mail them to you. If I want to mail you, can you provide me your mail id?
thanks!
1337c0d3r said on March 31, 2011
I have sent you a reply via email that you provided.
nimin98 said on March 31, 2011
How about this solution; basically, for the boundary case, simply merge the two tiny arrays, any special cases will be handled automatically.
int medianTwoArray(int *arrL, int *arrR, int sizeL, int sizeR) {
if (sizeL <= 2 || sizeR <= 2) {
if (sizeL <= sizeR)
return handleBoundaryCase(arrL, arrR, sizeL, sizeR);
else
return handleBoundaryCase(arrR, arrL, sizeR, sizeL);
}
int medL = sizeL / 2;
int medR = sizeR / 2;
if (medL <= medR)
return medianTwoArray(arrL+medL, arrR, sizeL-medL, sizeR-medL);
else
return medianTwoArray(arrL, arrR+medR, sizeL-medR, sizeR-medR);
}
int handleBoundaryCase(int *arrSmall, int *arrLarge, int sizeSmall, int sizeLarge) {
int medIdx = sizeLarge/2;
// merge two small array into one
int tempArr[7];
int idxSmall = 0;
int idxLarge = 0;
int idxTemp = 0;
int *newLargeArrStart = arrLarge + max(medIdx-2, 0);
memcpy(tempArr+sizeSmall, newLargeArrStart, sizeof(int)*5);
while (idxSmall < sizeSmall && idxLarge < min(5, sizeLarge)) {
if (arrSmall[idxSmall] < newLargeArrStart[idxLarge])
tempArr[idxTemp++] = arrSmall[idxSmall++];
else
tempArr[idxTemp++] = newLargeArrStart[idxLarge++];
}
if (idxSmall < sizeSmall)
memcpy(tempArr+idxTemp, arrSmall+idxSmall, sizeof(int)*(sizeSmall-idxSmall));
return tempArr[(sizeSmall+min(sizeLarge,5))/2];
}
1337c0d3r said on March 31, 2011
This method might degrade to linear complexity. The statement “simply merge the two tiny arrays” is flawed, as the other array might not be tiny. (Imagine if you start with m = 1 and n = 1000000). A better approach to handle the base case would be to use binary search to merge the small array into the larger array, which the median is then readily obtained. I already have the bug fix and will update my post as soon as I find the time.
nimin98 said on March 31, 2011
the maximum size of the tempArr is 7; simply use the media plus minus 2 of the longer array.
johny said on March 31, 2011
Hi, 1337:
Could you provide code for finding k-th smallest element in an unsorted arrary? Googling can’t help me locate an answer with complete and correct code.
Thanks!
buried.shopno said on April 8, 2011
Hi 1337,
Great works you post on here.
I code this problem here : http://ideone.com/FtqjM
It seems quite simpler than your provided solution. I’ve tested it for different random datasets ( 1 <= N, M <= 1000). Would you mind to have a look, and let me know if you find any possible bug in it. Thanks.
1337c0d3r said on April 9, 2011
Try the following test case, which results in infinite recursion:
n = 3
A = {1 2 3}
m = 2
B = {1 2}
I do see your code to have the potential to be more elegant. Looking forward to your bug-free code.
1337c0d3r said on April 10, 2011
Hey buried.shopno,
I read your code again and had verified your solution to be correct!
Sorry I missed the part where A’s size has to be less than B’s size (and you swap the array parameters to be passed in if not the case).
Your code is definitely more elegant than mine, so this post will feature your solution (will be updated later) and I will give full credit to you
buried.shopno said on April 10, 2011
Hi 1337,
Thanks for your time to verify my program. I basically follow your approach, but code it in my way.
I wanted to avoid 2 set of base cases (for n == 1, m == 1, n == 2, m == 2), and so swap the function parameters based on array size. I had hard time to handle the base case of n == 2 and m > 2.
Sean said on July 8, 2011
Bravo, good job buried.shopno! Your code simplified the base case handling a lot. 1337 did a good job analyzing the recursion pattern.
Based on you two’s work, I made the following two further observations:
1. if you assume m <= n, we can prove through mathematical deduction that the number of elements excluded from A will always be less or equal to the number of elements that could be removed from B. So the min function calls in the recursion step can be skipped. We always pick i or m-i-1 as the number of elements to be removed.
2. Through mathematical deduction, we can also prove that A[i-1] or B[j-1] could be involved in the final calculation of median only when both m and n are even.
I have verified both the above two observations by modify buried.shopno's code.
shilcare said on April 22, 2011
Hi, 1337:
It seems that your bug-free code got an wrong answer with the following test case:
A = {1, 3, 4, 5}
B = {1, 3, 4, 6}
expected: 4
got: 3
1337c0d3r said on April 22, 2011
Thanks for your comment. (I think the expected answer should be 3.5, not 4)
I’ve just tested your input on the code above, it returns the correct answer 3.5.
Are you sure you are returning the answer as double?
shilcare said on April 23, 2011
Yes, the output is 3.5, I’m sorry.
And I think I may have misunderstood the meaning of “median” in the problem. I thought it is the middle element of the result array after merging A and B. After all, it is more reasonable to return one element of the two array than a non-existing number, isn’t it?
1337c0d3r said on April 23, 2011
I’m using the definition of median according to Wikipedia: http://en.wikipedia.org/wiki/Median
If N is even, the median should be the mean of the two middle values.
bpin said on May 11, 2011
let me try mine…
1337c0d3r said on May 16, 2011
You need to return a double type. The median of {1,2,3,4} is 2.5.
Eric-Draven said on May 18, 2012
what i dont get in your solution is why is that you are returning an element only from array m…when it possible for it to exist in any array m or n !!
Eric-Draven said on May 18, 2012
what i dont get in your solution is why is that you are returning an element only from array m…when it possible for it to exist in any array m or n !!
Eric-Draven said on May 18, 2012
my comments are meant for the code above ur comment !!
Sophie said on May 15, 2011
Hello, first, thanks for your great posts and all of them are inspiring!
About this problem, I think maybe we could still use the MIT solution with a small modification. When n+m is odd, then it’s fine; otherwise, it actually returns the greater one of the two middle numbers and the smaller one will be max(B[j], A[i-1]). My code is here. Looking forward for your comments.
double findMedian(int A[], int B[], int l, int r, int nA, int nB) {
if (l > r) return findMedian(B, A, max(0, (nA+nB)/2-nA), min(nB, (nA+nB)/2), nB, nA);
int i = (l+r)/2;
int j = (nA+nB)/2 – i – 1;
if (j ≥ 0 && A[i] < B[j]) return findMedian(A, B, i+1, r, nA, nB);
else if (j < nB-1 && A[i] > B[j+1]) return findMedian(A, B, l, i-1, nA, nB);
else {
if ( (nA+nB)%2 == 1 ) return A[i];
else if (i > 0) return (A[i]+max(B[j], A[i-1]))/2.0;
else return (A[i]+B[j])/2.0;
}
}
LB said on May 16, 2011
@Sophie: would you please give the link to MIT solution that you referred to? It seems a lot simpler than original post’s idea that deal with many special cases. Thanks.
Sophie said on May 16, 2011
http://www2.myoops.org/course_material/mit/NR/rdonlyres/Electrical-Engineering-and-Computer-Science/6-046JFall-2005/30C68118-E436-4FE3-8C79-6BAFBB07D935/0/ps9sol.pdf
Sorry. It is in the post, denoted as “the only reliable solution I found on the web”.
1337c0d3r said on May 16, 2011
Hi Sophie,
Thanks for your comment. I will look into it later, and will add this problem to online judge so you all can verify your solution soon.
1337c0d3r said on May 20, 2011
Hi Sophie,
This problem is added to the Online Judge. Feel free to head over and test your solution.
bureid.shopno said on May 20, 2011
Could you please upload the input/output files? I was getting TLE without any clue
Thanks.
1337c0d3r said on May 20, 2011
You can try submitting the solution without adding any code, and it will show you Wrong Answer. Then you could see the test cases which allows you to check your program. Let me know if you believe your program was correct but still getting TLE!
bureid.shopno said on May 21, 2011
Thanks for a good data set. After a small fix, I got it accepted. My code was getting TLE for a special case n == 0 or m == 0.
Sophie said on May 20, 2011
“Accepted!” Thanks for posting it to online judge!
LB said on May 22, 2011
Hey Sophie, would you mind to share you complete code here (or, on ideone.com). I ran your above code, which was getting too many wrong answers. I’m confused, even though it seems you follow the same procedure as mentioned in MIT’s note.
Thanks in advance.
1337c0d3r said on May 22, 2011
Here’s Sophie’s amazing solution. (Sophie I hope you won’t mind me posting your solution here). I wanna update this post with her solution but haven’t got the time to do it yet. Feel free to add comment on the solution.
double findMedian(int A[], int B[], int l, int r, int nA, int nB) {
if (l>r) return findMedian(B, A, max(0, (nA+nB)/2-nA), min(nB,
(nA+nB)/2), nB, nA);
int i = (l+r)/2;
int j = (nA+nB)/2 – i – 1;
if (j>=0 && A[i] < B[j]) return findMedian(A, B, i+1, r, nA, nB);
else if (j<nB-1 && A[i] > B[j+1]) return findMedian(A, B, l, i-1, nA, nB);
else {
if ( (nA+nB)%2 == 1 ) return A[i];
else if (i>0) return (A[i]+max(B[j], A[i-1]))/2.0;
else return (A[i]+B[j])/2.0;
}
}
double findMedianSortedArrays(int A[], int n, int B[], int m) {
if (n<m) return findMedian(A, B, 0, n-1, n, m);
else return findMedian(B, A, 0, m-1, m, n);
}
Sophie said on May 26, 2011
Sorry for the belated reply. I was out of town for an interview. 1337c0d3r has posted my complete code. The only tricky part is that my function asks for two more parameters, l and r, and thus we need a helper function and set l and r as 0 and n-1 (m-1) as needed. Please let me know if it still doesn’t work for you.
Thanks 1337c0d3r for posting my code!!
Sophie said on May 20, 2011
“Accepted!” Thanks for posting it to online judge!
1337c0d3r said on May 20, 2011
Congrats Sophie!!!
Your solution is beautiful and concise. I will update this post by featuring your solution, giving full credits to you. Great job!
lipingwu said on August 4, 2011
1337, I had the same idea with Sophie, but I implemented the idea without recursion. I put the codes into your online test (may need to delete the comments before copy them into the test) and it was accepted!
Following is my code, and can you have a look at and give some comments? Also, can I get some credits too?
//Note: The element right before the target element after combining and sorting these two arrays is not the element right before the target in the current array.
double findMedianSortedArrays(int A1[], int n1, int A2[], int n2){//Assume both A1 and A2 are not empty
//or
if(n1 == 0 && n2 == 0) throw “No element in neither array!”;
int s1 = 0, e1 = n1-1, s2 = 0, e2 = n2-1;//[s1, e1] is the searching range for A1, and [s2, e2] is the searching range for A2. In each iteration of the loop below, the target element should be in either range if it is in.
int targetNum = (n1+n2)/2;//We are going to find the element in either array that is greater than or equal to targetNum((n1+n2)/2) elements in both arraies.
while(s1 <= e1 && s2 = A2[mid2]){
if(mid1+mid2 < targetNum) s2 = mid2 + 1;
else e1 = mid1 – 1;
}else{
if(mid1+mid2 e1){//The searching range for A1 is empty now. Assume the target valude is T. Then T >= A1[s1-1]=A1[e1] and T A2[t-1]) ? A1[e1] : A2[t-1];//
}else{
t = targetNum – s2;
T1 = A1[t];
T2 = (t==0 || A2[e2]> A1[t-1]) ? A2[e2] : A1[t-1];
}
return (n1+n2)%2 ? T1 : ((double) T1+T2)/2;
}
lipingwu said on August 4, 2011
Sophie, I had the same idea with yours. Can you have a look at the solution I posted and give some comments? Thanks.
Amol said on December 24, 2011
@Sophie
That’s for an awesome implementation of the algorithm and handles very well arrays of different sizes.
Small modifications to make this look simple and will improve performance by removing else statements.
double findMedian(int A[], int B[], int l, int r, int nA, int nB) {
if (l>r) return findMedian(B, A, max(0, (nA+nB)/2-nA), min(nB,(nA+nB)/2), nB, nA);
int i = (l+r)/2;
int j = (nA+nB)/2 – i – 1;
if (j>=0 && A[i] < B[j]) return findMedian(A, B, i+1, r, nA, nB);
if (j B[j+1]) return findMedian(A, B, l, i-1, nA, nB);
if ( (nA+nB)%2 == 1 ) return A[i];
if (i>0) return (A[i]+max(B[j], A[i-1]))/2.0;
return (A[i]+B[j])/2.0;
}
double findMedianSortedArrays(int A[], int n, int B[], int m) {
if (n<m) return findMedian(A, B, 0, n-1, n, m);
return findMedian(B, A, 0, m-1, m, n);
}
Amol said on December 24, 2011
Here is ideone link for it.
http://www.ideone.com/8VgdW
guoqiang2 said on June 6, 2013
this is weird, I download your code, and change to a simple case:
and the answer of your code or Sophie’s method is:
6
which should be 5.
Am I doing anything wrong here?!
ehorse said on May 4, 2013
this won’t work for the two arrays {3} and {3}
Wenqiang said on June 9, 2013
@Sophie
Thanks for your elegant codes. It is simple and efficient. While debugging it, I found a minor bug within it. When input is like A={}, B={2, 3}, there might be array bound overflow. That is B[-1] will be evaluated. In visual studio, it seems that no exception will be thrown.
fzzh24 said on August 17, 2011
There is no need to apply the finding k-th smallest algorithm twice to find the two middle numbers when (m + n) is even.
Think this way: use find kth smallest alg to find the (m+n)/2 value,let’s say it is A[i]
B[j-1]<A[i]<B[j]
i+j-1+2=(m+n)/2
The next middle number (m+n)/2+1 is the value min(A[i+1], B[j])
japanbest said on October 13, 2011
double findKthElement(int A[], int m, int B[], int n, int k, int kplus) {
if (0 == m)
return (B[k-1]+B[kplus-1])/(double) 2;
if (0 == n)
return (A[k-1]+A[kplus-1])/(double) 2;
if (k>2)
{
int i= ((k-3)>>1) >1) : m-1;
int j=k-3-i;
if (j>=n)
{
j=n-1;
i=k-3-j;
}
if (A[i]>=B[j])
{
int bj_1 = j+1<n ? B[j+1] : INT_MAX;
if (A[i]<=bj_1)
return findKthElement(A+i+1,m-i-1,B+j+1,n-j-1,1,kplus-k+1);
else
return findKthElement(A,m,B+j+1,n-j-1,k-j-1,kplus-j-1);
}
else
{
int ai_1 = i+1<m ? A[i+1] : INT_MAX;
if (B[j]<=ai_1)
return findKthElement(A+i+1,m-i-1,B+j+1,n-j-1,1,kplus-k+1);
else
return findKthElement(A+i+1,m-i-1,B,n,k-i-1,kplus-i-1);
}
}
else
{
int kv,kplusv,up=0,down=0;
int target = kplus;
while (target–)
{
int i=(up<m)?A[up]:INT_MAX;
int j=(down<n)?B[down]:INT_MAX;
kv=kplusv;
if (i>1)+1,((m+n)>>1)+1);
else
return findKthElement(A,m,B,n,((m+n)>>1),((m+n)>>1)+1);
}
siv said on October 13, 2011
Hi Sophie,
Can you explain how did you arrive at the below wquation
if (l > r) return findMedian(B, A, max(0, (nA+nB)/2-nA), min(nB, (nA+nB)/2), nB, nA);
Sophie said on October 14, 2011
Check out the pdf I attached in the posts. They have detailed explanation. It has been a while. Or, try some real numbers, then you can have some sense.
myflamesky said on February 9, 2012
I’ve seen another popular question related: find the median of infinite stream of number.
I guess it’s an open question, anyone can please share some good ideas? Thanks!
Anantha Krishnan said on February 19, 2012
“The only reliable solution I found on the web which deals with the generic case also seemed incorrect“…..
May I know what is incorrect there?
Raj said on February 25, 2012
Cant this be solved easily using the same algorithm that was used in http://www.leetcode.com/2010/10/searching-2d-sorted-matrix-part-ii.html?
For median, we could find Kth element (for odd) or Find K, (K+1) nth element for even and calculate median. Right?
lamothy said on February 26, 2012
I use Binary search algorithm to find the median. Search the median in one array first. If it fails , try the second array.
Suppose there are two arrays a[m] and b[n].
If a[i] is the median, then it should satisfy some inequality like b[j]<=a[i]<=b[j+1] and i+j + (1or 2 ) = (m+n)/2.
There are lots of edge cases to consider. I also tested for many cases.
double findMedianSortedArrays(int A[], int m, int B[], int n) {
if(!m && !n) return 0;
if(!m){
if(n%2) return B[n/2];
else return 0.5*(B[n/2-1]+B[n/2]);
}
if(!n){
if(m%2) return A[m/2];
else return 0.5*(A[m/2-1]+A[m/2]);
}
if(m==1 && n==1) return 0.5*(A[0]+B[0]);
if(m==1) {
if(n%2){
if(A[0]<=B[n/2-1]) return 0.5*(B[n/2]+B[n/2-1]);
else if(A[0] <=B[n/2+1]) return 0.5*(A[0]+B[n/2]);
else return 0.5*(B[n/2+1]+B[n/2]);
}
else {
if(A[0]<=B[n/2-1]) return B[n/2-1];
else if (A[0]<=B[n/2]) return A[0];
else return B[n/2];
}
}
if(n==1) return findMedianSortedArrays(B,1, A, m);
if(n==2 && m==2) return 0.5*(min(A[1],B[1])+ max(A[0],B[0]));
double result=search(A, m, B, n);
if(result1){
i=(s+e)/2;
if(i>m-1) break;
j=(m+n)/2-i-(odd?1:2);
while(jA[i]){
if(odd) return A[(n+m)/2];
else return 0.5*(A[(n+m)/2-1]+A[(n+m)/2]);
}
e=i;
i=(s+e)/2;
j=(m+n)/2-i-(odd?1:2);
}
if((j<=n-1 && (A[i]<B[(jn-1){
s=i;
}
else if(j+1>=n || (j+1<n && A[i]<=B[(j+1=n) return 0.5*(A[i]+A[i+1]);
else return 0.5*(A[i]+ (i+1<m?min(A[i+1],B[j+1]):B[j+1]));
}
}
else e=i;
}
if(e-s==1){
for(i=s; in-1 || j=B[j] && (j+1<=n-1? A[i]=n) return 0.5*(A[i]+A[i+1]);
else return 0.5*(A[i]+ (i+1<m?min(A[i+1],B[j+1]):B[j+1]));
}
}
}
}
return -100000.0;
}
coderguy said on April 4, 2012
Hi, first, Sophie, thank you for your algorithm, and thank you 1337c0d3r for creating this site, it is very helpful for interview prep. I do have one question about Sophie’s algorithm. I looked at the MIT site, and the algorithm makes good sense to me, except for this part:
(B, A, max(0, (nA+nB)/2-nA), min(nB, (nA+nB)/2), nB, nA); Why do we need these exact initial boundaries? thank you.
coderguy said on April 4, 2012
I think I got the answer:
a.length = l
b.length = m
l + m = n
unionMedian(A, B, max(0, n/2 – m), min(l, n/2)) //why? See below
So since we will be searching in A for i, we put an upper bound on i by either the length of A (we don’t want an out
of bounds error) or n/2 (because we know the median of the merged array cannot possibly be past A[n/2]).
As for the lower bound, we can reason as follows: we want either 0, or if B is smaller than n/2, we will need some elements in
A to make up the distance to n/2; the exact amount of elements we need from A is the difference between n/2 and m, and since those elements
cannot be the median of the merged array, we can discard them initially.
Anonymous said on April 15, 2012
I found an interesting article like this only .But was a bit easy to understand..
http://www.malloc.co/algorithms/finding-the-median-of-two-sorted-arrays/
Hui Shu said on July 14, 2012
Hello guys looking at this interesting problem.
Here is the my java code that passes all tests in online judge.
It does not have specific code for corner cases, but I suppose those cases are handled automatically.
I firstly search the median in A using a binary search, then if I can’t find it in A, I look for the median in B.
Hui Shu said on July 14, 2012
One comment in the code is not right:
//if m+n is even, then the median is the average of (m+n)/2 and (m+n)/2 + 1
that should have been:
//if m+n is even, then the median is the average of (m+n)/2 and (m+n)/2 – 1
Hai said on July 16, 2012
Hui Shu, I like your solution! Straightforward and do not need to deal with corner cases which make code messy. This method should be able to used in finding Kth smallest number issue. The code given now is faulty.
benben said on December 18, 2012
recursive version: Hui Shu’s get method is very useful
public double findMedianSortedArrays(int[] A, int[] B) {
// Start typing your Java solution below
// DO NOT write main() function
return medianSearch(A, B, 0, A.length-1);
}
public double medianSearch(int[] A, int[] B, int left, int right ){
if(left > right){
return medianSearch(B, A, 0, B.length-1);
}
int i = (left+right)/2;
int j = (A.length+B.length)/2-i;
if(get(A, i) get(B, j)){
return medianSearch(A,B, left, i-1);
}
if((A.length + B.length)%2 == 0) {
return (get(A,i)+Math.max(get(A,i-1), get(B, j-1)))/2.0;
}else{
return get(A,i);
}
}
public int get(int[] a, int i){
if(i = a.length){
return Integer.MAX_VALUE;
} else{
return a[i];
}
}
benben said on December 18, 2012
Repost the solution as first one doesn’t work well.
public double findMedianSortedArrays(int[] A, int[] B) {
// Start typing your Java solution below
// DO NOT write main() function
return medianSearch(A, B, 0, A.length-1);
}
public double medianSearch(int[] A, int[] B, int left, int right ){
if(left > right){
return medianSearch(B, A, 0, B.length-1);
}
int i = (left+right)/2;
int j = (A.length+B.length)/2-i;
if(get(A, i) get(B, j)){
return medianSearch(A,B, left, i-1);
}
if((A.length + B.length)%2 == 0) {
return (get(A,i)+Math.max(get(A,i-1), get(B, j-1)))/2.0;
}else{
return get(A,i);
}
}
public int get(int[] a, int i){
if(i = a.length){
return Integer.MAX_VALUE;
} else{
return a[i];
}
}
Prateek Caire said on September 15, 2012
Isnt complexity just log n rather than log(m+n). I have not checked your code but rather MIT handout
KFL said on October 8, 2012
Could you explain why the average case complexity should be O(log(m+n))?
The below is my analysis:
case 1: when n >> m:
basically it finish disposing the shorter array then the larger one. so:
2*log(m) + log(n-m) ~ log(n)
case 2: when n and m are similar (let’s say m < n):
2*log(m) + log(n-m) ~ 2*log(m)
coding said on January 10, 2013
It seems that case 2(a) — 2(d) can be combined to one case:
int mid1 = (m – 1) / 2, mid2 = (n – 1) / 2;
k = min(mid1, mid2).
The following code passes on the cases:
class Solution {
public:
double minTwoAverage(int a[])
{
for(int i = 1; i <= 2; i++)
for(int j = 0; j < 4 – i; j++)
if(a[j] < a[j + 1])
swap(a[j], a[j + 1]);
return (double(a[2] + a[3])) / 2;
}
double baseCase(int A[], int m)
{
if((m & 1) == 0)
return (double(A[(m - 1) / 2] + A[(m - 1) / 2 + 1])) / 2;
return (double(A[(m - 1) / 2]));
}
double baseCaseOne(int A[], int m, int val)
{
int mid = (m – 1) / 2;
if((m & 1) == 0)
{
if(A[mid] <= val)
return min(A[mid + 1], val);
return A[mid];
}
if(m == 1)
return (double(A[0] + val)) / 2;
if(A[mid] <= val)
return (double(A[mid] + min(A[mid + 1], val))) / 2;
return (double(A[mid] + max(A[mid - 1], val))) / 2;
}
double baseCaseTwo(int A[], int m, int B[], int n)
{
int mid1 = (m – 1) / 2, mid2 = (n – 1) / 2;
if(m == 2)
return (double(max(A[mid1], B[mid2]) + min(A[mid1 + 1], B[mid2 + 1]))) / 2;
if((m & 1) == 0)
{
if(A[mid1] <= B[mid2])
{
int a[4] = {A[mid1 + 1], A[mid1 + 2], B[mid2], B[mid2 + 1]};
return minTwoAverage(a);
}
if(A[mid1] <= B[mid2 + 1])
return (double(A[mid1] + min(A[mid1 + 1], B[mid2 + 1]))) / 2;
return (double(A[mid1] + max(A[mid1 - 1], B[mid2 + 1]))) / 2;
}
if(A[mid1] B[mid2 + 1])
return max(A[mid1 - 1], B[mid2 + 1]);
return A[mid1];
}
double findMedianSortedArrays(int A[], int m, int B[], int n) {
if(!m)
return baseCase(B, n);
if(!n)
return baseCase(A, m);
if(m == 1)
return baseCaseOne(B, n, A[0]);
if(n == 1)
return baseCaseOne(A, m, B[0]);
if(m == 2)
return baseCaseTwo(B, n, A, m);
if(n == 2)
return baseCaseTwo(A, m, B, n);
int mid1 = (m – 1) / 2, mid2 = (n – 1) / 2;
int k = min(mid1, mid2);
if(A[mid1] >= B[mid2])
return findMedianSortedArrays(A, m – k, B + k, n – k);
return findMedianSortedArrays(A + k, m – k, B, n – k);
}
};
Alpha said on January 13, 2013
I think this could be done using ur previous post
http://leetcode.com/2011/01/find-k-th-smallest-element-in-union-of.html
And you don need to call twice or handling many cases
in case of m+n is even
we need to return average of k and k+1 element (where k will be (m+n)/2, definition of median)
if (Bj_1 < Ai && Ai Bj ? Bj : Ai.next; (probably one more condition to check Ai.next exist or not )
return (kth + kthnext)/2
}
let me know if something more needed.
Please note that we are not calling the function twice so it will not be inefficient .
Alpha said on January 13, 2013
ignore the previous it was typo
if (Bj_1 < Ai && Ai Bj ? Bj : Ai+1 (probably one more condition to check Ai+1 exist or not )
return (kth + kthnext)/2
Hardy said on January 14, 2013
Can you explain how the complexity of sophie’s solutions in log(n+m). I think its
logm + logn.
zyfo2 said on January 17, 2013
actually you can easily adapt your kth algorithm here. if m + n is even, you can easily get the next value after comparing the value of i+1 and j or j+1 and i.
inheritance said on January 23, 2013
Hey one quick question, why are you adding k to your arrays A and B? How did it compile?
return findMedianSortedArrays(A+k, m-k, B, n-k);
return findMedianSortedArrays(A, m-k, B+k, n-k);
briankwong said on January 25, 2013
The following code passes the online test:
H said on February 25, 2013
Seems Sophie’s method failed this test cases:
int[] A = {1, 2, 3, 4, 5};
int[] B = {6, 7, 8, 9, 10, 11, 12, 13, 14};
Juanissa said on March 1, 2013
In order to use the k-th element approach, you just need to apply once for k=(m+n)/2. when m+n is even, the k+1 -th can be found in constant time.
Say k-th element is j in array A. the k+1 -th element can only be j+1 in array A or k-j-1 -th element in array B, whichever is smaller. We don’t need to apply the same k-th element approach again. And this solves the base cases and corner cases problem.
Baigui Sun said on March 11, 2013
Hey guys, is there any AC code of binary search version. I’ve tried
solve this problem in binary search way, but it seems a lot of boundary case.
Lots of failed cases is caused by can not find valid A[i],B[j] and B[j+1] let B[j]<=A[i]<=B[j+1].
I am very confused now, can some one help me, thanks a lot.
Besides, the solution of divide and conquer version works just well.
Rex Niu said on March 15, 2013
The code below is based on the same idea:
compare the medians of two sorted array each step. After the comparison we can get ride of half length of the short array (such as B) in both A and B, so we can recursively call the same method. In log(B.length) steps, it will reach the edge case.
The code in Java below isn’t that clean, but passed both small and large tests and should show the idea.
Rex Niu said on March 15, 2013
The code below is based on the same idea:
compare the medians of two sorted array each step. After the comparison we can get ride of half length of the short array (such as B) in both A and B, so we can recursively call the same method. In log(B.length) steps, it will reach the edge case.
The code in Java below isn’t that clean, but passed both small and large tests and should show the idea.
Rex Niu said on March 15, 2013
Yongfu Lou said on March 21, 2013
The problem is:
while the number of elements being disposed from each array must be the same, can you still ensure the time complexity being log(m+n)?
shivi said on May 8, 2013
while(lo=n-1) {ans=1;lo=0;hi=n-1;mid=(lo+hi)/2;}
j=n-mid-1;
if(!ans)
{
if(ar1[mid]>=ar2[j] && ar1[mid]<ar2[j+1])
{
cout<ar2[j] && ar1[mid]>ar2[j+1])
hi=mid-1;
else if(ar1[mid]<ar2[j] && ar1[mid]=ar2[j] && ar2[mid]<ar2[j+1])
{
cout<ar2[j] && ar2[mid]>ar2[j+1])
hi=mid-1;
else if(ar2[mid]<ar2[j] && ar2[mid]<ar2[j+1])
lo=mid+1;
}
}
frank said on May 17, 2013
This is just a special case of case 2 in the “Find k-th smallest element”. k = [m+n]/2 (n<=k<=m).
frank said on May 17, 2013
The time complexity should be O(log(min(m,n))) instead of O(log(m+n)).
Jing Conan Wang said on May 29, 2013
The following code has passed the online judge. The idea is that for the case when m+n is even,
simply remove the largest element to m+n is odd. The advantage of m+n is odd is the the median will belongs to one of the sorted array. As a result, there is fewer corner case.
However, there is still one corner case when A has only one element. But it is pretty simple.
Jing Conan Wang said on May 29, 2013
The code above is not complete
Jing Conan Wang said on May 29, 2013
Sorry I have some issues with the html editor in leetcode.
here is plain text. Anyone know how to delete a reply?
class Solution {
public:
double findMedianSortedArrays(int A[], int m, int B[], int n) {
// Start typing your C/C++ solution below
// DO NOT write int main() function
if (m == 0) {
if (n%2 == 0) {
return (B[n/2] + B[(n/2)-1])/2.0;
} else {
return B[n/2];
}
} else if (n == 0) {
return findMedianSortedArrays(B, n, A, m);
}
if ( (m + n) % 2 == 0) { // for when there are even number of elements.
// Remove the largest element from the array, calculate the median
double value;
if (A[m-1] > B[n-1]) {
value = findMedianSortedArrays(A, 0, m-1, B, 0, n);
} else {
value = findMedianSortedArrays(A, 0, m, B, 0, n-1);
}
// count total number of elements that ((m + n) / 2)) {
return value;
} else {
double minv = INT_MAX;
if ((auit – A < m) && (*auit <= minv)) minv = *auit;
if ((buit – B < n) && (*buit <= minv)) minv = *buit;
return value / 2.0 + minv / 2.0;
}
}
// for odd number of elements, median is one element in the sorted array.
return findMedianSortedArrays(A, 0, m, B, 0, n);
}
double get(int B[], int mid, int rIdx, int rEle) {
if (mid B[Bmid]) || (Be – Bs <= 1)) {
return findMedianSortedArrays(B, Bs, Be, A, As, Ae);
}
int shift = min(Amid – As, Be – Bmid);
return findMedianSortedArrays(A, As + shift, Ae, B, Bs, Be – shift);
}
};
trytry said on June 14, 2013
why the median of {1,2,3,4} is 2.5, instead of 2/3 ? The median is not average– it is a number