__A median problem - Solution__

Finally, I post the solution to the problem from the Microsoft interview, that I, long ago, promissed to do. Sorry for the delay. I have received on my email solutions from SCQ =) and Vasya V. ( solution6 ), from Harta a solution similar to solution4 and solution7 due to Nyaz Nigmatullin. Thanks all.

**Note: Read solution5 for the O ( lg n ) algorithm.**__Statement__

You are given two sorted sequences of 32-bit integer numbers. Find the median of the sequence obtained by concatening the two sequences.(Update: By sequence I don't mean the STL one; a simple C array would do - anything with constant random acces).

__Solution1.__

Simple and obvious: concatenate the two sequences( O ( n ) ) and sort them ( O ( n^{2}), O ( n lg n ) ). Then output the element in the middle.

__Solution2.__

Also simple and obvious: Merge ( O( n ) ) the two sequences and output the element in the middle. Merging can be done in-place and can be stable: see here.

__Solution3.__

As we only need the median element, we can adapt the simple clasical merging ( still O ( n ) time, but now O ( 1 ) additional memory ) to our problem. Instead of merging the whole amount of elements, we merge only half ( to get to our median element ) and instead of keeping the whole amount of merged elements in a separate array, we can keep only the last element in this sequence. Some code for this:

median(int a[], int b[], int n, int m)

int i, j, k;

i = j = 0;

while ( i < n && j < m && i + j < ( n + m ) / 2 )

if ( a [ i ] < b [ j ] )

k = a [ i ]

i ++

else

k = b [ j ]

j ++

if ( i < n && i + j < ( n + m ) / 2 )

i = i + ( n + m ) / 2 - ( i + j )

k = a [ i ]

if ( j < n && i + j < ( n + m ) / 2 )

j = j + ( n + m ) / 2 - ( i + j )

k = a [ j ]

return k

Solution4.

Solution4.

Suppose the 2 sequences are a and b, 0 indexed and have sizes m and n. Also suppose that the median element is in the first sequence. Then we can binary search it.

Given an element in the first sequence we can find out, also using binary search, which order of statistics it is.

Suppose we choose the k-th element in the first sequence( a [ k ] ). Then we know there are k elements in the first sequence that are <= a[k]. We can find, in the second sequence, using binary search, the position kp for which b [ kp ] <= a [ k ] < b [ kp + 1 ] ( if a [ k ] is smaller/greater than all elements in b the we take kp to be -1/n ), that is the position after which a [ k ] would be inserted in sequence b, such that the sequence remains sorted. Then we know there are kp + 1 elements smaller in the sequence b. So our element a [ k ] is larger than

k + kp + 1 elements in the union of the two sequences a and b.

Now, given two elements a [ x ] and a [ y ] with x < y, we know that a [ x ] 's order of statistics is smaller than

a [ y ] 's( if x + kx is the order of statistics of a [ x ] then a [ y ] 's order of statistics is at least y + kx which is greater than x + kx ).

So we can binary search in the first sequence the element with order of statistics ( m + n ) / 2. If no such element is found then we search it in the second sequence.

Final complexity O ( lg^{2}( n ) ).

__Solution5.__We notice, in solution4 that when we check if an element is the median, we do not need to find its order of statistics - we only need to check it its order of statistics is equal or not with n / 2.

Suppose that our element a [ k ] is the median => a [ k ] >= than ( ( m + n ) / 2 ) - 1 elements in the union of the sequnces a and b -- k elements in a and ( ( m + n ) / 2 ) - 1 - k in b.

So if b [ ( ( m + n ) / 2 ) - 1 - k ] <= a [ k ] < b [ ( ( m + n ) / 2 ) - 1 - k + 1 ] then our element

a [ k ] is the median. If the inequality doesn't hold our element is not the median.

As we know we can binary search in the first sequence the element with order of statistics

( m + n ) / 2 and if not found in the first sequence, we can search it in the second sequence,

the final complexity is O ( log ( n ) ).

__Solution6.__

Another solution would take advantage of the fact that the numbers are 32 bit integers. We can binary search our median in the interval [ - 2^{31}, 2^{31}- 1 ]. For each number x in there we can find how many numbers are smaller than x in the two sequences by applying the binary search algorithm on each of them. We also need to be careful, because x must be in one of the two sequences. Final complexity O (lg n).This algorithm has a hidden constant of 32 which on the general case when integers that are restricted to an interval of size S the algorithm has a complexity of O ( lg ( S ) * lg ( n ) ).Note:

__Solution7.__

The following solution I received via email and although I can see it is some form of binary search, I hadn't have the time to understand what it does. So here is the source code, as sent, without explanations.

Some java code for this(due to Niyaz Nigmatullin):

int solve2(int[] a, int[] b) {If you have any questions, requests, comments, corrections to make please feel free to post. Thanks and, again sorry for the delay.

if (a.length > b.length) {

int[] t = a;

a = b;

b = t;

}

int left1 = 0, right1 = a.length - 1;

int left2 = 0, right2 = b.length - 1;

int cur1 = 0, cur2 = (a.length + b.length - 1) / 2;

while (true) {

boolean ok = right1 == left1 + 1 && right2 == left2 + 1;

if (a[cur1] == b[cur2]) {

return a[cur1];

}

if (a[cur1] < b[cur2]) {

int d = Math.min((cur2 - left2 + 1) / 2,

(right1 - cur1 + 1) / 2);

if (d == 0) {

return cur2 == 0 ? a[cur1] : Math.max(b[cur2 - 1], a[cur1]);

}

left1 = cur1;

right2 = cur2;

cur1 += d;

cur2 -= d;

} else {

int d = Math.min((cur1 - left1 + 1) / 2,

(right2 - cur2 + 1) / 2);

if (d == 0) {

return cur1 == 0 ? b[cur2] : Math.max(a[cur1 - 1], b[cur2]);

}

right1 = cur1;

left2 = cur2;

cur1 -= d;

cur2 += d;

}

if (ok && left1 + 1 == right1 && left2 + 1 == right2) {

int[] c = { a[left1], a[right1], b[left2], b[right2] };

Arrays.sort(c);

return c[1];

}

}

}

Oh, and if you can explain solution7 please do.