Click here to download the pdf

## Pre Requisites

Binary Search — How it works and where can it be applied!

## Motivation Problem

We aim to solve this problem : Meteors

The question simply states :

There are *N* member states and *M* sectors. Each sector is owned by a member state. There are *Q* queries, each of which denote the amount of meteor shower in a [*L*, *R*] range of sectors on that day. The *i*^{th} member state wants to collect *reqd*[*i*] meteors over all its sectors. For every member state, what is the minimum number of days it would have to wait to collect atleast the required amount of meteors?

## Solution

The naive solution here is to do a binary search for each of the *N* member states. We can update in a range using segment trees with lazy propagation for each query. The time complexity of such a solution would be *O*(*N* * *logQ* * *Q* * *logM*). But this one will easily TLE.

Let's see if there's something we are overdoing. For every member state, the binary search applies all the queries until a point multiple times! For example, the first value of *mid* in the binary search is same for all member states, but we are unnecessarily applying this update everytime, instead of somehow caching it.

Let's do all of these *N* binary searches in a slightly different fashion. Suppose, in every step we group member states by the range of their binary search. In the first step, all member states lie in range [1, *Q*]. In the second step, some lie in range [1, *Q* / 2] while some lie in range [*Q* / 2, *Q*] depending on whether the binary search predicate is satisfied. In the third step, the ranges would be [1, *Q* / 4], [*Q* / 4, *Q* / 2], [*Q* / 2, 3*Q* / 4], [3*Q* / 4, *Q*]. So after *logQ* steps, every range is a single point, denoting the answer for that member state. Also, for each step running the simulation of all *Q* queries once is sufficient since it can cater to all the member states. This is pretty effective as we can get our answer in *Q* * *logQ* simulations rather than *N* * *Q* * *logQ* simulations. Since each simulation is effectively *O*(*logM*), we can now solve this problem in *O*(*Q* * *logQ* * *logM*).

"A cool way to visualize this is to think of a binary search tree. Suppose we are doing standard binary search, and we reject the right interval — this can be thought of as moving left in the tree. Similarly, if we reject the left interval, we are moving right in the tree.

So what Parallel Binary Search does is move one step down in *N* binary search trees simultaneously in one "sweep", taking *O*(*N* * *X*) time, where *X* is dependent on the problem and the data structures used in it. Since the height of each tree is *LogN*, the complexity is *O*(*N* * *X* * *logN*)." — rekt_n00b

## Implementation

We would need the following data structures in our implementation :

- linked list for every member state, denoting the sectors he owns.
- arrays
*L*and*R*denoting range of binary search for each member state. - range update and query structure for
*Q*queries. - linked list
*check*for each*mid*value of current ranges of binary search. For every*mid*value, store the member states that need to be checked.

The implementation is actually pretty straight forward once you get the idea. For every step in the simulation, we do the following :

- Clear range tree, and update the linked lists for
*mid*values. - Run every query sequentially and check if the linked list for this query is empty or not. If not, check for the member states in the linked list and update their binary search interval accordingly.

## Pseudo Code

```
for all logQ steps:
clear range tree and linked list check
for all member states i:
if L[i] != R[i]:
mid = (L[i] + R[i]) / 2
insert i in check[mid]
for all queries q:
apply(q)
for all member states m in check[q]:
if m has requirements fulfilled:
R[m] = q
else:
L[m] = q + 1
```

In this code, the *apply*() function applies the current update, i.e. , it executes the range update on segment tree. Also to check if the requirements are fulfilled, one needs to traverse over all the sectors owner by that member state and find out the sum. In case you still have doubts, go over to the next section and see my code for this problem.

## Problems to try

## Conclusion

I heard about this topic pretty recently, but was unable to find any good tutorial. I picked up this trick from some comments on codeforces blog on Meteors problem. Alternate solutions to the mentioned problems are also welcome, almost every question on parallel binary search can also be solved with some persistent data structure, unless there is a memory constraint. Feel free to point out errors, or make this tutorial better in any way!

**Happy Coding!**

I've heard about Meteors so many times. Thanks to you, today I've finally decided to solve it.

My implementation is slightly different. Maybe it will be easier for some people, so I'll describe it.

I wrote a recursive function

`rec(int low, int high, vector<int> owners)`

and I run it once as`rec(1, q, {1,2,...,n})`

. Vectorownerscontains people for which answer is in interval [low,high].Let's denote

mid= (low+high) / 2. I want to know which owners already won after the firstmidqueries. Then, I run`rec(low, mid, owners_who_won)`

and`rec(mid, high, those_who_didnt)`

.To check "which owners already won" I need standard range tree. In my code below you can find a variable

act_treedenoting the number of first queries summed in the tree. Let's say my function is run with argumentslow= 70 andhigh= 90. My global tree should now contain the sum of firstmid= 80 queries (so there should beact_tree= 80). Later I want to runrec(70, 80) andrec(80, 90). I need to do the following:act_tree= 75). I can runrec(70, 80).rec(80, 90).The time complexity is something like . Naive implementation will result in the memory complexity . To make it

O(N+Q) you need to pass vectorownersby reference and clear it after checking which owners already won. Check details in my code below.codeCool implementation! Btw you managed to pass time limit using recursion and vectors? I found time limit to be pretty strict. Numerous optimizations were required even with BIT to get an AC.

Thanks. Cool blog ;)

Yes. I didn't use BIT and I got AC even without passing vector by reference (so, with memory . I think that recursion isn't a problem here because we get operations in a tree. Without that, recursion is so it can be slower and it shouldn't affect the overall running time. Apparently, my solution may use less tree operations (e.g. two times less) than yours and it's why you had to use BIT. Or maybe for some other special test my solution would be slower, it's hard to say.

Hi, I have a question. Why do you set ans[who]=high when "who" can get his desired value in mid point.

Update: clear.

`owners.clear(); // thanks to this line the memory is O(n), not O(n log(n))`

I think without calling

`owners.shrink_to_fit()`

the memory is stillO(nlog(n)), right?Yes, you are right. Or we should clear it by swapping with an empty vector

`vector<int>().swap(owners);`

.I was discussing with satyaki3794 about his solution to Travel in Hackerland using parallel binary search. My first reaction when he told me about this technique was "Wow you don't have sequential code ? I have never submitted a parallel code online" :P

My bad. He then pointed me to the METEORS problem.

please update the link for the "Travel-in-hackerland" question :P .

https://www.hackerrank.com/contests/may-world-codesprint/challenges/travel-in-hackerland

Auto comment: topic has been updated by arrogantIdiot (previous revision, new revision, compare).SRM 675 Div.1 LimitedMemorySeries1(500pts) can be solved with this method.

This is my code accepted in the practice room. http://ideone.com/Dm2x4M

Div1 500! Wow, I never knew this would be so useful :P

My code is similar to yours, can you point out the error in this please? I am getting wrong answer. Code

In line 77 , you need to break when current sum exceeds requirement , else you will overshoot ll_max.

Great Blog!

A cool way to visualize this is to think of a binary search tree. Suppose we are doing standard binary search, and we reject the right interval — this can be thought of as moving left in the tree. Similarly, if we reject the left interval, we are moving right in the tree.

So what Parallel Binary Search does is move one step down in

Nbinary search trees simultaneously in one "sweep", takingO(N*X) time, whereXis dependent on the problem and the data structures used in it. Since the height of each tree isLogN, the complexity isO(N*X*logN)Just the kind of explanation I was looking for while writing this blog! You don't mind if I add this to the blog, right?

Not at all!

That's a cool trick :)

I tried to solve the problem "Make n00b land great again" but couldn't figure out how to efficiently update the queries. Can you please explain how to do it efficiently?

You can go through it's editorial. Updates can be handled by range tree.

Oh thanks! Didn't see that the editorial was there.

Thanks arrogantIdiot for a nice blog post. This helped me to learn a new technique :) ..Parallel BS is awesome..Anyway, at first while reading Problem description for METEORS , first thing that come to my mind was Persistence Segment-tree as this is a straightforward and implemented and got AC with 165M memory & 2.82 time, after that implemented ParallelBS (first time, of course :) ) and got AC with 22M & 4.72 time (huge time but less memory ;) ).....Then, after looking RANK for that problem, I wondered how some people managed to solve this problem in less time like 0.34,0.58,0.93...Is there any better approach?

Can you share your approach for the Persistent Segment-tree solution?

I did it with persistent segment tree by performing point-update of +a[i] on l[i] and -a[i] on r[i]+1 (for l[i]<=r[i], similar updates for l[i]>r[i]). But it got TLE without fast-io , and AC with a time of 7.72 s with fast-io.

What we need here in segment-tree is

"Range update Point query". It seems to require lazy here, but we may skip lazy by performing something like this:At the time of Range update we have to update the sum for all the segments which encounter while updating that range. And, now atthe time of Point query we just have to go down to the point onsegment-tree by adding up the sum of segments that encounter whilegoing down to this point...Hope it helps :)can you share your persistent segment tree solution?

Lazy segment trees are actually not necessary for this problem. You can also do the line sweep technique to count how many meteors have been added to sector

ifor everyi(in which events are endpoints of queries and sectors) to knock down a log off the complexity, making it .codeI couldn't understand your solution. Could you share how your code runs on the sample?

I guess it's a lot closer to Errichto's implementation than the original one:

`rec(minq, maxq, events)`

performs all queries between minq and`(minq+maxq)/2`

and gets the number of meteors for every state.The way it performs the queries is by doing a line sweep:

`events`

is a sorted list of the endpoints of all queries in the range`[minq, maxq]`

(KTYPE events), and for each starting endpoint I adda_{i}to the current section's meteor count, for each ending endpoint I subtracta_{i}. This way for every sector I know how many meteors it has for those queries.`events`

also contains all states I'm searching in this query range (NTYPE events), and a list of sectors that those states own (MTYPE events).For the first step of the sample I would first call:

`rec(minq=0, maxq=3, { states:{0,1,2,3}, sectors:{0,1,2,3,4}, queries:{0,1,2} })`

`(minq+maxq)/2`

is 1, so I'm running the first two queries and ignoring the rest. The last state succeeds, and the two first states fail, so the recursion splits into:`rec(minq=0, maxq=1, { states: {2}, sectors: {1, 4}, queries:{0,1})`

`rec(minq=2, maxq=3, { states: {0,1}, sectors: {0, 2, 3}, queries: {2} })`

.Because we will never consider queries 0 and 1 for states 0 and 1 anymore, I subtract how many meteors those states had from the required number of meteors before recursing.

The second round of splits will produce:

`rec(minq=0, maxq=0, { states: {2}, sectors: {1, 4}, queries:{0})`

`rec(minq=1, maxq=1, { states: {}, sectors: {}, queries:{1})`

`rec(minq=2, maxq=2, { states: {0}, sectors: {0, 3}, queries: {2} })`

.`rec(minq=3, maxq=3, { states: {1}, sectors: {2}, queries: {} })`

.And we can see that minq indicates the answer for every state in the corresponding list.

i just solved the METEORS question.I was getting wrong answer because of the fact that i was not breaking out of loop when summing the value of all the sectors for a member.I was doing all calculations in long long in c++ . So why does this happen isn't the maximum number is 3*10^14 which long long can handle, on breaking out of loop as soon as my required is satisfied it got AC .Can anyone answer why?

What if all queries is to increase by 10

^{9}in the entire range?A separate but related problem: can we run Q binary searches (e.g. Q lower_bound operations) over a sorted array of N elements faster than in

O(Q*logN)? (It is assumed that all Q queries are known beforehand).You can first sort the

Qqueries in and then iterate over two arrays simultaneously inO(N+Q). So, the complexity is . But... is this complexity better? I don't see any example (like ).A curious case. Assuming the following calculation is correct, the complexity is not better https://www.wolframalpha.com/input/?i=solve+Q%2Alog%28N%29+%3C+Q%2Alog%28Q%29%2BN

It looks like there is no way to improve the case Q=N. Otherwise we could sort faster than N*logN. Or at least we could sort several arrays faster. We could sort one of them in N*logN and others faster then N*logN (by doing parallel binary searches over the first already sorted array).

I tried this logic to prove that it's impossible to solve your problem faster — and I didn't succeed. So, I don't think you're right here. UpperBound isn't enough to sort the other array. For example, consider the first array to be {100,200,300,400,500} and a new array {17,29,18,50,10,30}. Still, maybe some similar proof will be correct.

Let assume the time complexity of that problem with N = Q is T(N). Furthermore, I do some technical assumptions.

T(N) ≤T(2N+a) / 2 for allN,a≥ 0 and .I show that we can sort an array of N distinct elements in

O(T(N)) expected time under the comparison model with randomization.N/ 2⌋ elements. Then, recursively sort the chosen elements. Let call the sorted array beB.Band the ⌈N/ 2⌉ unchosen elements so we can locate each unchosen elements to between adjacent elements ofB.Because we chose randomly, and we assumed that the each element is distinct, each group (the elements with the same query result) is a constant size. We can thus sort each group in

O(1) expected time.Let the runtime of the algorithm above be

S(N). By the assumption onT(·), we can easily see that .Because we cannot sort an array in time under the comparison model even with randomization, we conclude that ■.

Open problem: can this reduction be derandomized?

I don't think it's obvious that "each group is a constant size". It's logical/obvious that the sum of (where

sis size of group) isO(n) in total but you didn't prove it. Still, your proof is nice, thanks.By the way, what is the expected size of the biggest of groups? It sounds like something very hard to analyze. Maybe some tedious calculations would be enough here.

I finally realized what answer to this question I wanted. My initial thought was that when doing Q independent binary searches, we perform redundant work. And indeed there is a way to reduce it, though the worst case complexity will remain

O(Q*logN).We can visualize binary search as a binary search tree traversal. We will do a standard tree traversal, but we'll process all queries together. To achieve this, we sort our queries and start from the root of the tree, which corresponds to the median of the array. By doing binary search on queries we find which queries lie to the left and which lie to the right of the root node. We continue traversing tree recursively. As a result, we reduce the number of comparisons due to partially coinciding search paths.

Hey, can somebody plz help me in debugging my code for meteors. Code

P.S. — I also wrote an input generator and compared my solution to one of the accepted solutions, but could not find any mismatch.

Your bug is at line 104. This sum may be very big, around 10^20.

Thanks a lot. Accepted :)

Selective Additions — A problem from the last Hourrank on Hackerrank. Can be solved using Parallel BS.

cool application.

I think we can do all these questions with

Persistent Segment TreewithLazy Propagation.Here is my code of Meteors — https://ideone.com/PgJg1E

Complexity —

O(M*logQ*logM)One downside of this is that it takes a lot of memory.

I don't think it is possible to do Lazy Propagation in Persistent Segment Tree. Memory Complexity becomes

O(QN), that isn't the purpose of Persistent Segment Tree.You may get AC in SPOJ, because in SPOJ you have

1.5GBMemory limit, and as I see, your code got AC with1.3GBmemory usages.But in the main contest of POI, Memory limit was

64MB;)Try submitting here — SZKOpul — Meteors

It is possible to do Lazy Propagation in Persistent Segment Tree. The Memory Complexity

O(N + Q*logM). You can see in my code that while querying/updating I'm making at maxO(logM)nodes only.Yes, it won't give AC if the memory limit is tight. I had already mentioned the downside.

P.S. I was easily able to reduce the memory to

0.44GB. Just reduced the array size.Code — https://ideone.com/x7dwqq

I don't think that memory complexity will still be

O(N + Q log M), because, when you propagate a lazy value, in the end you may end up creating a tree for that range. So memory complexity is something greater thanO(Q log M)and Worst caseO(QN).Please take a look at my code carefully. Try to run it for some cases. You'll know then :)

you don't create a new tree for the range, you just update the node, it's still that memory complexity. If you used QN memory each time it wouldn't make any sense, because that would assume regular lazy propagation would be QN, since you access/create same number of nodes (log n).

Please, update the link for "travel-in hackerland" question,

it's new link is this

How is the brute force complexity the one given above? For each station, we do binary search on queries and we keep 2 pointers, one for binary search and other for the queries pointer which points upto where the queries are applied.I suppose this pointer moves at max the whole queries array twice. So must not it be Q+LogQ instead of Q.LogQ. And then, we apply range query range update using BIT, so what I suppose is the complexity must not contain the factor of Q. Please correct me if I am wrong.

Here's a problem related to the concept.

https://agc002.contest.atcoder.jp/tasks/agc002_d