A new way to solve range problems

#	User	Rating
1	tourist	3845
2	jiangly	3707
3	Benq	3630
4	orzdevinwang	3573
5	Geothermal	3569
5	cnnfls_csy	3569
7	jqdai0815	3532
8	ecnerwala	3501
9	gyh20	3447
10	Rebelz	3409

#	User	Contrib.
1	maomao90	171
2	adamant	163
3	awoo	161
4	nor	152
5	maroonrk	151
5	-is-this-fft-	151
7	TheScrasse	148
8	atcoder_official	146
9	Petr	145
10	pajenegod	144

I'm sorry that I'm a Chinese and most of the English name of algorithm I don't know. So, I use google translation to translate it.

Have you ever seen the following questions?

Calculate the median of a range, and you must finish online (means you need to calculate real question by using last answer). What would you do? If the problem is not online, we can easily use persistent segment tree to solve it. And now it is online, maybe you think you can solve it by using AVL.

Now, I will show you another way to solve this question. Divide the sequence into n blocks, each blocks' size is sqrt(n). For each block, we set a segment tree that contain the value of this block.

For a question of asking [l,r], we find the blocks that the whole block of it in the range. And we use segment tree merge to merge them. And for the number that is not contained by the complete block, we insert them into the segment tree.

And we get a segment tree that contain the value of the range. And we can use segment tree binary search to solve this problem.

There is a sample: We set a array of [1,1,4,2,1,3,1,4,3], and the picture below is the block number and segment trees of the array.

If we want to ask range [2,8], we will do this below:

And we can solve something on the new segment tree.

For the time limit. We set the numbers in the array is lower than m, there are n numbers in the array. Because of the size of the block is sqrt(n) and numbers of blocks are sqrt(n), so we mostly do segment tree merge for sqrt(n) times, and mostly insert the numbers into the segment tree for 2*sqrt(n) times. And then, the time of the segment tree merge is log2(m), insert the numbers are log2(m) too. So, this algorithm's time is O(n*m*sqrt(n)*log2(m)), We can solve it in 1 seconds when n is lower than 2e5 and m is small.

But there is always a question, memory. Because we can't modify the value of the segment tree in the blocks. So we need to merge the segment tree in the blocks to the new segment tree. If we create a new segment tree for every query, the memory will not ok. So we only use one segment tree, and after every query, we clear it.

A possible code:

int l,r;
l=read(),r=read();
for(int i=l;i<=min(belong[l]*block,r);i++)
{
	update(root[nowroot],1,len,Ar[i],1);
}
if(belong[l]!=belong[r])
{
	for(int i=(belong[r]-1)*block+1;i<=r;i++)
	{
		update(root[nowroot],1,len,Ar[i],1);
	}
}
for(int i=belong[l]+1;i<=belong[r]-1;i++)
{
	merge(root[nowroot],root[i],1,len);
}

int ans=query(root[newroot]);
printf("%d\n",ans);
clear(newroot);

In this code, block is sqrt(n), query is a function of answering the problems, update is the function of inserting numbers, merge is the function of segment tree merge.

Do you think it's useless? Yes, by now, all it can finish, we can use AVL to solve it, and it needs a longer time. But I don't know what other problems this algorithm can solve. At least, we only need to learn one algorithm, congratulation! And I think that many people who issue the problems of data structure, they will set limit of n<=1e5 and not for 1e6!

My level is limited, please give me more advice, thank you very much! :)

SOMTHING NEW

My classmate pzq_alex give me a useful remind, and I just discover this algorithm also can modify one position's value! If we want to change the value of a one position, we can modify the block's segment tree by deleting the position's original value, and adding the new value.

tfg mention that each query is nlogn, and I think about it. I discover that each query is actually sqrt(n)*m*logm. That is, when m is very small, like lower than 100 or 10, we can still use this algorithm. Because m is the max value of the array.

Comments (13)

Show archived | Write comment?

Luka_Modric

18 months ago, # |

-7

This method is very wonderful!

→ Reply

pzq_alex

← Rev. 3 →

Can't you solve this with a plain persistent segment tree? Let $$$s_r$$$ be the segment tree where index $$$i$$$ represents the number of $$$i$$$s in the range $$$[1,r]$$$. Precompute all segment trees $$$s_r$$$ (some of which share subtrees, of course) before answering queries. To answer a query, just compute $$$s_r-s_{l-1}$$$ (minus denoting indexwise minus) and use the method you mentioned to compute the median.

The only problem I see with this is space ($$$O(n\log n)$$$) whereas your method uses $$$O(\frac{n}{b}\log n)$$$ where $$$b$$$ is the block size. However the time of your algorithm is $$$O((b+\frac{n}{b})\log n)$$$ which achieve optimum $$$O(\sqrt{n}\log n)$$$ per query at $$$b=\Theta(\sqrt{n})$$$. If space is limited and time is not, then your algorithm is better.

BFSDFS123qwq

18 months ago, # ^ |

I think you are right, and I discover that this algorithm also can modify.

And I update the article, thank you very much! :)

gshsmuyaho

You can solve the offline version(All of the queries are given after the updates) of this problem using Merge sort trees. It takes space O(Qlog Q) and time O(Q(log Q)^2). And although it gets a bit(lot) more complicated when the queries become online, you can still solve this using merge-sort trees if the position of the queries and updates are fixed.(When I mean online I mean when queries and updates can be given in any order, but the location of the queries and updates are fixed.(The answer to the queries do not change the position of the queries and updates after it(the values of the updates can be changed))(Q means the number of queries and updates)

Yeah, you are right. But unfortunately it is online :(

tfg

+21

Looks very fishy. I think that you can't merge segment trees fast as you claim. This algorithm looks O(N*logN) per query.

Auto comment: topic has been updated by BFSDFS123qwq (previous revision, new revision, compare).

wsc2008qwq

I think this algorithm can't run at a complexity as you mentioned. Each time you'll create $$$r-l+1$$$ segment tree nodes if all the elements are distinct. If you do the same query $$$[1,n]$$$ for $$$q$$$ times, the complexity will be at least $$$O(nq\log n)$$$ (maybe times another $$$\sqrt n$$$).

If I made a mistake please correct me.

+10

Another thing is that you said it can solve the problem when $$$m$$$ is small. But why don't we just use $$$m$$$ prefix sums to calculate the occurrences of each element in $$$[l,r]$$$?

Maybe your algorithm is a good idea, but it needs improvement to make it more useful.

2147483648

Your rating graph is amazing!

atharvd

← Rev. 2 →

For further information about this, read the Competitive Programmer's Handbook.

Sum query over a range takes $$$O(\sqrt n)$$$ time.

Misuki

← Rev. 4 →

This problem can also be solved by wavelet tree in $$$O(lgC)$$$ with very clean implementation where $$$C$$$ is the range of elements in the array.

(Or $$$O(lgN)$$$ if you do coordinate compression first)

BFSDFS123qwq's blog