using merging segment tree to solve problems about sorted list

#	User	Rating
1	ecnerwala	3649
2	Benq	3581
3	orzdevinwang	3570
4	Geothermal	3569
4	cnnfls_csy	3569
6	tourist	3565
7	maroonrk	3531
8	Radewoosh	3521
9	Um_nik	3482
10	jiangly	3468

#	User	Contrib.
1	maomao90	174
2	awoo	164
3	adamant	162
4	TheScrasse	159
5	nor	158
6	maroonrk	156
7	-is-this-fft-	151
8	SecondThread	147
9	orz	146
10	pajenegod	145

Two months ago, I came across a problem.

Initially there are n elements, they are in n tiles.

There are 3 kinds of queries:

merge two tiles into one tile.
split one tile into two tiles. (Formally for a tile of size k, split it into two tiles of size k1 and k2, k=k1+k2, the first tile contains the smallest k1 elements and the second tile contains the rest)
find the k-th smallest element in one tile.

Recently I found this technique http://blog.csdn.net/zawedx/article/details/51818475 (in Chinese) which can be used to solve this problem. This blog is my own explanation :p

First, let's suppose the values in the sorted lists are integers between 1~n. If not, you may just sort and map them.

Let's build a segment tree for every sorted list, segment trees are built based on values (1~n). In every node of a segment tree stores how many numbers are in this range, let's call this the value of a node.

It seems that it requires O(nlogn) space to store every segment tree, but we can simply ignore the nodes that value=0, and really allocate these nodes only when their value become >0.

So for a sorted list with only one element, we simply build a chain of this value, so only O(logn) memory is needed.

int s[SZ]/*value of a node*/,ch[SZ][2]/*a node's two children*/;
//make a seg with only node p, return in the first argument
//call with sth. like build(root,1,n,value);
void build(int& x,int l,int r,int p)
{
    x=/*a new node*/; s[x]=1;
    if(l==r) return;
    int m=(l+r)>>1;
    if(p<=m) build(ch[x][0],l,m,p);
    else build(ch[x][1],m+1,r,p);
}

When we split a segment tree (sorted list), simply split two children recursively:

//make a new node t2, split t1 to t1 and t2 so that s[t1]=k
void split(int t1,int& t2,int k)
{
    t2=/*a new node*/;
    int ls=s[ch[t1][0]]; //size of t1's left child
    if(k>ls) split(ch[t1][1],ch[t2][1],k-ls); //split the right child of t1
    else swap(ch[t1][1],ch[t2][1]); //all right child belong to t2
    if(k<ls) split(ch[t1][0],ch[t2][0],k); //split the left child of t1
    s[t2]=s[t1]-k; s[t1]=k;
}

When we merge two sorted lists, merge them forcely:

//merge trees t1&t2, return merged segment tree (t1 in fact)
int merge(int t1,int t2)
{
    if(t1&&t2);else return t1^t2; //nothing to merge
    ch[t1][0]=merge(ch[t1][0],ch[t2][0]);
    ch[t1][1]=merge(ch[t1][1],ch[t2][1]);
    s[t1]+=s[t2]; /*erase t2, it's useless now*/ return t1;
}

Also, just query k-th simply.

//query k-th of segment tree x[l,r]
int ask(int x,int l,int r,int k)
{
    if(l==r) return l;
    int ls=s[ch[x][0]]; //how many nodes in left child
    int m=(l+r)>>1;
    if(k>ls) return ask(ch[x][1],m+1,r,k-ls);
    return ask(ch[x][0],l,m,k);
}

It looks very simple, isn't it? But its total complexity is in fact O(nlogn).

Proof

Happy new year~

function union(t1, t2): if t1 = nil: return t2 if t2 = nil: return t1 if priority(t1) < priority(t2): swap t1 and t2 t<, t> ← split t2 on key(t1) return new node(key(t1), union(left(t1), t<), union(right(t1), t>))

Comments (22)

Write comment?

choutii

6 years ago, # |

+10

can we use this skill to solve problem like this

sort range [l, r] increase or decrease
output the final sequence

→ Reply

ko_osaga

6 years ago, # ^ |

It seems kinda unrelated, so I guess it won't be easy

TLE

← Rev. 2 →

+50

Yes, you can use it to solve this problem in O(nlogn).

(notice: I use 'substring' to represent continuous subsequence of the sequence)

The main idea is to represent the current sequence with a set of sorted substrings (using std::set etc). For each sorted substring, store this kind of segment tree and the order (increasing or decreasing). Initially, you can represent the original sequence with just n sorted substrings of length 1.

When you need to sort a range, find the relevant range in set. Notice that the two points of the range might not be the border of some sorted substrings in the set, in this case you can split the substrings contain these points into two halves so that the points of the query become borders of two sorted substrings.

After sorting, these sorted substrings should merge into one sorted substring. Just merge them forcely and maintain in the set.

The complexity is obviously O(nlogn). Also it can support online queries (e.g. what is the k-th number of the sequence currently.)

Dalgerok

-15

Code please?

+21

http://paste.ubuntu.com/25851896/

Thanks for the nice tip! I think this problem may be related?

Yes, I think it can be used to solve this problem also.

chinmay0906

+12

I was trying to solve this problem using the above method but getting MLE :( n = 1e6 in the problem and memory limit is 128 MB.

+23

Oh the memory limit does seem too tight. You can try merging splay trees (from small to large), by finger theorem it's also $\text{[math]}$ .

MrDindows

If you will solve this task via dfs, which visit the largest subtree first, the memory complexity will be O(N) and such a solution gets OK actually. If I'm not wrong, we can prove that at one point in time we will not need more than 4N segment tree nodes, and that fits in ML.

Wow!! Thanks a lot MrDindows. Got AC with the change. I get the intuition that solving for the larger sub-tree first will require lesser nodes. But how is it reducing the memory complexity from O(nlogn) to O(n)? Can you give a short proof or a hint towards it?

← Rev. 3 →

With such dfs order at any moment we can have not more than $\text{[math]}$ not-merged trees, because the size of each next tree should be at least twice smaller than previous. Let's assume n = 2²⁰, then in worst case we will have the trees of sizes: 2⁰, 2¹, 2², ..., 2¹⁹. (Worst case is when we have maximum number of trees, because the more trees — the more nodes). The maximum node count in the tree with 2^k numbers is (2^k + 1 - 1) + 2^k·(20 - k), and the sum of these values is equal to 4·2²⁰ - 44.

So the worst test case is the almost full binary tree, where leaf_i = reverseBits(i).

Got it, thanks.

Batman

I don't understand merge function at all.

Aren't t1 and t2 just the id_numbers of some nodes? Then what does t1^t2 even means? If it's not a c++ code you wrote then how to do this XOR operation in O(1)?

It's implicit segment tree, you create the node when you need it. That's why t1 or t2 can be zero, so t1^t2 will return max(t1, t2);

You mean that line can just be written like this? :

if(t1==0 || t2==0)return max(t1,t2);

Then aren't we going into all nodes of these two segment trees? Isn't the complexity of calling merge function O(T1.size()+T2.size())?

If I'm wrong about that line so what does t1&&t2 means?

Yes, if(t1==0 || t2==0)return max(t1,t2);

No, it's like O(min(T1.size(), T2.size())). We are going only into common vertices, that should be merged.

t1&&t2 is equal to (t1 != 0) && (t2 != 0)

bicsi

Correct me if I’m wrong, but I believe using the same argument you should achieve the same amortized complexity using treaps, right? The main point here is the $\text{[math]}$ memory overhead of seg trees, which might hinder performance in some cases.

← Rev. 4 →

Well the structure of treap is dynamic, so I don't actually know how to merge them..

Source: Wikipedia

I will test and compare both data structures today and I’ll get back with some results.

Interestingly enough, I found treaps to be a tad bit slower than segtrees on random input. Nonetheless, they don't seem to break complexity. Feel free to look at the code and experiment though, if you notice any mistakes on the treap part or if you think treaps would fil on certain (non-random) test data.

Results: https://pastebin.com/4ivKeAZR

Source code: https://pastebin.com/ua02DbPY

lucasturci

4 years ago, # |

Does this merging technique work only for segtrees storing set structures? Like, when sum(x, x) is at most 1?

TLE's blog