Problem with merge sort tree

#	User	Rating
1	ecnerwala	3648
2	Benq	3580
3	orzdevinwang	3570
4	cnnfls_csy	3569
5	Geothermal	3568
6	tourist	3565
7	maroonrk	3530
8	Radewoosh	3520
9	Um_nik	3481
10	jiangly	3467

#	User	Contrib.
1	maomao90	174
2	adamant	164
2	awoo	164
4	TheScrasse	160
5	nor	159
6	maroonrk	156
7	-is-this-fft-	150
7	SecondThread	150
9	orz	145
9	pajenegod	145

I read merge sort trees from here (https://discuss.codechef.com/questions/94448/merge-sort-tree-tutorial) understood the build function but I can't understand the query function.

Why would you go top to bottom? you'll meet the "if( x<=l && r<=y )" condition on the root node itself the program will never traverse down the tree. Also for an array of size 10 and a query with a range 2-6 the sorted array can have elements from 6 — 10 and 1 of the original array.
Also, we can't search on the merge sort tree as the elements have been shuffled in sorting (ie, the indices of original data and sorted array don't match), the only possible way on searching on such a tree should be bottom up ie, leaf to tree root because the leaf nodes are pushed in the order of the original data and subsequently we must climb to parent node whenever possible (because minimising number of binary searches is our motive here) climbing can only be done if both the daughter nodes are included in l to r as they are a part of search range and their order doesn't matter.

Comments (3)

Show archived | Write comment?

Jakube

6 years ago, # |

Do you know the normal Segment tree?

1) That's the whole point of a Segment tree / Merge sort tree. You don't want to go down the tree. Because at the bottom of the tree there are n nodes. And you want to answer queries in less than O(n) time. By using the condition x <= l && r <= y you are guaranteed that you only have to visit $\text{[math]}$ nodes, that still contain the same information as the O(n) nodes at the bottom.

2) The Merge sort tree is not designed for searching. If you want to use it for it, then I would suggest storing the value-index pairs in the vectors. This way you also only need to visit $\text{[math]}$ nodes.

→ Reply

Original_Batman

6 years ago, # ^ |

← Rev. 5 →

image link — https://photos.app.goo.gl/jRvGHTXd4kcfw2RQ7

Thanks for replying! Yes, I know segment trees but I think I was unable to clearly state my problem so I've attached above image. So what I'm asking is suppose we have a query with l = 2 and r = 5 (both including and array starts from zero) with k = 12. what the query function does is it searches the top array from index 2-5 which includes (5,7,8,9) note that 8,9 were not in our original array search space (original array is the values in the leaf nodes). What the program should do according to me is it should start from the bottom leaf nodes (2,5,10,23) and see if we can traverse upwards following just one rule: the parent node must strictly have the values from our search space. Following this rule we can move upwards to the array having (2,5) and (10,23) and that's where we do a binary search. Note that if we go to the parent node of (2,5) we will have (2,5,7,9) out of which 7 and 9 were not in our original search space. Same applies for the array (10,23).

So I think either my approach is wrong or I didn't understand the article completely. So what do you think where am I going wrong?

P.S.- I'm stuck on this for 3 days now and I've read that article 5 times still I don't get what's the problem. So thanks a lot for showing interest in my problem and helping me. :)

Yes, you have something wrong. You should only use the nodes that fall completely inside your search area.

If you start from the top, you basically should do the following. The top node contains the array complete array a[0..7] in sorted order. This range is bigger than your search area a[2..5], which means that you cannot answer your question here. Simply skip this node and go to the two child nodes.

The same way with the child nodes. The left one contains the array a[0..3], which isn't completely a subarray of your search array, so you go to the child nodes.

The right one of these two child nodes is a[2..3], which is a subarray of a[2..5], therefore you can count the number of elements < k in it (with binary search).

In the same way, if you go from the root node to the right child, and from there to the left child, you'll end up at the node with the information a[4..5], which is also a subarray of a[2..5]. And you can count the number of elements < k in it.

To summarize: You go through the tree, starting at the root node, to find nodes that partition your search interval. This only takes $\text{[math]}$ time, and the partition will only use $\text{[math]}$ nodes. Here the two nodes a[2..3] and a[4..5]. Using these nodes only you answer the query. Do a binary search in each one of them, which leads to a $\text{[math]}$ for answering one query.

You can also start at the bottom, like in the iterative version of the segment tree, but you also only use the arrays in the $\text{[math]}$ nodes that form a partition.

Original_Batman's blog