Interesting inequality issue when inserting into/splitting a treap

#	User	Rating
1	tourist	3690
2	jiangly	3647
3	Benq	3581
4	orzdevinwang	3570
5	Geothermal	3569
5	cnnfls_csy	3569
7	Radewoosh	3509
8	ecnerwala	3486
9	jqdai0815	3474
10	gyh20	3447

#	User	Contrib.
1	maomao90	174
2	awoo	165
3	adamant	161
4	TheScrasse	160
5	nor	158
6	maroonrk	156
7	-is-this-fft-	152
8	orz	146
8	SecondThread	146
10	pajenegod	145

Hi, I was trying to solve 702F — T-Shirts. These are four interesting submissions:

Use the "Compare" tool on CF to see the diff between these submissions and notice for yourself that the changes are only in the sign of inequality of split_by_val and insert_by_val! In fact, the AC is in <500ms whereas TLE is on 4000ms, quite significant difference!

I do not understand why the signs of inequality actually matter. Here are my thoughts:

Splitting by value and insertion by value should be able to operate independent of each other. The only prerequisite for both of them is that the treap should be BST ordered, which it always is. Therefore, the correctness of split should not depend on whether insert uses < or <= (and vice-versa).
The only reason it may TLE is because of unbalancedness in the treap. But for the purpose of balancing we are using a random priority which should rebalance the treap everytime we insert by value.

So, either this an actual theoretical issue in treaps that I have not seen mentioned anywhere before (unlikely), or I have made another error in implementation due to which this issue shows up (likely) [Although I have cross checked my implementation with my friend's] Regardless, I don't understand why it TLEs!

Please help me understand the issue, thanks!

Comments (6)

Write comment?

AnimeshSinha1309

3 years ago, # |

+23

Assume that all the elements (the keys) in your tree are equal. Now, when you insert a new instance of the same element, assume that you get a priority that is higher than any other in the tree, i.e., this node will be inserted at the top (due to the random number generator).

Now, when you have different signs, a split results in all the elements going into the left subtree. Now any new insertion with lower priority will never lower the depth of this left subtree, all new insertions will go to the right, and fix that subtree. And new insertions with higher priority will skew the tree more in the same way that this insertion did.

Those low-priority insertions were doing the job of fixing the damage caused by these high-priority insertions (which result in a split at the top). You misdirected those resources, your amortization no longer works, leading to a TLE.

Hope the general idea is clear, the specifics of the case and bounds on time can be worked out but will require more involved analysis.

→ Reply

sigma_g

3 years ago, # ^ |

Thanks! IIUC, you're claiming that when all the elements of a treap are equal, the expected height of the treap will be $$$N/2$$$, rather than the $$$\log N$$$ we desire, leading to a TLE. Why so? Because — as you say — there is a 50% chance (based on random priority) that the newly inserted element will increase the depth of the tree. And we are never lowering the depth. So for N insertions the expected depth increase is N/2.

I am not sure why this happens iff the signs are unequal. Specifically, when exactly is the depth of the treap is reduced? I will try to hand-draw some scenarios and see the result.

Also I'm not sure how the theoretical proof of a treap's expected height adapt to account for such scenario (initialize empty treap and then do several insertions of equal value). Actually I realize I have never read any proof, but still, I'm curious.

Benq

+13

Regarding the third submission, if you insert the same value $$$N$$$ times you should notice that the treap nodes form a chain of length $$$N$$$ regardless of the priorities.

https://ide.usaco.guide/MdMquqkUdqDO6ni7q1e

bicsi

+11

Treap complexity is based on randomized quicksort complexity. Quicksort degrnerates to $$$O(n^2)$$$ when all elements are equal, therefore treaps behave the same. The only real solution is to make sure that the keys are unique (for example, by breaking ties by comparing node ids/pointers).

It reminds me of some romanian popular opinion: ‘only idiots think that quicksort is $$$O(n^2)$$$ lol’

+14

"Quicksort degrnerates to $$$\mathcal{O}(n^2)$$$, when all elements are equal, therefore treaps behave the same"

Not really though? For example, let us insert $$$N$$$ equal elements into a treap. I take the first submission from my codes linked above.

When you run it (link), its maximum depth is roughly $$$2\log_2 N$$$. I have run it multiple times locally, and it's always the same result.

We see that treaps work well, even for all equal elements. So, I do not understand what $$$\mathcal{O}(n^2)$$$ degeneration you are discussing about.

+21

I figured the solution to this issue eventually. It's an issue in the implementation and I'm still surprised nobody has written a blog post about this or encountered it before.

First of all, the TLE only happens when all keys in the treap are equal. If keys in the treap are distinct, then only one unique treap exists and it would be of logarithmic height due to randomly chosen priorities (due to results as proved in the original treaps paper).

When the keys are equal, regardless of the randomization, we will end up with an order $$$N$$$ height treap iff we use an incorrect implementation. This is primarily an issue with insert, the split by itself is doing fine. Now I shall explain why.

When inserting a new node in a treap, the randomized priority helps us decide only the depth till which this new node would be inserted. With a high priority, this new node may become the root node of the treap. With a lower priority, it may be inserted at depth 3 (moving existing elements there into its subtree). With the lowest priority, it would become a leaf node. Thus, we can imagine the following procedure insert(it) [it = a newly constructed node] as follows:

Start at the root node of the treap
Keep traversing down the treap till you find the right spot to insert it (based on priority values): at each step, moving to left child or right child
Once at the right spot, split the subtree at that spot into l and r
Set it->l = l and it->r = r, and put it in that spot instead

Let us now focus on the first TLE submission, and note that the split method in that submission moves all (equal) elements to r (leaving l empty). Therefore, when inserting it and executing step 2 of the above procedure, we would want to do the traversal carefully. We should prefer traversing to the left child as we are likely to find more empty spots in that direction. Filling an empty spot is preferable because we are not increasing the depth of the tree. For example, if the left subtree has height 2 and right subtree has height 5, then filling an empty spot on the left child does not increase the total height of the treap.

If we keep traversing to the right, however, we would be increasing the depth of the treap, by one for every insertion. This is because we are putting a new node in the right subtree and then moving all (equal) elements to the right as well (due to the split). You can show inductively — starting from an empty treap — that every new insertion results in a right skewed treap that increases its height by one each iteration.

That's about it for the explanation! It is a bit wordy but I hope that if you run a few simulations with small treaps you would understand the issue.

The solution to this is to stress test your treaps template against large equal value insertions and ensure it has logarithmic height. This post also makes me wonder what other method combinations are there in treaps that are susceptible to TLE. It would be great if anyone would do analysis on them too.

Cheers!

sigma_g's blog