### dragonslayerintraining's blog

By dragonslayerintraining, history, 13 months ago

# Question

Is there a data structure that supports any sequence of $n$ of the following operations in $O(n\log{n})$ total time?

1. Create a set containing a single integer in the range $[1,n]$.
2. Create the union of two sets, destroying the original sets in the process. The original sets may have overlapping ranges.
3. For any integer $x$ and a set, split the set into two sets: one with all elements less than $x$ and one with all elements greater than or equal to $x$.
4. For any integer $x$ and a set, add $x$ to all elements in the set. This operation is only allowed if all elements will remain in the range $[1,n]$.
5. Count the number of elements in a set.
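For concreteness, here is a deliberately naive sketch of the five operations (my own illustrative code; all names are made up): a sorted list plus a lazy additive offset. Every operation is linear in the set size, so this only pins down the intended semantics, not the target complexity.

```python
import bisect

class NaiveSet:
    """Naive reference implementation: sorted list + lazy additive offset.
    Actual element value = stored value + offset."""

    def __init__(self, values=(), offset=0):
        self.vals = sorted(values)
        self.off = offset

    @classmethod
    def singleton(cls, v):
        """Operation 1: a set containing the single integer v."""
        return cls([v])

    def union(self, other):
        """Operation 2: union of two sets (their ranges may overlap)."""
        return NaiveSet([v + self.off for v in self.vals] +
                        [v + other.off for v in other.vals])

    def split(self, x):
        """Operation 3: (elements < x, elements >= x)."""
        actual = [v + self.off for v in self.vals]
        i = bisect.bisect_left(actual, x)
        return NaiveSet(actual[:i]), NaiveSet(actual[i:])

    def add(self, x):
        """Operation 4: add x to every element (lazily, via the offset)."""
        self.off += x

    def count(self):
        """Operation 5: number of elements."""
        return len(self.vals)
```

The question is whether the same interface can be supported with $O(\log n)$ amortized cost per operation.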

# Partial Solutions

If operation 2 required the ranges to be disjoint, any balanced BST that supports split and join in $O(\log{n})$ will work.

Without operation 3, join-based tree algorithms will work, since they support union of trees of sizes $m \le n$ in $O(m\log(\frac{n}{m}+1))$.

(Note that this is better than small-to-large merging, which takes $O(n\log^2{n})$ time in total.)
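A join-based union can be sketched with treaps as follows (my own illustrative code; it assumes the two sets are element-disjoint even when their ranges overlap):

```python
import random

class Node:
    """Treap node; higher priority means closer to the root."""
    __slots__ = ("key", "prio", "left", "right")
    def __init__(self, key):
        self.key = key
        self.prio = random.random()
        self.left = None
        self.right = None

def split(t, key):
    """Split treap t into (keys < key, keys >= key)."""
    if t is None:
        return None, None
    if t.key < key:
        t.right, right = split(t.right, key)
        return t, right
    left, t.left = split(t.left, key)
    return left, t

def union(a, b):
    """Join-based union; destroys both inputs. Assumes distinct keys."""
    if a is None:
        return b
    if b is None:
        return a
    if a.prio < b.prio:   # keep the higher-priority root on top
        a, b = b, a
    left, right = split(b, a.key)
    a.left = union(a.left, left)
    a.right = union(a.right, right)
    return a

def inorder(t):
    return inorder(t.left) + [t.key] + inorder(t.right) if t else []
```

The recursion splits the lower-priority tree around the higher-priority root, which is what yields the $O(m\log(\frac{n}{m}+1))$ bound in expectation for treaps.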

Without operation 4, a version of segment trees will work.
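One way such a segment-tree solution can look (my own sketch, supporting operations 1, 2, 3 and 5, but not 4): each set is a sparse segment tree over the value range $[1, n]$, and union is the standard node-by-node segment tree merge.

```python
class Node:
    """Sparse segment tree node over a value range; cnt = elements below."""
    __slots__ = ("left", "right", "cnt")
    def __init__(self):
        self.left = None
        self.right = None
        self.cnt = 0

def cnt(t):
    return t.cnt if t else 0

def insert(t, lo, hi, x):
    """Operation 1 (repeated): add value x, with x in [lo, hi]."""
    if t is None:
        t = Node()
    t.cnt += 1
    if lo < hi:
        mid = (lo + hi) // 2
        if x <= mid:
            t.left = insert(t.left, lo, mid, x)
        else:
            t.right = insert(t.right, mid + 1, hi, x)
    return t

def merge(a, b):
    """Operation 2: segment tree merge; destroys both inputs."""
    if a is None:
        return b
    if b is None:
        return a
    a.cnt += b.cnt
    a.left = merge(a.left, b.left)
    a.right = merge(a.right, b.right)
    return a

def split(t, lo, hi, x):
    """Operation 3: (values < x, values >= x) for a tree over [lo, hi]."""
    if t is None:
        return None, None
    if x <= lo:
        return None, t
    if hi < x:
        return t, None
    mid = (lo + hi) // 2
    a, b = Node(), Node()
    a.left, b.left = split(t.left, lo, mid, x)
    a.right, b.right = split(t.right, mid + 1, hi, x)
    a.cnt = cnt(a.left) + cnt(a.right)
    b.cnt = cnt(b.left) + cnt(b.right)
    return (a if a.cnt else None), (b if b.cnt else None)
```

Operation 5 is just reading `cnt` at the root. Operation 4 breaks this structure because the tree is keyed by absolute value positions, which is exactly the limitation mentioned above.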

# Full Solution

I was able to prove that balanced binary search trees solve the problem in $O(n\log^2{n})$. (Was this "well known" before?)

If we merge using split and join, this bound is tight. A set can be constructed by merging the sets of its even-indexed and odd-indexed elements, which takes $\Theta(n\log{n})$ time since the merge alternates between the two sources at every element. Those two sets could have been constructed the same way recursively. This takes $\Theta(n\log^2{n})$ time for $O(n)$ operations.
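The counting behind this worst case can be made concrete with a small sketch (my own code): count, over the recursive even/odd construction, the maximal same-source runs ("blocks") in each merge. Under split-join merging, each block costs one split plus one join, i.e. $O(\log n)$.

```python
def blocks(a, b):
    """Number of maximal runs coming from the same source when the
    sorted lists a and b are merged. With split-join merging, each
    run costs one split and one join, i.e. O(log n)."""
    tagged = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    return 1 + sum(tagged[i][1] != tagged[i - 1][1]
                   for i in range(1, len(tagged)))

def total_blocks(elems):
    """Total block count over the recursive even/odd construction."""
    if len(elems) <= 1:
        return 0
    evens, odds = elems[0::2], elems[1::2]
    return total_blocks(evens) + total_blocks(odds) + blocks(evens, odds)
```

For $n = 2^k$ this gives exactly $n \log_2 n$ blocks ($n$ per level of the recursion, over $\log_2 n$ levels), and with $O(\log n)$ per block the split-join cost is $\Theta(n\log^2 n)$.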

However, I conjecture that if we use join-based tree algorithms, it will actually be $O(n\log{n})$ in the worst case. (For example, in the case above, merges take $O(n)$.)

Can anyone prove or disprove this conjecture for any join-based tree algorithm?

If anyone can prove it for splay trees instead, or has another solution, I would also be interested.

Thanks.

# Comments (11)
 » Check this
•  » » How does that handle operation 4?
•  » » Interesting discussion, but I don't see anything there that answers my question.

However, if I'm not mistaken, the dynamic finger theorem also implies that merging splay trees is at least as fast as join-based tree algorithms (in the long run). Proof? After a merge, divide the elements into maximal contiguous blocks that come from the same source, and consider the sum of the logarithms of the block sizes. By the dynamic finger theorem, this is an upper bound on the runtime for splay trees. I believe it is both an upper and a lower bound for join-based tree algorithms.

I just realized that an implication of join-based merging being work-optimal is that splay trees can't beat it (since it keeps everything sorted). So either all of these algorithms are $O(n\log{n})$ or none of them are. Never mind.

retrograd believes the argument for segment trees taking $O(n\log{n})$ also works for treaps, but I don't see how (even ignoring operation 4).
 » Proof for splay trees(?): "Starting with a collection of $n$ singleton sets, we show that any sequence of $n - 1$ meld operations can be carried out in $O(n \log n)$ time to produce a single ordered set of at most $n$ items."
•  » » From the preview, it looks like the paper doesn't consider split operations.
 » The $O(n \log n^2)$ bound is somewhat known from Educational Codeforces Round 35. Back then, I gave a similar proof here. It is definitely less known than the sparse segment tree technique, which doesn't work with operation 4. Another downside of these sparse segment trees is that they use $O(n \log n)$ memory ($O(n)$ memory is possible if you only store nodes with an even number of children, but then the implementation gets tricky. This is even less known.).

In the paper "Design of Data Structures for Mergeable Trees", the similar (but harder) problem of maintaining heap-ordered dynamic trees is discussed and the same bounds are achieved ($O(n \log n)$ for only merges and $O(n \log n^2)$ for merges + splits). They also conjecture that splay trees achieve $O(n \log n)$ runtime while allowing splits.

Join-based merging with treaps needs $\Omega(n \log n^2)$ time on the following instance: Let $n = k^2$ and consider the $k$ sets $\{1, \dots, k\}$, $\{k+1, \dots, 2\cdot k\}, \dots, \{n-k+1, \dots, n\}$. If we perform the worst-case sequence for split-join merging described in the blog on these sets, then we have $O(k)$ merge operations resulting in $\Omega(k \log k)$ split operations inside the join-based merge. Each of these split operations operates on a tree of size at least $k$, so every such split takes $\Omega(\log k)$ time. Hence we have $O(k)$ merge operations taking $\Omega(k \log k^2)$ time in total. After merging all the sets, we perform $O(k)$ split operations to get back to the initial $k$ sets. Repeating these merge and split phases $k$ times gives us $O(k^2) = O(n)$ operations running in $\Omega(k^2 \log k^2) = \Omega(n \log n^2)$ time. This instance shows that finger search properties on their own are not sufficient for achieving $O(n \log n)$ time. Splay trees only use $O(n \log n)$ time on this class of instances.

I suspect that weighting nodes by the distance to their neighbours in the set also allows you to achieve $O(n \log n)$ runtime.

Benchmark showing the $\log n^2$ behaviour for join-based merge with treaps. The benchmark counts recursive calls to split, join and merge (instead of using wall time) to reduce the noise from cache effects.
•  » » That's exactly what I was looking for. Thanks!
•  » » $\log n^2 = 2 \log n$. You mean $\log^2 n$, right?
•  » » » $\log^2 n = \log \log n$. You mean $(\log n)^2$, right?
•  » » » » Yes, I mean $(\log n)^2$, but $F^2 = F\; F$ with functions is an insult to mathematics. Exponentiation is repeated multiplication, not repeated function composition!