Hello!

Today I will present a data structure that I have worked on for my Bachelor thesis a while ago, called (by me) a PreTree (link to thesis). This is a type of "compressed" trie that is arguably easier to implement than a regular compressed trie, and it achieves merging in amortized $$$O(N L)$$$ complexity for $$$N$$$ keys of equal length $$$L$$$. I will present an implementation as a segment tree, though, as it is easier to understand.

**Prerequisite: know segment trees, and "sparse" segment trees (a.k.a. bitwise tries).**

### Main idea

We will keep a regular segment tree, with the twist that *the root contains the minimum key*. Basically, a PreTree on keys in the universe $$$[b, e)$$$ is a binary tree $$$T$$$ where either $$$T$$$ is empty, or the following conditions hold:

- $$$T$$$ holds the lowest key present in the range $$$[b, e)$$$
- $$$T$$$'s left son is a PreTree on values $$$[b, \frac{b + e}{2})$$$
- $$$T$$$'s right son is a PreTree on values $$$[\frac{b + e}{2}, e)$$$

Because $$$T$$$ is a one-to-one mapping between tree vertices and keys, then the memory complexity of $$$T$$$ is obviously $$$O(N)$$$.

The reason for the name PreTree comes from it being a "pretty tree" (of course, beauty is subjective), and also because their pre-order traversals yield keys in sorted order.

### Implementation

In order to make the implementation easier, we will define a single big operation called "merging" two PreTrees. Suppose we have two trees $$$T1$$$ and $$$T2$$$. Merging the two trees would be done as the following recursive procedure:

```
function Merge(T1, T2, b, e):
if T1 is empty then return T2
if T2 is empty then return T1
m = (b + e) / 2
if T1.k > T2.k then swap T1 and T2
T1.l = Merge(T1.l, T2.l, b, m)
T1.r = Merge(T1.r, T2.r, m, e)
T2.l = T2.r = empty
if T1.k != T2.k:
## We have to insert T2 into the corresponding child
if T2.k >= m:
T1.r = Merge(T1.r, T2, m, e)
else:
T1.l = Merge(T1.l, T2, b, m)
return T1
```

#### Insertion

In order to insert a key into the tree, just merge it with a singleton tree.

#### Erase

This is a bit more involved. Notice, however, that any PreTree $$$T$$$ is a min-heap on the keys. Find the key and then pop it as you would do in a heap.

#### Predecessors

In order to find the greatest key smaller than a given $$$k$$$, one can do the following:

```
function Smaller(T, k):
if T is empty or T.k >= k:
return NOT_FOUND
ret = Smaller(T.r, k)
if ret is NOT_FOUND:
ret = Smaller(T.l, k)
if ret is NOT_FOUND:
ret = T
return ret
```

Finding the smallest key greater than or equal to a given $$$k$$$ is possible, albeit a little more complicated. The simplest way would be to also keep the maximum on the subtree. I will omit the actual implementation.

### Performance

The `Merge`

function seems not too efficient at first glance. However, the amortized complexity of it is bounded by $$$O(N log(U))$$$, where $$$U$$$ is the size of the universe, and $$$N$$$ is the total number of nodes. In order to see why is that, we will have to look at the depths of each node in the tree.

Let's suppose $$$T1$$$ and $$$T2$$$ have distinct keys. Each non-trivial call to the `Merge`

function increases the depth of at least one vertex. The reason that is true is because, after each call, the root node with the higher key will get "demoted" one level downwards, therefore increasing its depth by one. As depths of a node can never exceed $$$O(log(U))$$$, the total complexity cannot exceed $$$O(N log(U))$$$.

*Follow-up 1: Prove that this still holds for non-distinct keys!*

*Follow-up 2: What about erase operations? How do they affect the amortized analysis?*

*Follow-up 3: What about (Treap) splitting operations? Can you split T into two trees, depending on some (monotonic) predicate? How does that affect complexity?*

### Applications

One of the most standard applications of this is to improve the $$$O(N log^2(N))$$$ bound on various problems involving the small-to-large trick. It has the additional improvement of having $$$O(N)$$$ memory without making the implementation a lot harder.

Example problems:

- 343D. Water Tree | AC Submission (here we can implicitly encode the keys in the nodes' value, yielding a very elegant and short implementation)
- B. Segment Tree for the Minimum | AC Submission (this is an example where you could use this representation as a regular segment tree)

Could you tell me about some other problems involving some form of sparse segment trees and other tasks where this could come handy, to add them to the blog?