Easy and (Semi)Efficient Dynamic Segment Trees (with Policy Hash Tables)

One of my favorite implementations of segment trees has always been "Efficient and easy segment trees" by Al.Cash. I used to dread segtree problems, but after reading that blog post and adapting a super simple implementation, I've gotten a lot better with them. However, there are some types of segtree that you can't implement in that fashion, namely dynamic segtrees and persistent segtrees. See here for criticism. With the advent of policy hash tables, however, one can now implement dynamic segtrees in Al.Cash's style with somewhat comparable performance to a custom dynamic segtree.

Standard segtree

This is what a standard segtree looks like. You can set a single element and query the sum over a half-open range [l, r). It's nice and simple, and I think it's a great implementation.

int N;
int seg[2 * MAXN];

void modify(int p, int val) {
    for (seg[p += N] = val; p > 1; p >>= 1)  // stop at the root; slot 0 is unused
        seg[p >> 1] = seg[p] + seg[p ^ 1];
}

int query(int l, int r) {
    int res = 0;
    for (l += N, r += N; l < r; l >>= 1, r >>= 1) {
        if (l & 1)
            res += seg[l++];
        if (r & 1)
            res += seg[--r];
    }
    return res;
}
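To make the half-open `[l, r)` convention concrete, here is a minimal sketch of the same bottom-up layout wrapped in a struct. The name `SumSegTree`, the run-time size, and the `long long` values are assumptions made for illustration; the post itself uses global arrays sized by `MAXN`.

```cpp
#include <vector>

// Hypothetical wrapper around the bottom-up layout shown above.
struct SumSegTree {
    int n;                       // number of positions, indexed 0..n-1
    std::vector<long long> seg;  // leaves live at indices [n, 2n)

    explicit SumSegTree(int n_) : n(n_), seg(2 * n_, 0) {}

    // Point assignment: a[p] = val, then refresh every ancestor up to the root.
    void modify(int p, long long val) {
        for (seg[p += n] = val; p > 1; p >>= 1)
            seg[p >> 1] = seg[p] + seg[p ^ 1];
    }

    // Sum over the half-open range [l, r).
    long long query(int l, int r) const {
        long long res = 0;
        for (l += n, r += n; l < r; l >>= 1, r >>= 1) {
            if (l & 1) res += seg[l++];  // l is a right child: take it, step right
            if (r & 1) res += seg[--r];  // r is exclusive: step left, take that node
        }
        return res;
    }
};
```

With `SumSegTree st(5)` and positions 0, 2, 4 set to 3, 4, 10, `st.query(0, 5)` covers the whole array, while `st.query(1, 3)` covers only positions 1 and 2.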

Dynamic Segtree

However, say your underlying array has 1e9 possible locations but contains only 1e5 elements. For example, take a look at this post. Obviously, you can't store all 2e9 nodes in your segtree, so what should you do? Here's one solution: replace the array with a hash table. However, as adamant mentions, unordered_map has too much overhead. We'll be benchmarking against the dynamic segtree provided here. I'll also be using a custom hash function. So to be clear, the implementation now looks like:

Code

int N;
unordered_map<int, int, chash> seg;

void modify(int p, int val) {
    for (seg[p += N] = val; p > 1; p >>= 1)  // stop at the root; key 0 is unused
        seg[p >> 1] = seg[p] + seg[p ^ 1];
}

int query(int l, int r) {
    int res = 0;
    for (l += N, r += N; l < r; l >>= 1, r >>= 1) {
        if (l & 1)
            res += seg[l++];
        if (r & 1)
            res += seg[--r];
    }
    return res;
}
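The listing above relies on a `chash` functor that is not shown in the post. As a rough sketch, assuming a splitmix64-style mixer with a time-based seed (this choice is an assumption, not necessarily the hash the author benchmarked), it could look like this:

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Hypothetical stand-in for the post's `chash`; the real one is not shown.
struct chash {
    // splitmix64 finalizer: spreads nearby integers across all 64 bits.
    static std::uint64_t mix(std::uint64_t x) {
        x += 0x9e3779b97f4a7c15ULL;
        x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
        x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
        return x ^ (x >> 31);
    }

    std::size_t operator()(std::uint64_t x) const {
        // Seed chosen once per run so the key-to-bucket mapping is not fixed.
        static const std::uint64_t SEED =
            std::chrono::steady_clock::now().time_since_epoch().count();
        return static_cast<std::size_t>(mix(x + SEED));
    }
};

// The functor plugs in as the third template argument, as in the listing above.
using hashed_seg = std::unordered_map<long long, long long, chash>;
```

Segment tree keys are dense near the leaves (`p` and `p ^ 1` differ by one bit), so a mixer that scrambles the low bits tends to matter more here than it would for already-random keys.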

Benchmarking it with 1e5 random insertions and 1e5 random queries gives:

pointer: 0.171485
unordered_map: 2.0646

Wow. The unordered_map is about 12x slower. That's not really feasible for a lot of contests. What if we replace it with a policy hash table, though?

Code

int N;
gp_hash_table<int, int, chash> seg;

void modify(int p, int val) {
    for (seg[p += N] = val; p > 1; p >>= 1)  // stop at the root; key 0 is unused
        seg[p >> 1] = seg[p] + seg[p ^ 1];
}

int query(int l, int r) {
    int res = 0;
    for (l += N, r += N; l < r; l >>= 1, r >>= 1) {
        if (l & 1)
            res += seg[l++];
        if (r & 1)
            res += seg[--r];
    }
    return res;
}

pointer: 0.202186
policy hash table: 0.384312

That's only about a 2x slowdown, which is already very feasible. However, one might notice that since maps in C++ create an element whenever you access a key that doesn't exist, we're creating a lot of useless elements. Thus, we can simply wrap each read in a check that the key is in the table before we try to access it.
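To see why reads through `operator[]` are wasteful, here is a small sketch that counts how many entries a `gp_hash_table` holds after a series of reads on an empty table. The helper names are hypothetical, and the table uses the default hash for brevity; it needs GCC, since the policy-based containers ship with libstdc++.

```cpp
#include <ext/pb_ds/assoc_container.hpp>  // GCC-only: policy-based data structures
#include <cstddef>

using table = __gnu_pbds::gp_hash_table<long long, long long>;

// Reads through operator[]: every missing key gets inserted with value 0.
std::size_t entries_after_bracket_reads(int reads) {
    table t;
    for (long long k = 0; k < reads; ++k) (void)t[k];
    return t.size();
}

// Reads through find(): missing keys are reported but never inserted.
std::size_t entries_after_find_reads(int reads) {
    table t;
    for (long long k = 0; k < reads; ++k) (void)(t.find(k) == t.end());
    return t.size();
}
```

The first helper leaves one entry per key it touched, while the second leaves the table empty, which is the difference the lookup wrapper exploits.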

EDIT: Updated with dalex's optimization.

gp_hash_table<ll, ll, chash> seg;

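// Read-only lookup: missing nodes count as 0 and are never inserted.
ll get(ll x) {
    auto it = seg.find(x);  // one lookup instead of find() followed by operator[]
    return it == seg.end() ? 0 : it->second;
}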
void modify(ll p, ll val) {
    for (seg[p += N] = val; p > 1; p >>= 1) {  // stop at the root; key 0 is unused
        seg[p >> 1] = get(p) + get(p ^ 1);
    }
}

ll query(ll l, ll r) {
    ll res = 0;
    for (l += N, r += N; l < r; l >>= 1, r >>= 1) {
        if (l & 1)
            res += get(l++);
        if (r & 1)
            res += get(--r);
    }
    return res;
}
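For completeness, here is a self-contained sketch of the version above with the headers, the `ll` alias, a value for `N`, and a stand-in hash filled in. The multiplicative `chash` and the choice `N = 1e9` are assumptions for illustration, and the update loop stops at the root (`p > 1`) so that key 0 is never written.

```cpp
#include <ext/pb_ds/assoc_container.hpp>  // GCC-only: policy-based data structures
#include <cstddef>
#include <cstdint>

using ll = long long;

// Hypothetical stand-in for the post's `chash` (the original is not shown).
struct chash {
    std::size_t operator()(ll x) const {
        std::uint64_t z = static_cast<std::uint64_t>(x) * 0x9e3779b97f4a7c15ULL;
        return static_cast<std::size_t>(z ^ (z >> 32));  // fold high bits into low bits
    }
};

const ll N = 1000000000;  // addressable positions are [0, N); assumed for this sketch
__gnu_pbds::gp_hash_table<ll, ll, chash> seg;

// Read-only lookup: nodes that were never written count as 0.
ll get(ll x) {
    auto it = seg.find(x);
    return it == seg.end() ? 0 : it->second;
}

// Point assignment at position p; only the nodes on the leaf-to-root path are stored.
void modify(ll p, ll val) {
    for (seg[p += N] = val; p > 1; p >>= 1)
        seg[p >> 1] = get(p) + get(p ^ 1);
}

// Sum over the half-open range [l, r).
ll query(ll l, ll r) {
    ll res = 0;
    for (l += N, r += N; l < r; l >>= 1, r >>= 1) {
        if (l & 1) res += get(l++);
        if (r & 1) res += get(--r);
    }
    return res;
}
```

Each `modify` touches at most 31 keys here, because leaf indices stay below 2e9 < 2^31, so the table holds only a few dozen entries per updated position even though the index space has 1e9 slots.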

Results (averaged over twenty runs):

2e5 insertions and 2e5 queries

pointer: 0.44085
policy hash table: 0.57878

1e5 insertions and 1e5 queries

pointer: 0.19855
policy hash table: 0.29467

1e4 insertions and 1e4 queries

pointer: 0.014
policy hash table: 0.027

So while we're nearly twice as slow with 1e4 elements and 1e4 queries, we're actually only 30% slower with 2e5 insertions and 2e5 queries.

One more thing. While I'm giving numbers like "30% slower", that's a little bit misleading. If we break down the numbers between insertion/querying, we see this:

2e5 insertions and 2e5 queries

Queries:

pointer: 0.41625
policy hash table: 0.15627

Insertions:

pointer: 0.1367
policy hash table: 0.42619

1e4 insertions and 1e4 queries

Queries:

pointer: 0.094
policy hash table: 0.007

Insertions:

pointer: 0.0045
policy hash table: 0.0191

So as we see from this more granular breakdown, the policy hash table implementation is actually roughly 3x faster at querying than the custom implementation, while the custom implementation is roughly 3x faster at inserting elements.

TL;DR: Using policy hash tables is an extremely easy and fairly efficient method of implementing dynamic segtrees.

Rev.	By	When	Δ	Comment
en11	Chilli	2018-12-10 01:58:10	0	(published)
en10	Chilli	2018-12-10 01:57:34	16	Tiny change: ' queries\n~~~~~\np' -> ' queries\n\n~~~~~\np' (saved to drafts)
en9	Chilli	2018-12-10 01:38:15	1038
en8	Chilli	2018-12-09 10:24:32	0	(published)
en7	Chilli	2018-12-09 10:23:47	18	Tiny change: 'og/entry/19080), unorder' -> 'og/entry/18051?#comment-288074), unorder' (saved to drafts)
en6	Chilli	2018-12-09 10:21:18	212	(published)
en5	Chilli	2018-12-09 10:19:30	6
en4	Chilli	2018-07-27 06:48:45	9	Tiny change: 'tyle with comparabl' -> 'tyle with somewhat comparabl'
en3	Chilli	2018-07-27 06:48:10	7	Tiny change: '="Code">\n\n~~~~~\nint N;\n' -> '="Code">\n~~~~~\n\nint N;\n'
en2	Chilli	2018-07-27 06:46:51	91	Tiny change: 'er:AlCash]'s style w' -> 'er:AlCash] 's style w'
en1	Chilli	2018-07-26 04:18:23	4617	Initial revision (saved to drafts)
