sslotin's blog

By sslotin, 2 years ago, In English

https://en.algorithmica.org/hpc/data-structures/segment-trees/

I wrote a SIMD-friendly segment tree ("wide segment tree") that is up to 10x faster than the Fenwick tree and the bottom-up segment tree:

The article also explains in detail how all the other popular segment tree implementations work and how to optimize them (fun fact: the Fenwick tree can be made 3x faster on large arrays if you insert "holes" in the array layout and make it ~0.1% larger). The article is long but hopefully accessible to beginners.

I only focused on the prefix sum case (partially to make the comparison with the Fenwick tree fair). While I have some ideas on generalizing it to more complex operations (I've added a separate section on implementing other ones), I highly encourage the community to try to efficiently implement mass assignment, RMQ, lazy propagation, persistence, and other common segment tree operations using wide segment trees.

  • +260

»
2 years ago, # |
  +9

In terms of generalizations, atcoder's interface is very flexible and popular: https://atcoder.github.io/ac-library/master/document_en/segtree.html It would be great if you could make your wide segment tree a drop-in replacement targeting it.

In particular, it doesn't assume the operation is commutative, so it has to accumulate the front and back parts in the right order: https://github.com/atcoder/ac-library/blob/master/atcoder/segtree.hpp#L45 I think this ruins some of the optimizations you made, but there should still be benefits from the higher branching factor?

  • »
    »
    2 years ago, # ^ |
      +8

    I don't think you can make an entirely drop-in replacement for atcoder's segment tree, as it requires users to write their own (scalar) reducers, while most wide segment tree operations need to be implemented with SIMD intrinsics.

    That said, it may be possible to provide a partial specialization on std::plus, std::bit_xor, std::multiplies, std::min, and other common binary operations, but library users would have to actually pass these instead of writing their own code.

    P.S. I've added a section with ideas on how to implement RMQ, general reductions, and lazy propagation.

    • »
      »
      »
      2 years ago, # ^ |
        +8

      I agree. I just wanted to point out that such an approach might be simplified by doing what eve does (not the platform-agnostic parts, but the abstraction parts). For instance, templating on the block size would be nice, since different block sizes are needed for different integer types.

»
2 years ago, # |
  +11

Just wanted to say: your book is amazing. I haven't been able to read much, but the few chapters I did read were really orz.

»
2 years ago, # |
  +7

orz

»
2 years ago, # |
  +13

I never thought that the Fenwick tree could be slower than a bottom-up segment tree. Thank you for your work!

»
2 years ago, # |
  0

May I ask whether you are a researcher, and if so, whether you focus on data structures and computational complexity or on computer science in general?