A 15x faster std::set

By sslotin, 2 years ago, in English

https://en.algorithmica.org/hpc/data-structures/b-tree/

As promised, I wrote a new tree data structure.


Comments (6)


Xellos

2 years ago

I see you're using SIMD, how does it perform without SIMD?


sslotin

2 years ago

I just tried to replace rank32 with a scalar binary search within a node (although my binary search is a bit unusual): it is worse but still ~5x faster than std::set (and still beats Abseil by ~1.5x).
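For readers unfamiliar with the in-node rank operation, here is a minimal scalar sketch of what such a replacement might look like. It is not the code from the article: the node capacity `B = 32`, the `INT32_MAX` padding, and the names `Node`, `make_node`, `rank_linear`, and `rank_binary` are all assumptions made for illustration, and the binary search shown is the common branchless variant rather than the "unusual" one mentioned above.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <initializer_list>

// Hypothetical node capacity; the thread does not state the real one.
constexpr std::size_t B = 32;

// Keys are sorted ascending; unused slots are padded with INT32_MAX (an assumption).
using Node = std::array<std::int32_t, B>;

// Helper for building a padded node from a short sorted list of keys.
inline Node make_node(std::initializer_list<std::int32_t> sorted_keys) {
    assert(sorted_keys.size() <= B);
    Node n;
    n.fill(INT32_MAX);
    std::copy(sorted_keys.begin(), sorted_keys.end(), n.begin());
    return n;
}

// Rank by counting: the number of keys strictly less than x.
// A SIMD version would compute the same count with vector compares and a popcount.
inline std::size_t rank_linear(const Node& keys, std::int32_t x) {
    std::size_t r = 0;
    for (std::size_t i = 0; i < B; ++i)
        r += keys[i] < x;  // the comparison yields 0 or 1, so no branch is needed
    return r;
}

// Rank by a branchless binary search over the same node; returns the same value.
inline std::size_t rank_binary(const Node& keys, std::int32_t x) {
    static_assert((B & (B - 1)) == 0, "B must be a power of two");
    const std::int32_t* base = keys.data();
    std::size_t len = B;
    while (len > 1) {
        std::size_t half = len / 2;
        // Skip the left half when its last key is still less than x.
        base += (base[half - 1] < x) * half;
        len -= half;
    }
    return static_cast<std::size_t>(base - keys.data()) + (*base < x);
}
```

Both functions return the index of the first key that is not less than `x`, which is the child to descend into or the position of the lower bound within a leaf.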


nikgaevoy

2 years ago


As far as I understand, your structure in fact works with ints only, so it would be fairer to compare it with structures designed specifically for ints rather than with std::set<int>. So, how does it perform in comparison to a vEB tree, an X/Y-fast trie, or maybe something else?


sslotin

2 years ago


I don't know for sure since I have not compared against them, but I think these particular structures won't be faster even for large arrays. They both perform roughly $$$\log_2 \log_2 2^{32} = 5$$$ iterations, the same as B-trees for all reasonable dataset sizes, but they are probably slower, at least in their basic implementations:

The vEB tree uses recursion and branching, which are expensive.
The x-fast trie uses hash tables, which are even more expensive.

On top of that, they do not store the data very compactly, which hurts cache performance.
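As a rough worked version of the iteration count, assuming base-2 logarithms and a hypothetical B-tree branching factor of 32 (the thread does not state the actual node size), the two depths compare as follows:

```latex
% vEB-style recursion: each level halves the number of key bits
32 \to 16 \to 8 \to 4 \to 2 \to 1
\quad\Rightarrow\quad
\log_2 \log_2 2^{32} = \log_2 32 = 5 \text{ levels}

% B-tree with an assumed branching factor of 32: height is roughly \lceil \log_{32} N \rceil
N = 10^6:\quad \lceil \log_{32} 10^6 \rceil = \lceil 3.99 \rceil = 4
\qquad
N = 2^{25} \approx 3.4 \cdot 10^7:\quad \log_{32} 2^{25} = 5
```

Under that assumption the B-tree needs no more than about five node visits until the dataset reaches tens of millions of keys, so the two approaches differ mainly in the cost of each step rather than in the number of steps.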

That said, I think that some universe-reducing approaches may be faster when properly optimized — at least for large enough dataset sizes. Implementing an associative array this way is one of the things I'm going to try next.


nikgaevoy

2 years ago

Well, that's interesting.

Yesterday, after I posted a comment, I decided to check out the fastest solutions for this problem, and it seems that all the best ones implement some kind of vEB tree.

I am not sure whether it is possible to compile your data structure on their server, but all solutions there are public anyway, so it should still be easy to compare your solution with these implementations, at least on random tests.


sslotin

2 years ago

I can't submit it to that problem because I haven't implemented deletions yet, but I took nor's fastest submission there and plugged it into my benchmark — and I was wrong: it outperforms mine when $$$N$$$ is larger than $$$10^6$$$ or so, but the B-tree is ~3x faster for $$$N = 10^5$$$ and anything under that.

(That's for the searches: insertions lose much sooner, but they aren't really optimized in the B-tree to begin with.)
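For anyone who wants to repeat this kind of comparison, below is a minimal sketch of a lower-bound search benchmark. It is not the benchmark used in the post: the names `bench`, `random_keys`, `bench_std_set`, and `bench_sorted_vector` are hypothetical, and `std::set` and a sorted `std::vector` only stand in for the structures being compared. The checksum keeps the compiler from discarding the queries and doubles as a check that two structures return the same answers.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <random>
#include <set>
#include <vector>

struct BenchResult {
    double ns_per_query;     // average wall-clock time per query
    std::uint64_t checksum;  // sum of all answers, used to compare structures
};

// Times `lower_bound(q)` over all queries. The callable must return the first
// key >= q, or INT32_MAX when there is none (a convention assumed here).
template <class LowerBound>
BenchResult bench(const std::vector<std::int32_t>& queries, LowerBound lower_bound) {
    assert(!queries.empty());
    std::uint64_t checksum = 0;
    auto start = std::chrono::steady_clock::now();
    for (std::int32_t q : queries)
        checksum += static_cast<std::uint32_t>(lower_bound(q));
    auto stop = std::chrono::steady_clock::now();
    double ns = std::chrono::duration<double, std::nano>(stop - start).count();
    return {ns / static_cast<double>(queries.size()), checksum};
}

// Uniformly random non-negative keys, strictly below INT32_MAX.
inline std::vector<std::int32_t> random_keys(std::size_t n, std::uint32_t seed) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<std::int32_t> dist(0, INT32_MAX - 1);
    std::vector<std::int32_t> v(n);
    for (auto& x : v) x = dist(rng);
    return v;
}

inline BenchResult bench_std_set(const std::vector<std::int32_t>& keys,
                                 const std::vector<std::int32_t>& queries) {
    std::set<std::int32_t> s(keys.begin(), keys.end());
    return bench(queries, [&](std::int32_t q) {
        auto it = s.lower_bound(q);
        return it == s.end() ? INT32_MAX : *it;
    });
}

inline BenchResult bench_sorted_vector(std::vector<std::int32_t> keys,
                                       const std::vector<std::int32_t>& queries) {
    std::sort(keys.begin(), keys.end());
    return bench(queries, [&](std::int32_t q) {
        auto it = std::lower_bound(keys.begin(), keys.end(), q);
        return it == keys.end() ? INT32_MAX : *it;
    });
}
```

Calling both functions with the same `keys` and `queries` for several values of $$$N$$$ (for example $$$10^5$$$ and $$$10^6$$$) and comparing `ns_per_query` gives the kind of size-dependent picture described above; matching checksums confirm that both structures answered identically.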
