unordered_map vs map in C++ — Complexity Analysis

#	User	Rating
1	tourist	3880
2	jiangly	3669
3	ecnerwala	3654
4	Benq	3627
5	orzdevinwang	3612
6	Geothermal	3569
6	cnnfls_csy	3569
8	jqdai0815	3532
9	Radewoosh	3522
10	gyh20	3447

#	User	Contrib.
1	awoo	161
1	maomao90	161
3	adamant	156
4	maroonrk	153
5	-is-this-fft-	148
5	atcoder_official	148
5	SecondThread	148
8	Petr	147
9	nor	144
10	TheScrasse	142

Today one of my myths was broken.
I used to believe that unordered_map is better in time-complexity than map in C++. But today while I was doing a problem(Molly's Chemicals), I got time-limit exceeded. After a lot of guess-work(because I thought my solution was correct), I tried using a map instead of an unordered_map and as a surprise I got it accepted. Then I realised that though the amortised time-complexity of using an unordered_map is O(1) while that of a map is O(log n), there are cases in which due to a lot of collisions unordered_map can have a big constant multiplier which will increase its actual complexity to greater than that of a map.
(Any corrections are welcomed...)

Can someone enlighten me more by clearing the doubt that where should unordered_map be used and where not?
What are the cases in which a map would perform better than an unordered_map?

Here are the tle and aced solution links if someone wants to verify:
TLE Solution(TLE after 2500 ms)
Accepted Solution(Accepted in 499 ms)

Comments (17)

Show archived | Write comment?

krismaz

7 years ago, # |

+23

Firstly, unordered_map with default settings has a pretty large constant. It can be decreased a lot by calling reserve and max_load_factor methods as explained at the end of this blog. I think without this unordered_map is slightly faster than map, with this it should be much faster — assuming random input.

Secondly, in this problem test were used that make the c++ unordered_map implementation really slow. This is possible because the hash function for int/long long in C++ is identity (and then taking this modulo the number of buckets), so one can force every key in the hashmap to have the same hash, and then a single lookup becomes O(n).

To fix the second you can paste your own hash implementation like this: 24950229 (unfortunately didn't do this during the contest, got TLE just like you). This submission also xor's the hashmap key with a randomly drawn number so someone couldn't reverse-engineer your hash somehow and hack it.

→ Reply

saar2119

7 years ago, # ^ |

Thanks, the link to the blog was useful :D

unused

← Rev. 3 →

Actually identity only in GCC. I don't know how std::hash works in clang, but MSVC hashes byte by byte and safe from here.

dreameeer

6 years ago, # ^ |

+10

Should we use these two lines always? Or need to change depending on any constraint or input size?

reserve(4096); // always 4096 ?? 
max_load_factor(0.25); always 0.25 ??

The first line means that you're reserving space upfront for 4096 elements — this would be a waste of time if your hashmap ends up being smaller than that. When my hashmap was expected to be much larger, I was still using 4096, although I never really thought about it deeply.

The max_load_factor of f roughly says that the maps could take $\text{[math]}$ times more memory, but should be faster. Hence, this should be only used if memory is not a problem, and 0.25 is a sensible default value. I sometimes use lower values like 0.1 when I'm really optimizing for time, but it seems decreasing f further gives diminishing returns.

xsc

VS 2010 unordered_map very fast 24960324

gorbunov

unordered_map's amortized time complexity bound is not specified. Only average time complexity is said to be constant for search, insertion and removal. Source.

Note: if amortized bound would also be constant, the solution utilizing unordered_map would have passed.

yassin_

← Rev. 2 →

-8

Asymptotic complexity can still hide huge constants, so your note is wrong.

Based on practical evidence, the constant factor built into unordered_map is not that big. The problem is not in the constant factor, but in the fact that worst-case time complexity for a simple implementation of hashtable is O(N) for basic operations. There were times when programmers knew how hashtables are implemented, because they were implementing them on their own. At that times, it was well-known that hashtables may struggle from clustering causing them to perform as poor as linked lists.

collares

That is frequently implicit when discussing asymptotic complexity. Succinct, not wrong :)

The point about amortized complexity is much more interesting, in my opinion: If you were to do n operations on a splay tree, for example, a single operation taking O(n) time wouldn't be worrisome because the full sequence of operations would still take O(n log n) time; you already "paid" for the costly operation before it happened.

Since I am already writing this post, I guess I should say that the expression "big constant multiplier" in the original post is slightly wrong: There could be a large asymptotic slowdown (from O(1) to O(n)) in the case of collisions, it's not just a constant multiplier.