saar2119's blog

By saar2119, history, 2 years ago, In English,

Today one of my myths was broken.
I used to believe that unordered_map is better in time-complexity than map in C++. But today while I was doing a problem(Molly's Chemicals), I got time-limit exceeded. After a lot of guess-work(because I thought my solution was correct), I tried using a map instead of an unordered_map and as a surprise I got it accepted. Then I realised that though the amortised time-complexity of using an unordered_map is O(1) while that of a map is O(log n), there are cases in which due to a lot of collisions unordered_map can have a big constant multiplier which will increase its actual complexity to greater than that of a map.
(Any corrections are welcomed...)

Can someone enlighten me more by clearing the doubt that where should unordered_map be used and where not?
What are the cases in which a map would perform better than an unordered_map?

Here are the tle and aced solution links if someone wants to verify:
TLE Solution(TLE after 2500 ms)
Accepted Solution(Accepted in 499 ms)

 
 
 
 
  • Vote: I like it
  • +9
  • Vote: I do not like it

»
2 years ago, # |
  Vote: I like it +23 Vote: I do not like it

Firstly, unordered_map with default settings has a pretty large constant. It can be decreased a lot by calling reserve and max_load_factor methods as explained at the end of this blog. I think without this unordered_map is slightly faster than map, with this it should be much faster — assuming random input.

Secondly, in this problem test were used that make the c++ unordered_map implementation really slow. This is possible because the hash function for int/long long in C++ is identity (and then taking this modulo the number of buckets), so one can force every key in the hashmap to have the same hash, and then a single lookup becomes O(n).

To fix the second you can paste your own hash implementation like this: 24950229 (unfortunately didn't do this during the contest, got TLE just like you). This submission also xor's the hashmap key with a randomly drawn number so someone couldn't reverse-engineer your hash somehow and hack it.

  • »
    »
    2 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Thanks, the link to the blog was useful :D

  • »
    »
    2 years ago, # ^ |
    Rev. 3   Vote: I like it 0 Vote: I do not like it

    Actually identity only in GCC. I don't know how std::hash works in clang, but MSVC hashes byte by byte and safe from here.

  • »
    »
    5 months ago, # ^ |
      Vote: I like it +10 Vote: I do not like it

    Should we use these two lines always? Or need to change depending on any constraint or input size?

    reserve(4096); // always 4096 ?? 
    max_load_factor(0.25); always 0.25 ??
    
    • »
      »
      »
      5 months ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      The first line means that you're reserving space upfront for 4096 elements — this would be a waste of time if your hashmap ends up being smaller than that. When my hashmap was expected to be much larger, I was still using 4096, although I never really thought about it deeply.

      The max_load_factor of f roughly says that the maps could take times more memory, but should be faster. Hence, this should be only used if memory is not a problem, and 0.25 is a sensible default value. I sometimes use lower values like 0.1 when I'm really optimizing for time, but it seems decreasing f further gives diminishing returns.

»
2 years ago, # |
  Vote: I like it 0 Vote: I do not like it

VS 2010 unordered_map very fast 24960324

»
2 years ago, # |
  Vote: I like it +9 Vote: I do not like it

unordered_map's amortized time complexity bound is not specified. Only average time complexity is said to be constant for search, insertion and removal. Source.

Note: if amortized bound would also be constant, the solution utilizing unordered_map would have passed.

  • »
    »
    2 years ago, # ^ |
    Rev. 2   Vote: I like it -8 Vote: I do not like it

    Asymptotic complexity can still hide huge constants, so your note is wrong.

    • »
      »
      »
      2 years ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      Based on practical evidence, the constant factor built into unordered_map is not that big. The problem is not in the constant factor, but in the fact that worst-case time complexity for a simple implementation of hashtable is O(N) for basic operations. There were times when programmers knew how hashtables are implemented, because they were implementing them on their own. At that times, it was well-known that hashtables may struggle from clustering causing them to perform as poor as linked lists.

    • »
      »
      »
      2 years ago, # ^ |
        Vote: I like it +4 Vote: I do not like it

      That is frequently implicit when discussing asymptotic complexity. Succinct, not wrong :)

      The point about amortized complexity is much more interesting, in my opinion: If you were to do n operations on a splay tree, for example, a single operation taking O(n) time wouldn't be worrisome because the full sequence of operations would still take O(n log n) time; you already "paid" for the costly operation before it happened.

      Since I am already writing this post, I guess I should say that the expression "big constant multiplier" in the original post is slightly wrong: There could be a large asymptotic slowdown (from O(1) to O(n)) in the case of collisions, it's not just a constant multiplier.

»
2 years ago, # |
Rev. 17   Vote: I like it 0 Vote: I do not like it

the same thing has happened with my friend whose handle is hmrockstar

»
5 months ago, # |
Rev. 2   Vote: I like it 0 Vote: I do not like it

Today I faced same problem when solving 1102E - Monotonic Renumeration. Then map vs unordered_map helped me to clear my confusion. Hope it will help you.