Hi!

I thought that `unordered_set` is faster than `set`, because `set`'s complexity is logarithmic in its size while `unordered_set`'s complexity is constant on average (source: cplusplus.com).

But today I saw this:

TLE (>2000 ms) with `unordered_set`: 15494816

Accepted (155 ms) with `set`: 15494846

Also, my other solution (which I submitted during the contest) with `unordered_map` got hacked :'(

Now my question is: is `unordered_set` faster than `set`?

**UPD:** I got accepted (15497282). The `unordered_set` time was better than `set`'s (15494846): 93 < 155. All it took was adding this line: `s.max_load_factor(0.25); s.reserve(500);`.

**UPD2:** It seems that it is better to use a power of two in `reserve` (i.e. `s.reserve(512)`). (Thanks to keyvankhademi.)

The default hash function for integers does not perform very well. I suggest looking into alternative hash functions.

In my experience, both unordered set and map have worked well for strings.

No fixed hash function performs well against adversarial input; the hash needs to be chosen randomly from a large family.

There's a difference between functions that work badly when someone wants them to and functions that just work badly.

`unordered_set` is not always faster than `set`; it's like comparing the complexity of quicksort and merge sort.

To be fair, quicksort is `O(n lg n)` on average and `O(n^2)` in the worst case, but merge sort is `O(n lg n)` in both the average and the worst case. Similarly, `unordered_set` is `O(1)` on average and `O(n)` in the worst case, but `set` is `O(lg n)` in both the average and the worst case. Quicksort and merge sort have the same average complexity, so it is better to use merge sort. But because `unordered_set`'s average complexity is better than `set`'s, it seems that `unordered_set` must be better.

Adding `pos.reserve(500);` makes the running time better than the tree set's for the current tests: http://codeforces.com/contest/620/submission/15496964

I have no idea, though, why setting a much higher value than 500 increases the running time again.

Yeah, sorry for the bad interpretation. But my point still stands: `unordered_set`'s complexity is unstable, so whether `unordered_set` is faster than `set` depends on the test case.

Umm, no, it's better to use quicksort. Proof — this passage from Skiena's Algorithm Design Manual: basically, quicksort has better cache locality. Most C++ standard library implementations use a combination of quicksort and insertion sort as their implementation of `std::sort`.

Well, that depends on the library implementation (and specially-crafted data will cause exactly that — see the UPD below). Theoretically, and most of the time, the hash-based containers are faster than the tree-based ones (which is why hashing is so popular in real-world code).

And as it turns out, for the Microsoft .NET Framework implementations, the answer is yes.

15474905 — the HashSet one, 218 ms.

15496423 — the SortedSet one, 311 ms.

Their implementations in the Microsoft .NET Framework are already open-sourced: corefx/src/System.Collections/src/System/Collections/Generic/HashSet.cs and corefx/src/System.Collections/src/System/Collections/Generic/SortedSet.cs

As for the GNU C++ implementation (the hash table lives in <bits/hashtable.h>, and the default hash function for int is just `static_cast<size_t>(__val)`), I'm surprised to find a try-catch block in it (and it's not so easy to understand how it works).

And thank you for the heads up. I'll think twice before using unordered_set/unordered_map on GNU C++.

UPD:

The problem reminds me of a famous security issue: Hash Collision DoS (a hot topic during 2011~2012), which, according to some reports, affected (and maybe still affects) a lot of popular languages, including PHP <= 5.3.8, Perl, Java, ASP.NET, the V8 JavaScript engine, Ruby <= 1.8.7-p356, etc. A specially-crafted set of keys can trigger hash collisions, degrading the performance of a HashMap or Hashtable from the expected/average O(1) per operation to the worst-case O(n).

Some information here: https://bugzilla.redhat.com/show_bug.cgi?id=750533

(Some discussion shows that some languages just limit the number of request parameters to "hide" the problem inside the hash table implementation (source: http://stackoverflow.com/questions/8672985/how-was-the-hash-collision-issue-in-asp-net-fixed-ms11-100). You may want to try to hack my C# solution.)

Sometimes it is.

I once got a problem accepted by changing `set` to `unordered_set` in C++.

However, it depends on the hashing. I always go with the ordered set, and if it TLEs and needs optimization, I try the unordered one.

Looks like someone went through the trouble of generating an extremely bad test case for hacking 620C - Pearls in a Row. Now the hack is included in the system tests as test 42. It causes a huge number of collisions in that specific G++ implementation of unordered_map.

`unordered_map` fails with TL: 15496737

`map` gets AC (only one line of code changed!): 15496747

`unordered_map` with a custom hash function: AC again! 15497036. A bit slower than `map`, though.

To the OP: good idea to change the max load factor! It looks like a simple fix compared to writing a custom hash function.

Tweaking the load factor and pre-creating many buckets (with `reserve`) solves the problem because you only get bad performance when there are many items per bucket. The worst case is when all items end up in a single bucket: then lookup in the hash map degrades to a list traversal.

Yes, I'm afraid that was me.

To sum it up, `unordered_set` is generally much faster than `set`. `set` guarantees O(log n) per operation, while `unordered_set` is O(1) on average. However, while `set` ensures O(log n), `unordered_set` can take O(n) per insertion. This is very rare in practice unless the dataset is deliberately crafted to cause a large number of collisions, degrading `unordered_set` to O(n).

Thanks for bringing up this issue, Arpa! I've tested your code with `s.max_load_factor(0.75)` and `s.reserve(16)` and it got accepted in 109 ms. These are also the values used in [Java's HashSet](https://docs.oracle.com/javase/7/docs/api/java/util/HashSet.html#HashSet()).