I_love_natalia's blog

By I_love_natalia, 5 years ago, translation, In English,

After a small discussion, we decided to publish the generator that causes #TLE of HashSet and HashMap in Java.

It is here — http://pastebin.com/qpyNcD3R

The main idea: let's force the hash collections to put all the items in one bucket.

A feature of HashSet and HashMap is a usage of a special linear hash transformation, and then use a reminder hash % bucketsCount as bucket number.

This is some quotes from original Java code:

    static int hash(int h) {
        // This function ensures that hashCodes that differ only by
        // constant multiples at each bit position have a bounded
        // number of collisions (approximately 8 at default load factor).
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    static int indexFor(int h, int length) { 
        return h & (length-1); /*length is a power of 2 --- me*/
    }

The only difficulty is to generate an inverse transform — it is here:

    int hashinv(int h) {
        h ^= (h >>> 4) ^ (h >>> 7) ^ (h >>> 8) ^ (h >>> 14) ^ (h >>> 15)
                ^ (h >>> 18) ^ (h >>> 19) ^ (h >>> 20) ^ (h >>> 21)
                ^ (h >>> 23) ^ (h >>> 26) ^ (h >>> 28);
        return h;
    }

And then, just type the numbers in a such way:

    final int size = 100000;
    int[] s = new int[size];
    for (int i = 0, val = 0; i < size; i++) {
        s[i] = Integer.MAX_VALUE;
        while (s[i] > 1000000000)
            s[i] = hashinv(bitreverse(val++));
    }

Here, bitreverse inverses last 31 bits in number.

This works for Java 6 и Java 7 with almost all standard input transformation (sorting, random shuffling, etc).

Good luck in hacks!

 
 
 
 
  • Vote: I like it  
  • +79
  • Vote: I do not like it  

»
5 years ago, # |
  Vote: I like it 0 Vote: I do not like it

Well, then the question arises.. what should I use then?!

  • »
    »
    5 years ago, # ^ |
      Vote: I like it +9 Vote: I do not like it

    Hmm... TreeMap and compareTo instead of HashMap and hashCode?

»
4 years ago, # |
  Vote: I like it 0 Vote: I do not like it

It seems to take a lot of time to find enough numbers less than 100000 while it barely take time to find enough numbers less than 1000000000.

»
4 years ago, # |
  Vote: I like it 0 Vote: I do not like it

What about if I change the load factor?
As I know bucketsCount would be different and the code for hacking should be different (I guess).

  • »
    »
    4 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    No, if a == b, then (a == b) mod M for every M. Look, first the hash is calculated as shown above and the operation % is used by module bucketsCount. The first phase doesn't work.