Hi everyone! I wan't to share some interesting fact about how we're going to challenge almost every solution, that uses polynomial hashes modulo 2^64. We'll hack any solution, regardless of it's base (!), the only thing we need, that it's using int64 with overflows — like many coders write hashing.
Is it interesting? Welcome reading after the cut. Firstly, for the most impatient of you. Here's the source of the generator:
const int Q = 11; const int N = 1 << Q; char S[N]; for (int i = 0; i < N; i++) S[i] = 'A' + __builtin_popcount(i) % 2; // this function in g++ returns // number of ones in binary representation of number i
Let's try solutions of two CFBR #7 winners on this test: Petr Mitrichev's and Vlad Epifanov's. vepifanov solution doesn't contain hashing and so, it works correct: the answer is 6. But Petr's solution returns 8. After a little bit thinking it becomes clear, that the answer on such test for this problem is always (N + 1) / 2 — Vlad is 100% correct.
Moreover, if we take Q = 20, then Vlad's solution returns correct answer 11, but Petr's one returns 2055, what is obliviously wrong :-)
I'll prove, that starting from Q = 11, there are lots of collisions in this string.
Let's look on that string. How was it formed? It's beggining like that:
It's true, that it can be formed by recurrent rule
S -> S + (not S), starting from с
S = "A", where (
not S) means string after changing
B and vice versa. Let's denote S for fixed Q like SQ.
Let's remember, what polynomial has is. It's function of string S of length l, equal to , where P is some odd number.
I claim that hash(S[0... (2k - 1)]) for some sufficiently small k k will be equal to hash(S[(2k)... (2k + 1 - 1)]).
That means, that for Q = 10, hash(SQ) = hash(not SQ). That's very cool, because it SQ and not SQ will appear in bigger order strings many-many times because of reccurent condition.
Let's see what does condition hash(SQ) = hash(not SQ) mean.First, we can take zeros and ones in coefficients instead of
ord('B') — we can just take off the both parts of the equation.
What is hash(not SQ) — hash(SQ)? It's simple to understand, that it's
--- it's just sign-alternating sum of P powers, where signs change by similar
Let's consequentially take factors in this expression off the brackets:
(maybe, it's multiplied on (-1), but there is no matter for it.)
Now the main thing — this value modulo 2^64 will become zero very-very fast!
Let's understand, what's the maximal power of 2 this value is divisable. Let's look on each of Q - 1 factors. (i + 1)-st factor P2i + 1 - 1 = (P2i - 1)(P2i + 1) is divisible by i-th and by some even number P2i + 1. That means that if i-th bracket is divisible by 2r, then (i + 1)-st is divisible at least on 2r + 1.
So that means that (P1 - 1)(P2 - 1)(P4 - 1)...(P2Q - 1 - 1) is divisible at least by 2·22·23·... = 2Q(Q — 1) / 2. That means, that it's enough to take Q >= 12. Congratulations, this is anti-hash test!
So because of that we have such small test length in comparsion with the modulo size 2^64. So antitest size is something of order , if we use x-bit data type.
Main idea: don't use overflowing when counting hashes unless you are confident that there is no test, consisting of this ABBABAABBAABABBA... string.
How did I get that? First using of that test was in 2003 on St. Petersburg school programming contest in problem cubes (russian statements here). This problem was used in SIS) problemset. Many generations of students got WA27 on that problem, submitting hash solution. One of them was I — nobody could explain me, why is there WA for any hash base. burunduk1 looked a bit on that test, but couldn't explain me either. Since that moment I remembered about that problem.
And now I've decided to think a bit and understand, what's happening in that test. burunduk1 offered to post it on CF, so here it is :-)
I tried to found any information about anti-hash in web, googled a lot, but couldn't find anything. Does anybody know anything else, maybe, any papers? Maybe I'm not so good in googling?