How to hash? - Codeforces

larush's blog

How to hash?

By larush, history, 6 months ago, in English

Hello

So I was solving this problem (here), and a hashing-based solution seemed obvious to me. It is enough to look at the hash of the changed substring and get the final hash of the modified string using the prefix and suffix. I do not know much about hashing, just a little about polynomial (or rolling) hash. But it seems that my hash isn't safe enough. I tried a few things, but with no luck:

Spoiler

I used the standard primes M = 1e9+9 and p = 31 initially, as I learned from cp-algo. I have no clue what I'm doing wrong.
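
For reference, here is a minimal sketch of the kind of polynomial prefix hash described above, using the same M and p. It is not the code from the spoiler, which is not shown here. The names `PrefixHash`, `get` and `concat` are made up for illustration, and lowercase input is assumed.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Polynomial prefix hashes: pref[i] is the hash of s[0..i), built as
// pref[i+1] = pref[i] * P + (s[i] - 'a' + 1)  (mod M).
// Assumes s consists of lowercase letters only.
struct PrefixHash {
    static constexpr uint64_t M = 1000000009ULL;  // modulus from the post
    static constexpr uint64_t P = 31;             // base from the post
    std::vector<uint64_t> pref, pw;               // pw[i] = P^i mod M

    explicit PrefixHash(const std::string& s)
        : pref(s.size() + 1, 0), pw(s.size() + 1, 1) {
        for (size_t i = 0; i < s.size(); ++i) {
            pref[i + 1] = (pref[i] * P + uint64_t(s[i] - 'a' + 1)) % M;
            pw[i + 1] = pw[i] * P % M;
        }
    }

    // Hash of the substring s[l..r), 0-indexed, half-open.
    uint64_t get(size_t l, size_t r) const {
        assert(l <= r && r < pref.size());
        return (pref[r] + M - pref[l] * pw[r - l] % M) % M;
    }

    // Hash of the concatenation A + B, given hash(A), hash(B) and |B|.
    // lenB must not exceed the length of the string this object was built on.
    uint64_t concat(uint64_t hashA, uint64_t hashB, size_t lenB) const {
        assert(lenB < pw.size());
        return (hashA * pw[lenB] + hashB) % M;
    }
};
```

With this layout, the hash of the string obtained by replacing `s[l..r)` with some `t` would be `concat(concat(get(0, l), hash(t), |t|), get(r, n), n - r)`, so only the changed part has to be rehashed.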

I hope someone could assist me in this and help me understand hashing a little better. Cheers!

off by one


Comments (8)

ishwarendra

6 months ago

Hashing can only improve the probability of a correct answer; it never guarantees one.

You can check SlavicG's submission. He used $$$3$$$ different combinations of $$$3$$$ $$$M$$$ (mod) values and $$$4$$$ different $$$p$$$ values to improve his probability of getting the correct output.

You can also check out his other submissions from the contest, which failed because they used only $$$1$$$ value of $$$M$$$ and $$$p$$$: submission-1, submission-2.

UPD: You can check out this blog for more information on multi-hashing and on how to write hard-to-hack hashing code (and also how to hack poorly written hashes).


pajenegod

6 months ago

I do not think that the SlavicG submission you linked is safe from hacks. I've hacked many similar rolling hash solutions in the past. It is hackable for two reasons:

All of the bases used are constant (non-random).
The prime mods used are small. The time complexity for a typical hash hack algorithm is O(sqrt(largest prime)).

In my opinion, the easiest solution to be safe from hacks when using rolling hashes is to use a single large prime like $$$P=2^{61} - 1$$$, and then use a random base (pick it uniformly at random in [0, P)). But whichever option you choose to go with, do not make the base constant (non-random).

Just a final note. SlavicG technically does randomize the bases using shuffle(all(plsnohack2), rng);, but there are only 24 different possible outcomes. So his choice of bases is effectively deterministic.
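
A minimal sketch of this suggestion, assuming a compiler with `unsigned __int128` (a GCC/Clang extension). The names `MOD61`, `mulmod61`, `random_base` and `Hash61` are hypothetical, and seeding from the clock is just one possible choice.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <random>
#include <string>
#include <vector>

constexpr uint64_t MOD61 = (1ULL << 61) - 1;  // the prime P = 2^61 - 1

// (a * b) mod (2^61 - 1) for a, b < 2^61 - 1. Because 2^61 == 1 (mod P),
// the 122-bit product hi * 2^61 + lo reduces to hi + lo.
uint64_t mulmod61(uint64_t a, uint64_t b) {
    unsigned __int128 t = (unsigned __int128)a * b;
    uint64_t r = (uint64_t)(t & MOD61) + (uint64_t)(t >> 61);
    if (r >= MOD61) r -= MOD61;
    return r;
}

// Base drawn uniformly from [0, P) at run time, so it cannot be known in advance.
uint64_t random_base() {
    static std::mt19937_64 rng(
        (uint64_t)std::chrono::steady_clock::now().time_since_epoch().count());
    return std::uniform_int_distribution<uint64_t>(0, MOD61 - 1)(rng);
}

struct Hash61 {
    std::vector<uint64_t> pref, pw;  // prefix hashes and powers of the base

    Hash61(const std::string& s, uint64_t base)
        : pref(s.size() + 1, 0), pw(s.size() + 1, 1) {
        assert(base < MOD61);
        for (size_t i = 0; i < s.size(); ++i) {
            pref[i + 1] = mulmod61(pref[i], base) + (unsigned char)s[i];
            if (pref[i + 1] >= MOD61) pref[i + 1] -= MOD61;
            pw[i + 1] = mulmod61(pw[i], base);
        }
    }

    // Hash of s[l..r), 0-indexed, half-open.
    uint64_t get(size_t l, size_t r) const {
        assert(l <= r && r < pref.size());
        uint64_t sub = mulmod61(pref[l], pw[r - l]);
        return pref[r] >= sub ? pref[r] - sub : pref[r] + MOD61 - sub;
    }
};
```

The base should be drawn once per run, e.g. `uint64_t b = random_base();`, and the same `b` passed to every `Hash61` whose values are compared with each other.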


vgtcross

6 months ago

But it seems that my hash isn't safe enough.

Since all of your hashes are integers in range $$$[0, M)$$$, there are $$$M$$$ different possible hashes. Your code will work incorrectly if you encounter a hash collision, i.e. a case where two different strings hash to the same hash value.

In the problem you linked, you're comparing up to $$$2\cdot10^5$$$ hashes with each other. According to the birthday paradox you only need around $$$\sqrt{M}$$$ different hashes before it's expected that some two are the same. Since $$$2\cdot10^5$$$ is much larger than $$$\sqrt{M} \approx 31622$$$, it's very likely that you'll encounter a hash collision.
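
To put a rough number on this, the sketch below uses the standard birthday approximation $$$1 - e^{-n(n-1)/(2M)}$$$, under the simplifying assumption that hash values behave like independent uniform random numbers. The function name `collision_probability` is made up for illustration.

```cpp
#include <cassert>
#include <cmath>

// Approximate probability that n values drawn uniformly at random from m
// possibilities contain at least one repeat: 1 - exp(-n(n-1) / (2m)).
// This models hashes as uniformly random, which is only an approximation.
double collision_probability(double n, double m) {
    assert(n >= 0 && m > 0);
    return -std::expm1(-n * (n - 1) / (2 * m));
}
```

For `n = 2e5` and `m = 1e9 + 9` the exponent is about 20, so under this model a collision is all but certain; squaring `m` brings the exponent down to about `2e-8`.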

The easiest way to deal with this is to choose two different pairs of (multiplier, modulus) and calculate the hash with both; now the hash is the pair of these hashes. Now the number of hashes is on the order of $$$M^2$$$ and we'd expect a collision only once we have around $$$M$$$ strings, so two hashes should be good enough. Of course you can use more to be safe but it's probably not required.
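
A minimal sketch of this idea, assuming the two prime moduli $$$10^9+9$$$ and $$$10^9+7$$$. The names `poly_hash`, `double_hash` and `HashPair` are illustrative; the bases are left as parameters so the caller can choose them, ideally at random.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>

using HashPair = std::pair<uint64_t, uint64_t>;

// Plain polynomial hash of the whole string with the given base and modulus.
// mod < 2^31 keeps h * base + c inside 64 bits.
uint64_t poly_hash(const std::string& s, uint64_t base, uint64_t mod) {
    assert(mod > 1 && mod < (1ULL << 31) && base < mod);
    uint64_t h = 0;
    for (unsigned char c : s) h = (h * base + c) % mod;
    return h;
}

// The combined hash is the pair of the two independent hashes; two strings
// are treated as equal only if both components match.
HashPair double_hash(const std::string& s, uint64_t base1, uint64_t base2) {
    return {poly_hash(s, base1, 1000000009ULL),
            poly_hash(s, base2, 1000000007ULL)};
}
```

Since `std::pair` supports `==` and `<`, the result can go straight into a `std::set` or `std::map`, or be sorted and compared like a single value.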


the_hyp0cr1t3

6 months ago

+10

Adding to this, in most* problems it usually suffices to have $$$2^{64}$$$ as the second modulus for ease of implementation (something like 231564057).


chromate00

6 months ago

+35

$$$2^{64}$$$ is a very weak modulus; it is widely known that the Thue-Morse sequence will hack any modulus of the form $$$2^k$$$. How does it help when used alongside a prime modulus?

UPD: It really doesn't help much.
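
The Thue-Morse attack can be sketched as follows, assuming the hash is a plain polynomial hash with unsigned 64-bit overflow and an odd base (an even base is weak modulo $$$2^{64}$$$ for other reasons). The function names are hypothetical. The two strings of length $$$2^{10}$$$ differ in every position, yet their hash difference is, up to sign, $$$\prod_{j=0}^{9} (1 - b^{2^j})$$$, which is divisible by $$$2^{64}$$$ for every odd base $$$b$$$.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Polynomial hash with natural unsigned overflow, i.e. arithmetic mod 2^64.
uint64_t hash_mod_2_64(const std::string& s, uint64_t base) {
    uint64_t h = 0;
    for (unsigned char c : s) h = h * base + c;
    return h;
}

// First 2^k characters of the Thue-Morse sequence over {'a', 'b'}:
// position i holds 'b' if popcount(i) is odd and 'a' otherwise.
// With flip = true the two letters are swapped.
std::string thue_morse(int k, bool flip) {
    assert(k >= 0 && k < 31);
    std::string s(size_t(1) << k, 'a');
    for (size_t i = 0; i < s.size(); ++i) {
        int parity = 0;
        for (size_t x = i; x != 0; x >>= 1) parity ^= int(x & 1);
        if ((parity == 1) != flip) s[i] = 'b';
    }
    return s;
}
```

Comparing `hash_mod_2_64(thue_morse(10, false), b)` with `hash_mod_2_64(thue_morse(10, true), b)` for any odd `b` gives equal values even though the strings are different.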
