Any help on this problem? (String Hashing)

By PedroCastillo, 6 years ago, In English

Hi, I'm attempting this problem with string hashing.

It works well for small inputs but gives a wrong answer on very large inputs.

Any help on what could be wrong?

Problem : https://codeforces.com/contest/271/problem/D

Submission: https://codeforces.com/contest/271/submission/46239564

Comments (8)

Volpe

6 years ago, # |

I think you just need to use double hashing to avoid collisions.

PedroCastillo

6 years ago, # ^ |

How exactly?

Also, how can I tell I need to use double hashing? I mean, when is it necessary?

Furthermore, I've seen solutions to this problem with just one normal hashing :(

Volpe

6 years ago, # ^ |

By double hashing I mean using two hash values for the string, computed with two different base and MOD values.

In general you can't tell in advance whether a single-hash solution will pass the test cases for a problem: collisions happen with some probability, and you can't know whether your solution will hit one. What you can do is reduce the probability of a collision as much as possible.

You can estimate this probability by assuming that hash values are uniformly distributed over the distinct strings. The larger the MOD, the lower the probability of a collision and the better your chance of getting Accepted. For rolling-hash solutions like yours, the alternative is double hashing.
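A minimal sketch of what double hashing could look like for a rolling hash. The struct name `DoubleHash`, the method `get`, and the specific constants are assumptions chosen for illustration, not taken from the submission above; any two distinct large prime moduli would do.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Polynomial prefix hashes of one string under two independent (base, mod) pairs.
struct DoubleHash {
    // Illustrative constants (assumed, not from the thread).
    static constexpr uint64_t MOD1 = 1000000007ULL, MOD2 = 998244353ULL;
    static constexpr uint64_t BASE1 = 131ULL, BASE2 = 137ULL;

    std::vector<uint64_t> h1, h2;  // h[i] = hash of the prefix s[0, i)
    std::vector<uint64_t> p1, p2;  // p[i] = BASE^i mod MOD

    explicit DoubleHash(const std::string& s)
        : h1(s.size() + 1, 0), h2(s.size() + 1, 0),
          p1(s.size() + 1, 1), p2(s.size() + 1, 1) {
        for (size_t i = 0; i < s.size(); ++i) {
            uint64_t c = static_cast<unsigned char>(s[i]);
            h1[i + 1] = (h1[i] * BASE1 + c) % MOD1;
            h2[i + 1] = (h2[i] * BASE2 + c) % MOD2;
            p1[i + 1] = p1[i] * BASE1 % MOD1;
            p2[i + 1] = p2[i] * BASE2 % MOD2;
        }
    }

    // Hash pair of the half-open substring s[l, r).
    std::pair<uint64_t, uint64_t> get(size_t l, size_t r) const {
        assert(l <= r && r < h1.size());
        // Adding MOD * MOD keeps the subtraction non-negative and still fits in 64 bits.
        uint64_t a = (h1[r] + MOD1 * MOD1 - h1[l] * p1[r - l]) % MOD1;
        uint64_t b = (h2[r] + MOD2 * MOD2 - h2[l] * p2[r - l]) % MOD2;
        return {a, b};
    }
};
```

Two substrings are treated as equal only when both components of the pair match, so under the uniformity assumption the pair behaves roughly like a single hash with modulus MOD1 * MOD2. Fixed bases can be targeted by anti-hash tests, so picking the bases at random at runtime is a common precaution.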

PedroCastillo

6 years ago, # ^ |

Thanks, it worked. However, how could I tell I needed the double hashing before submitting?

Noam527

6 years ago, # ^ |

You don't need to detect when you should use 2 or more hashes. One could say you should go by your intuition, but I suggest always using multiple hashes, as many as the memory and time cost of building them allows. I usually use 2 or 3.

CodingKnight

6 years ago, # |

The following is an accepted solution based on collision-free substring hashing. The main idea is to map the lowercase letters a to z to the integers 0 to M - 1, where M = 26. Then up to P consecutive symbols of the string are packed into a single integer as the digits of a base-M number, using iterated multiplication and addition without overflow; P = 13 for a 64-bit signed integer. The sequence of integers produced by packing a substring is a collision-free hash key among all substrings of the same length. A two-dimensional array of hash-key sets stores the distinct keys generated from all substrings of the input string, where the first index is the number of bad letters in the substring and the second index is the length of the substring. Two substrings are guaranteed to be different if they contain different numbers of bad letters or have different lengths. In other words, all substrings stored in one cell of the two-dimensional array have the same number of bad letters and the same length.

46247924
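The packing step described above can be sketched as follows. This is an illustration only, not the code of the linked submission; the function name `packSubstring` is hypothetical, and the input is assumed to contain only the letters a to z.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

constexpr int M = 26;  // alphabet size, letters 'a'..'z'
constexpr int P = 13;  // 26^13 < 2^63, so 13 base-26 digits fit in a signed 64-bit integer

// Packs s[pos, pos + len) into a sequence of integers, each holding up to P letters
// as base-M digits. For a fixed len, two substrings get the same key exactly when
// they are equal, so no collisions are possible among same-length substrings.
std::vector<int64_t> packSubstring(const std::string& s, size_t pos, size_t len) {
    assert(pos + len <= s.size());
    std::vector<int64_t> key;
    for (size_t i = 0; i < len; i += P) {
        int64_t packed = 0;
        for (size_t j = i; j < len && j < i + P; ++j) {
            packed = packed * M + (s[pos + j] - 'a');
        }
        key.push_back(packed);
    }
    return key;
}
```

Keys built this way can be stored in a `std::set<std::vector<int64_t>>` per substring length. The cost is that a key grows with the substring, unlike a fixed-size modular hash.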

UPDATE:

The following is an update of the previous solution that uses a one-dimensional array to store the collision-free hash keys (indexed only by the second index of the previous solution, i.e. the substring length). This update improves both the execution time and the memory used.

46257737

ILoveBitches

6 years ago, # |

Always use double hashing if possible. With single hashing the probability of a collision is about N/MOD. With double hashing it becomes about (N*N)/(MOD*MOD1). In the worst case N/MOD might be as large as 10^-4, which will get you into trouble. With double hashing, the worst-case probability of a collision stays at around 10^-8 or below.

BledDest

6 years ago, # |

I think that the birthday paradox is a convenient way to measure this: if we generate something like $\sqrt{MOD}$ random integers from 0 to MOD - 1, the probability of collision will be somewhere near 0.5. So if you want to make a lot of string comparisons using 32-bit hashing, the probability of collision is high (and it becomes even higher assuming there are multiple tests, and you should pass all of them).

Taking two (or three) 32-bit hashes or one (or two) 64-bit hashes should be enough in almost every problem.
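To put numbers on the birthday estimate, here is a small sketch using the standard approximation 1 - exp(-n(n-1)/(2*MOD)) for n uniformly random values. The function name `collisionProbability` is hypothetical, and the uniformity of hash values is an assumption that real inputs (especially adversarial ones) may violate.

```cpp
#include <cassert>
#include <cmath>

// Approximate probability that n uniformly random values in [0, mod)
// contain at least one equal pair (birthday approximation).
double collisionProbability(double n, double mod) {
    assert(n >= 1 && mod > 0);
    double x = n * (n - 1) / (2.0 * mod);
    return -std::expm1(-x);  // 1 - exp(-x), accurate even when x is tiny
}
```

Under this model, 10^5 distinct substrings hashed with a modulus near 10^9 collide with probability of roughly 0.99, while a modulus near 10^18 (for example two 32-bit hashes combined) brings that down to about 5 * 10^-9.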
