Substring Hash and its Fast Computation

I was reading about substring hashing from this link, but I got confused by the bold lines below. Could anyone explain them with an example?

Suppose we are given a string S and indices I and J. We need to find the hash of the substring S [I..J].

By definition, we have:

H [I..J] = S [I] + S [I + 1] * P + S [I + 2] * P ^ 2 + ... + S [J] * P ^ (J-I)

from which it follows (here P [I] denotes P ^ I, and H [0..I-1] is taken to be 0 when I = 0):

H [I..J] * P [I] = S [I] * P [I] + ... + S [J] * P [J],
H [I..J] * P [I] = H [0..J] - H [0..I-1]
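
The identity above can be checked numerically. The following is a minimal sketch, assuming a base P = 131 (the quoted text does not fix a value) and a prefix array `H` shifted by one, so that `H[k]` holds the hash of the first k characters; the function names are made up for illustration.

```python
MASK = (1 << 64) - 1   # keeps results modulo 2^64, as in the quoted text
P = 131                # assumed odd base; not given in the source

def prefix_hashes(s):
    """Return (H, pw): H[k] is the hash of s[0..k-1], pw[k] is P^k, both mod 2^64."""
    H = [0] * (len(s) + 1)
    pw = [1] * (len(s) + 1)
    for k, ch in enumerate(s):
        H[k + 1] = (H[k] + ord(ch) * pw[k]) & MASK
        pw[k + 1] = (pw[k] * P) & MASK
    return H, pw

def direct_hash(s, i, j):
    """H[I..J] computed straight from the definition, with powers starting at P^0."""
    total = 0
    for k in range(i, j + 1):
        total += ord(s[k]) * pow(P, k - i, 1 << 64)
    return total & MASK

def scaled_hash(H, i, j):
    """H[0..J] - H[0..I-1], which should equal H[I..J] * P^I modulo 2^64."""
    return (H[j + 1] - H[i]) & MASK
```

For any valid i <= j, `scaled_hash(H, i, j)` should match `direct_hash(s, i, j) * pw[i]` reduced modulo 2^64.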

The only problem is that we need to be able to divide by P [I]. In fact, this is not so simple. Since we compute the hash modulo 2 ^ 64, to divide by P [I] we must find its inverse modulo 2 ^ 64 (for example, using the extended Euclidean algorithm) and multiply by that inverse. Note that the integers modulo 2 ^ 64 form a ring rather than a field, so the inverse exists only when P is odd.
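
For illustration, here is a rough sketch of that inverse computation with the extended Euclidean algorithm. The helper names and the base 131 used in the usage note are assumptions, not part of the quoted text.

```python
MOD = 1 << 64  # the modulus 2^64 from the quoted text

def extended_gcd(a, b):
    """Return (g, x, y) such that a*x + b*y == g == gcd(a, b)."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y

def inverse(a, mod):
    """Return the inverse of a modulo mod, or raise ValueError if there is none."""
    g, x, _ = extended_gcd(a % mod, mod)
    if g != 1:
        raise ValueError("no inverse: a and mod are not coprime")
    return x % mod
```

With an odd base such as 131, `inverse(pow(131, 5, MOD), MOD)` gives a number whose product with 131^5 is 1 modulo 2^64, while an even argument raises `ValueError`.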

However, there is an easier way. In most cases, instead of dividing hashes by powers of P, you can, conversely, multiply them by these powers.

Suppose you have two hashes: one multiplied by P [I], and the other by P [J]. If I < J, multiply the first hash by P [J-I]; otherwise, multiply the second hash by P [I-J]. Now both hashes are scaled by the same power of P, and we can safely compare them.
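
A small sketch of this comparison, with no division at all, might look as follows. The base P = 131, the one-shifted prefix array, and the function names are assumptions made for the example.

```python
MASK = (1 << 64) - 1   # keeps results modulo 2^64
P = 131                # assumed odd base; not given in the source

def build(s):
    """Return (H, pw): H[k] is the hash of s[0..k-1], pw[k] is P^k, both mod 2^64."""
    H = [0] * (len(s) + 1)
    pw = [1] * (len(s) + 1)
    for k, ch in enumerate(s):
        H[k + 1] = (H[k] + ord(ch) * pw[k]) & MASK
        pw[k + 1] = (pw[k] * P) & MASK
    return H, pw

def probably_equal(H, pw, i1, j1, i2, j2):
    """Compare s[i1..j1] with s[i2..j2] by bringing both hashes to the same power of P."""
    if j1 - i1 != j2 - i2:
        return False
    h1 = (H[j1 + 1] - H[i1]) & MASK   # carries an extra factor P^i1
    h2 = (H[j2 + 1] - H[i2]) & MASK   # carries an extra factor P^i2
    if i1 < i2:
        h1 = (h1 * pw[i2 - i1]) & MASK
    else:
        h2 = (h2 * pw[i1 - i2]) & MASK
    return h1 == h2
```

For `s = "abcxabc"`, comparing positions 0..2 with 4..6 should report a match, since both substrings are "abc".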

Comments (4)

mujtaba1747

4 years ago

Ok, I understand that using modulo 2^64 is unsafe, as an inverse might not exist for every number, but it will surely exist (for every number not divisible by the modulus) if you use a large prime modulus, 1e9 + 9 for instance ....
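
As a sketch of the prime-modulus variant, the inverse of P can be taken with Fermat's little theorem, and the scaled hash can then really be divided by P^I. The base P = 31 and all names here are assumptions for illustration only.

```python
MOD = 10**9 + 9   # the prime modulus mentioned in the comment
P = 31            # assumed base; must not be a multiple of MOD

def build(s):
    """Return (H, inv_pw): prefix hashes and the inverses of P^k, all modulo MOD."""
    n = len(s)
    H = [0] * (n + 1)
    inv_pw = [1] * (n + 1)
    inv_p = pow(P, MOD - 2, MOD)   # Fermat: P^(MOD-2) is the inverse of P for prime MOD
    power = 1
    for k, ch in enumerate(s):
        H[k + 1] = (H[k] + ord(ch) * power) % MOD
        power = power * P % MOD
        inv_pw[k + 1] = inv_pw[k] * inv_p % MOD
    return H, inv_pw

def substring_hash(H, inv_pw, i, j):
    """Hash of s[i..j] with powers starting at P^0, obtained by 'dividing' by P^i."""
    return (H[j + 1] - H[i]) * inv_pw[i] % MOD
```

Hashes produced this way no longer depend on the starting position, so they can be stored and compared directly.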

Your idea is great, but sometimes we might have to store the hashes; in that case we would need to take the inverse, right?

For example, when we need to find the number of distinct substrings of a given string.

4 years ago

Otherwise, for each hash you'd have to store the power by which it was multiplied and bring all the hashes to the same level when storing and comparing them ...
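
One way to bring all hashes to the same level before storing them is to multiply every hash up to the largest power that can occur, P^(n-1). The sketch below counts distinct substrings this way; it is a quadratic illustration under assumed values (base 131, modulo 2^64), not a method taken from the thread.

```python
MASK = (1 << 64) - 1   # keeps results modulo 2^64
P = 131                # assumed odd base; not given in the source

def count_distinct_substrings(s):
    """Count distinct substrings by storing hashes that all carry the factor P^(n-1)."""
    n = len(s)
    H = [0] * (n + 1)
    pw = [1] * (n + 1)
    for k, ch in enumerate(s):
        H[k + 1] = (H[k] + ord(ch) * pw[k]) & MASK
        pw[k + 1] = (pw[k] * P) & MASK
    seen = set()
    for i in range(n):
        for j in range(i, n):
            scaled = (H[j + 1] - H[i]) & MASK             # carries the factor P^i
            normalized = (scaled * pw[n - 1 - i]) & MASK  # now carries P^(n-1)
            seen.add((j - i + 1, normalized))             # the length guards against some collisions
    return len(seen)
```

For "aaaa" this should return 4 (the substrings "a", "aa", "aaa", "aaaa"). As with any hashing approach, the count is only probably correct for long inputs.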

Gassa

With example:

Let $$$S = S_1S_2S_3S_4S_5S_6S_7$$$.
We want to compare $$$S_1 S_2 S_3$$$ and $$$S_5 S_6 S_7$$$.
We have $$$h (S_1 S_2 S_3) = S_1 \cdot p^1 + S_2 \cdot p^2 + S_3 \cdot p^3$$$.
And we have $$$h (S_5 S_6 S_7) = S_5 \cdot p^5 + S_6 \cdot p^6 + S_7 \cdot p^7$$$.
To compare them, we can multiply the first one by $$$p^4$$$, getting $$$S_1 \cdot p^5 + S_2 \cdot p^6 + S_3 \cdot p^7$$$.

Now we can compare the numbers
$$$S_1 \cdot p^5 + S_2 \cdot p^6 + S_3 \cdot p^7$$$
and
$$$S_5 \cdot p^5 + S_6 \cdot p^6 + S_7 \cdot p^7$$$
modulo $$$q$$$ directly, to see whether $$$S_1 S_2 S_3$$$ and $$$S_5 S_6 S_7$$$ are "probably equal".
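
The same example can be run with concrete numbers. This is a small sketch that follows the 1-indexed convention above; the values p = 31 and q = 10^9 + 7 and the function names are assumptions chosen only for illustration.

```python
p = 31           # assumed base
q = 10**9 + 7    # assumed modulus

def h(s, lo, hi):
    """Sum of S_i * p^i for lo <= i <= hi (1-indexed positions), modulo q."""
    return sum(ord(s[i - 1]) * pow(p, i, q) for i in range(lo, hi + 1)) % q

def first_three_match_last_three(s):
    """For a 7-character string, compare S_1 S_2 S_3 with S_5 S_6 S_7 as described above."""
    left = h(s, 1, 3) * pow(p, 4, q) % q   # shift powers p^1..p^3 up to p^5..p^7
    right = h(s, 5, 7)                     # already uses powers p^5..p^7
    return left == right
```

Calling `first_three_match_last_three("abcxabc")` should return `True`, because both triples are "abc", while "abcxabd" should give `False`.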

Does it make more sense with this example?

1507002's blog