1507002's blog

By 1507002, history, 4 years ago, In English

I was reading substring hash from this link. But I got confused in the bold lines below. Could anyone explain me with example?

Suppose we are given a string S, and given indices I and J. It is required to find the hash from the substring S [I..J].

By definition, we have:

H [I..J] = S [I] + S [I + 1] * P + S [I + 2] * P ^ 2 + ... + S [J] * P ^ (J-I)

where from:

H [I..J] * P [I] = S [I] * P [I] + ... + S [J] * P [J],
H [I..J] * P [I] = H [0..J] - H [0..I-1]

The only problem that arises is what you need to be able to divide by P [I]. In fact, it is not so simple. Since we compute the hash modulo 2 ^ 64, for division by P [I] we must find the inverse element to it in the field (for example, using the Euclidean Advanced Algorithm ), and multiply by this inverse element.

However, there is an easier way. In most cases, instead of dividing hashes by degrees P, you can, conversely, multiply them by these degrees.

Suppose you have two hashes: one multiplied by P [I], and the other by P[J]. If I <J, then multiply the first hash by P[J-I], otherwise, multiply the second hash by P [I-J]. Now we have brought hashes to one degree, and we can safely compare them.

  • Vote: I like it
  • +16
  • Vote: I do not like it

| Write comment?
»
4 years ago, # |
  Vote: I like it 0 Vote: I do not like it

Ok, I understand that using modulo 2^64 is unsafe as the inverse might not exist for all numbers but it will surely exist if you use modulo (a large prime number) 1e9 + 9 for instance ....

»
4 years ago, # |
  Vote: I like it -8 Vote: I do not like it

Your idea is great, but sometimes we might have to store the hashes, in that case we would need to take inverse right...

For eg : We needed to find the number of distinct substrings of a given string.

  • »
    »
    4 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Otherwise for each hash you'd have to store the power with which it was multiplied and bring all the hashes to that same level during storing and comparision ...

»
4 years ago, # |
  Vote: I like it +3 Vote: I do not like it

With example:

Let $$$S = S_1S_2S_3S_4S_5S_6S_7$$$.
We want to compare $$$S_1 S_2 S_3$$$ and $$$S_5 S_6 S_7$$$.
We have $$$h (S_1 S_2 S_3) = S_1 \cdot p^1 + S_2 \cdot p^2 + S_3 \cdot p^3$$$.
And we have $$$h (S_5 S_6 S_7) = S_5 \cdot p^5 + S_6 \cdot p^6 + S_7 \cdot p^7$$$.
To compare them, we can *multiply the first one by $$$p^4$$$, getting $$$S_1 \cdot p^5 + S_2 \cdot p^6 + S_3 \cdot p^7$$$.

Now we can compare the numbers
$$$S_1 \cdot p^5 + S_2 \cdot p^6 + S_3 \cdot p^7$$$
and
$$$S_5 \cdot p^5 + S_6 \cdot p^6 + S_7 \cdot p^7$$$
modulo $$$q$$$ directly, to see whether $$$S_1 S_2 S_3$$$ and $$$S_5 S_6 S_7$$$ are "probably equal".

Does it make more sense with this example?