How do I find the hash of any substring in O(1) with O(n) preprocessing?

ayushmishra.iit's blog

By ayushmishra.iit, history, 9 years ago, in English

I have a string of length n. I need to answer queries of the form query(i, j), which should return the hash of the substring starting at i and ending at j, inclusive. Each query should be answered in O(1), and O(n) or O(n log n) preprocessing is allowed.

Can anyone give me an idea of how to do this?

-4

Comments (13)

determinism

9 years ago, # |

Apparently, it's a problem from a live contest. It wouldn't be appropriate for me to answer.

gvaibhav21

9 years ago, # |

-20

This is related to CHSTR, a problem from the ongoing CodeChef June Challenge. This might be a pure coincidence, but please do not ask questions pertaining to live contest problems.

determinism

9 years ago, # ^ |

I'm sorry then. I didn't know it was.

adamant

9 years ago, # ^ |

+68

What are you talking about? Substring hashing is a very well-known and basic algorithm; it is OK to talk about it despite the coincidence with an ongoing contest. Would you try to ban any discussion of segment trees if they were used in some problem from a Long Challenge?

gvaibhav21

9 years ago, # ^ |

-12

I was just putting forth my views. It seemed a bit odd to me that what this post asks is really similar to what the problem asks. It may or may not be a coincidence.

And this user is indeed attempting the very same problem: Submission History

adamant

9 years ago, # ^ |

Actually it isn't really similar.

TimonKnigge

9 years ago, # |

+34

I can do it with O(1) preprocessing time: Define a hash function which hashes every string to 0. Then the hash-function is given by: f(i, j) = 0.

Enchom

9 years ago, # |

+27

You can keep a polynomial hash of every prefix. That is, you store:

F[i] = ( a[1]*B^(i-1) + a[2]*B^(i-2) + ... + a[i]*B^0 ) % MOD

This is the most common hash function. With F[0] = 0 and F[i] = ( F[i-1]*B + a[i] ) % MOD, you can compute all F[i] for i from 1 to N in O(N) time. Now suppose we want H(i,j), the hash of the substring on the interval [i; j]. We have

F[j]   = ( a[1]*B^(j-1) + a[2]*B^(j-2) + ... + a[j]*B^0 ) % MOD
F[i-1] = ( a[1]*B^(i-2) + a[2]*B^(i-3) + ... + a[i-1]*B^0 ) % MOD

Multiplying F[i-1] by B^(j-i+1) lines its terms up with the first i-1 terms of F[j]:

F[i-1]*B^(j-i+1) = ( a[1]*B^(j-1) + a[2]*B^(j-2) + ... + a[i-1]*B^(j-i+1) ) % MOD

So we can subtract that from F[j], and what remains is a[i]*B^(j-i) + ... + a[j]*B^0, which is exactly the hash of the substring. The final answer, if I'm not mistaken, is

H(i,j) = ( F[j] - F[i-1]*B^(j-i+1) ) % MOD

If % can return a negative value in your language (as it can in C++), add MOD before taking the final remainder. Keeping all of F[] and a precomputed array of the powers of B gives you O(N) preprocessing time and O(1) query time.
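
For illustration, the prefix-hash scheme described here could be sketched in C++ roughly as follows. The struct name `PrefixHash`, the base B = 131, and the modulus 10^9+7 are assumptions made for this example, not values given in the answer; positions are 1-based to match the formulas.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Prefix polynomial hash: O(n) preprocessing, O(1) substring queries.
// B and MOD are illustrative choices, not values fixed by the thread.
struct PrefixHash {
    static constexpr std::uint64_t MOD = 1000000007ULL;
    static constexpr std::uint64_t B = 131ULL;

    std::vector<std::uint64_t> F;   // F[i] = hash of the first i characters, F[0] = 0
    std::vector<std::uint64_t> pw;  // pw[k] = B^k % MOD

    explicit PrefixHash(const std::string& s)
        : F(s.size() + 1, 0), pw(s.size() + 1, 1) {
        for (std::size_t i = 1; i <= s.size(); ++i) {
            F[i] = (F[i - 1] * B + static_cast<unsigned char>(s[i - 1])) % MOD;
            pw[i] = pw[i - 1] * B % MOD;
        }
    }

    // Hash of the substring at 1-based positions i..j, inclusive.
    std::uint64_t query(std::size_t i, std::size_t j) const {
        assert(1 <= i && i <= j && j < F.size());
        std::uint64_t shifted = F[i - 1] * pw[j - i + 1] % MOD;
        // Add MOD before subtracting so the difference cannot go below zero.
        return (F[j] + MOD - shifted) % MOD;
    }
};
```

With this sketch, `PrefixHash h("abracadabra")` gives `h.query(1, 4) == h.query(8, 11)`, since both ranges spell "abra". Equal hashes only suggest equal substrings; a collision between different substrings is still possible.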

mishraiiit

9 years ago, # ^ |

Try problem 7D - Palindrome Degree from Codeforces Beta Round 7.

Using the technique Enchom mentioned, it can be solved in O(N). Try it :)

Edit: problem 514C - Watto and Mechanism also uses a similar idea.

mahbubcseju

9 years ago, # ^ |

I was searching for this kind of explanation! Thanks a lot, Enchom.

mishraiiit

9 years ago, # ^ |

Can you elaborate on what values we should take for B and MOD?

Enchom

9 years ago, # ^ |

+15

As you probably know, B is your base and MOD is the modulus you work with. There are a few rules I follow when choosing these values. The idea of hashing is to get a fairly random number out of a string, so that minor changes in the string produce a very different hash. Here are the rules I use:

1) Make sure B and MOD are both prime numbers. This helps a lot in producing "random" hashes out of strings and is just generally a must.

2) Make your MOD sufficiently large. I usually use 10^9+7 or 10^9+9, as those are large known primes. The reason I use those values and not larger ones is that the product of two residues still fits in a long long without overflowing.

3) If you have a small alphabet, make B larger than your largest value. I don't think this is necessary and I do not know if it helps at all, but I've been told it's nice.

A famous hashing technique is to get rid of MOD and use unsigned long long, since overflow then acts as a modulo 2^64. This is an amazing technique that is fast and usually works well, with the sad exception that there are known tests that break it and cause a lot of collisions. So while it looks like a nice alternative, many online competitions will include anti-hash tests.
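
As a rough sketch of the overflow variant described here, the same prefix-hash idea can be written in C++ without any explicit modulus. The name `OverflowHash` and the base B = 131 are assumptions for this example. Unsigned arithmetic in C++ wraps around modulo 2^64, so neither the multiplication nor the subtraction needs special handling.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Prefix hash with implicit modulus 2^64 (unsigned wraparound).
// B is an illustrative choice; it should be odd, because an even base
// makes high powers of B vanish modulo 2^64.
struct OverflowHash {
    static constexpr unsigned long long B = 131ULL;

    std::vector<unsigned long long> F;   // F[i] = hash of the first i characters
    std::vector<unsigned long long> pw;  // pw[k] = B^k modulo 2^64

    explicit OverflowHash(const std::string& s)
        : F(s.size() + 1, 0), pw(s.size() + 1, 1) {
        for (std::size_t i = 1; i <= s.size(); ++i) {
            F[i] = F[i - 1] * B + static_cast<unsigned char>(s[i - 1]);
            pw[i] = pw[i - 1] * B;
        }
    }

    // Hash of the substring at 1-based positions i..j, inclusive.
    unsigned long long query(std::size_t i, std::size_t j) const {
        assert(1 <= i && i <= j && j < F.size());
        return F[j] - F[i - 1] * pw[j - i + 1];  // wraps modulo 2^64
    }
};
```

This saves the `%` operations, but as the comment warns, it is exactly the variant that prepared anti-hash tests target, so it is best treated as a speed trick for inputs that are not adversarial.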

Finally, I want to focus on an important point when hashing: avoiding collisions. Suppose you use one hash modulo 10^9+7, generate 100,000 hashes of different strings in a problem, and then want to check whether there are duplicates. Surprisingly, by the birthday problem, the chance of a collision among those 100,000 hashes is actually huge. In such cases, double or triple hashing might be a good idea, even though it might significantly slow down the runtime.
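
To put a number on this: if hash values are treated as uniformly random, 100,000 hashes form about 5*10^9 pairs, so modulo 10^9+7 one expects about 5 colliding pairs, and the chance of at least one collision is roughly 1 - e^(-5), or about 99.3%. With two independent moduli the expected number of colliding pairs drops to roughly 5*10^(-9). A minimal C++ sketch of double hashing follows; the function names, the bases 131 and 137, and the second modulus 10^9+9 are assumptions chosen for this example.

```cpp
#include <cstdint>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Polynomial hash of a whole string for a given base and modulus.
std::uint64_t poly_hash(const std::string& s, std::uint64_t base, std::uint64_t mod) {
    std::uint64_t h = 0;
    for (unsigned char c : s) {
        h = (h * base + c) % mod;
    }
    return h;
}

using HashPair = std::pair<std::uint64_t, std::uint64_t>;

// Double hash: two hashes with different (illustrative) bases and moduli.
HashPair double_hash(const std::string& s) {
    return {poly_hash(s, 131ULL, 1000000007ULL),
            poly_hash(s, 137ULL, 1000000009ULL)};
}

// Returns true if two strings share the same hash pair, which means
// they are very likely (though not provably) equal.
bool has_probable_duplicate(const std::vector<std::string>& strings) {
    std::set<HashPair> seen;
    for (const std::string& s : strings) {
        if (!seen.insert(double_hash(s)).second) {
            return true;
        }
    }
    return false;
}
```

Two strings now count as duplicates only when both components agree, at the cost of roughly twice the hashing work.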
