I have a string of length n. I could have queries like query(i, j) which should return the hash of the sub-string starting at i and ending at j inclusive. This query should be answered in O(1). And a preprocessing of o(n) or o(n log n) is allowed initially.
Can anyone give me idea how to do such thing?
Apparently, it's problem from a live contest. It wouldn't be appropriate for me to answer.
This is related to a problem of the ongoing codechef june challenge CHSTR . This might be a pure coincidence, but please do not ask questions pertaining to live contest problems.
I'm sorry then. I didn't know it was.
What are you talking about? Substring hash is very well-known and basic algorithm, it is OK to talk about it despite coincidence with any ongoing contest. Will you try to ban any discussion of segment tree if it will be used in some problems from Long challenge?
I was just putting forth my views. It seemed a bit odd to me that what this post is asking is really similar to what the problem asks. It could, or couldn't be a coincidence.
And, this guy is, indeed, attempting the very same problem: Submission History
Actually it isn't really similar.
I can do it with O(1) preprocessing time: Define a hash function which hashes every string to 0. Then the hash-function is given by: f(i, j) = 0.
You can have a polynomial hash of all prefixes. That is, you store:
This is the most common hashing function and you can calculate all F[i] from 1 to N in O(N) time. Then suppose we want to get H(i,j) that returns the hash of the sequence in interval [i; j]. Well we have
Now it's easy to see that:
So finally, we can subtract that from F[j] and we get the proper hash. The final answer, if I'm not mistaken, is
Having all F[] calculated and keeping a precomputed array with the powers of B will give you O(N) preprocessing time and O(1) query time.
Deleted
Try this problem 7D - Палиндромность from Codeforces Beta Round 7.
Using the technique Enchom mentioned, it could be solved in O(N). Try it :)
Edit : Problem -> 514C - Уотто и механизм also uses similar idea.
Searching for such kind of explanation ! Thanks a lot Enchom
Can you focus on what values we should take as B and MOD?
As you probably know, B is your base and MOD is the modulo you work with. There are a few rules I have when choosing such values. The idea of hashing is to give you a fairly random number out of a string, and with minor changes in the string to produce very different hash number. Here are the rules I use:
1) Make sure B and MOD are both prime numbers. This helps alot in producing "random" hashes out of strings and is just generally a must.
2) Make your MOD sufficiently large. I usually use 10^9+7 or 10^9+9 as those are large known primes. The reason I use those values and not larger is because it allows multiplying without overflowing long long
3) If you have a small alphabet, make B larger than your largest value. I don't think this is necessary and I do not know if it helps at all, but I've been told it's nice.
A famous hashing technique is getting rid of MOD and using unsigned long long, as overflow will work as modulo. This is an amazing technique and works both fast and well, with the sad exception that there are found tests that break it and give alot of collisions. So while it looks like a nice alternative, in lots of online competitions anti-hash tests will exist.
And finally I want to focus on an important thing when hashing, and that is avoiding collisions. Suppose you have one hash modulo 10^9+7, and in a problem you generate 100,000 hashes of different strings, and then want to check if there are duplicates. Surprisingly, according to the birthday problem your chance of having a collision among those 100,000 hashes is actually huge. In such cases, using double or triple hashing might be a good idea, even though it might significantly slow down the runtime.