Блог пользователя Pranayhalo

Автор Pranayhalo, история, 4 года назад, По-английски

Wondering if there are any good resources (or editorials) to aid when trying to upsolve regional ACM-ICPC contests. I see they have coded solutions, but without some editorial, I am not able to fully understand why what works. I am specifically asking for the Mid-Central PC 2018. If not, would anyone be able to help me in solving the following problem from MCPC18 K: Repeated Substrings? Thanks!

  • Проголосовать: нравится
  • +6
  • Проголосовать: не нравится

»
4 года назад, # |
  Проголосовать: нравится +11 Проголосовать: не нравится

Compute the LCP array of the string and find the index of the maximum value. Code

Another possible solution is to use binary search on substring length + hashing.

  • »
    »
    4 года назад, # ^ |
      Проголосовать: нравится 0 Проголосовать: не нравится

    I understand and have coded the LCP solution now, thanks! I am not sure if I fully understand the alternative solution. I understand how you can binary search (if a length of k is valid, then all lengths 1...k is valid; if a length of k is not valid, then all lengths k...n is not valid). If I want to check if some length of k is valid, how would this be done efficiently? There are n-k+1 substrings of length k so O(n) substrings of length k, and we just add all of these substrings to a set and if we ever add a duplicate, we know that length k is valid? I am not sure how efficient a set is for strings (is it O(n^2) for n insertions as each insertion takes O(n) time since we must compare the strings one at a time?). I think this is where hashing comes in, but I am not sure how to use this correctly.

    • »
      »
      »
      4 года назад, # ^ |
        Проголосовать: нравится +3 Проголосовать: не нравится

      Yes, if you insert the entire string into the set then you will get O(n^2). You will need to do something faster. If you fix the substring length as k, then as you said there are n-k+1 strings of length k. Now, instead of putting the strings themselves into the set, you can put the hashes of the strings into the set. If the set contains duplicates then you know (with high probability) that a length-k substring has appeared twice.

      Now, the important question is how to compute the hash of s[i..j] efficiently. Assume the strings are 1-indexed First, we're going to need to define a convenient hash function:

      $$$h[i] = (s[1]p^1 + s[2]p^2 + ... + s[i]p^i) \text{ mod } M$$$

      Where p is a small prime and M is a big prime (p=101,M=10^9+7 should be fine). Also for convenience h[0] = 0. Now we also have:

      $$$\text{ hash of }s[i..j] = \frac{(h[j] - h[i - 1])}{ p^{i - 1}}$$$

      Where the right hand side is evaluated mod M. Lastly, it is possible that there are some hash-collisions. In this case, you may use a combination of two primes and two moduli to make the collisions less likely.