Блог пользователя drod

Автор drod, 10 лет назад, По-английски

Given two strings A and B, your task is to find a substring of A called justice string, which has the same length as B, and only has at most two characters different from B.

Both string A and B contain lowercase English letters from a to z only. And the length of these two strings is between 1 and 100000, inclusive.

  • Проголосовать: нравится
  • +5
  • Проголосовать: не нравится

»
10 лет назад, # |
  Проголосовать: нравится +17 Проголосовать: не нравится

Well, the first thing which quickly came to mind is to use FFT. (I guess it's possible to pass TL)
But while I was writing this I understood that there's a much easier solution: let's try each substring of A. Let's denote it C. Firstly, we find largest common prefix of B and C (in O(log(|B|)) using binary search and substring hashing). If it's not the whole string B then we delete this prefix and the next symbols after it from each string (in our imagination, of course) and repeat the whole procedure. Each time we find a pair of different letters, so when we find the third one we stop.

  • »
    »
    10 лет назад, # ^ |
      Проголосовать: нравится +1 Проголосовать: не нравится

    Brute force search and largest common prefix, how can I never thought about it. Thank you!

  • »
    »
    10 лет назад, # ^ |
      Проголосовать: нравится 0 Проголосовать: не нравится

    However, I'm not really sure about FFT here. There was a similar problem but with only 4 possible letters instead of 26 and I had real problem with time limit.

    • »
      »
      »
      10 лет назад, # ^ |
        Проголосовать: нравится +5 Проголосовать: не нравится

      Btw, we need not 26+26 FFT. Even not 26+1 FFT. We need only 13+1 FFT. But anyway you will have troubles with TL =) Solution in 3*NlogN with hashes is much faster in work and simpler to implement.

      • »
        »
        »
        »
        10 лет назад, # ^ |
        Rev. 2   Проголосовать: нравится 0 Проголосовать: не нравится

        Yeah, I'm pretty aware of that. Even using 2+1 FFTs in that problem my realisation was very slow.
        Well, there's no doubt that hashes and binary search are easier. On the other hand, FFT is much more interesting and cooler=)

  • »
    »
    10 лет назад, # ^ |
      Проголосовать: нравится 0 Проголосовать: не нравится

    Can you give an advice how to solve it with FFT?

    • »
      »
      »
      10 лет назад, # ^ |
      Rev. 4   Проголосовать: нравится +3 Проголосовать: не нравится

      Consider the following (easier) problem: given binary strings A and B for each substring in A of length |B| (call it C) calculate the number of ones in B & C (binary AND of B and C).
      UPD: Oh, yeah, it's much easier to call this number as the scalar product of B and C)
      There's a hint in the rev. 1.

»
10 лет назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится

Polynomial hashes? We need to check O(n) substrings. You can check number of different letters (we are interested in four cases only: 0, 1, 2 and >2) in unsing binary search. Total time is .

  • »
    »
    10 лет назад, # ^ |
    Rev. 2   Проголосовать: нравится +9 Проголосовать: не нравится

    You also can improve this solution changing hashes to Z-function for string B#A and get linear solution.

    upd: that's wrong solution, see below.

    • »
      »
      »
      10 лет назад, # ^ |
        Проголосовать: нравится +9 Проголосовать: не нравится

      How to detect a difference in two symbols using Z-function?

      • »
        »
        »
        »
        10 лет назад, # ^ |
        Rev. 2   Проголосовать: нравится 0 Проголосовать: не нравится

        Oops, it works for finding the first difference but doesn't work for the second one. My mistake, solution with Z-function is wrong, but hashes still work.

        • »
          »
          »
          »
          »
          10 лет назад, # ^ |
          Rev. 3   Проголосовать: нравится 0 Проголосовать: не нравится

          I remember there was similar task, but you have to find cyclic shift s' of string s such that diff(s', s) <= 2 (or i can be wrong, maybe <= 1). Task was at Izhevsk training camp. (2008-2010?) Author of the problem was AlexSkidanov, and he has O(n) solution.

        • »
          »
          »
          »
          »
          10 лет назад, # ^ |
            Проголосовать: нравится +7 Проголосовать: не нравится

          "it works for finding the first difference but doesn't work for the second one" you can build another z-function for reverted strings. Now it's linear.

          • »
            »
            »
            »
            »
            »
            10 лет назад, # ^ |
              Проголосовать: нравится +1 Проголосовать: не нравится

            Good idea! And it works for 1 difference. But for case of 2 differences how to check substring between 2 letters that are different?

            • »
              »
              »
              »
              »
              »
              »
              10 лет назад, # ^ |
              Rev. 3   Проголосовать: нравится +9 Проголосовать: не нравится

              suffix array + Least Common Prefix in O(NLogN), or polynomial hashes in O(N).

              • »
                »
                »
                »
                »
                »
                »
                »
                10 лет назад, # ^ |
                  Проголосовать: нравится 0 Проголосовать: не нравится

                Well, Z-function + hashes is a bit weird combination, but it gives indeed linear solution.

»
10 лет назад, # |
Rev. 2   Проголосовать: нравится +11 Проголосовать: не нравится

sorry to be off-topic, but this is blog post #12345! :)