Блог пользователя Phon1209

Автор Phon1209, 5 лет назад, По-английски

Problem statement: There are string $$$A$$$ and set of string $$$B$$$, length of $$$A <= 100000$$$, set $$$B$$$ have 500 string elements and $$$|B_i|<=500$$$. All string consisted of the lowercase alphabet only. The task is to find longest common subsequence of string $$$A$$$ to all string $$$B_i$$$.

  • Проголосовать: нравится
  • +2
  • Проголосовать: не нравится

»
5 лет назад, # |
  Проголосовать: нравится -17 Проголосовать: не нравится

i don't know much about suffix array, but i think with this data structure you can solve this problem. Good Luck

Link about how calculate LCS with suffix array -> http://lpcs.math.msu.su/~pritykin/csr2008presentations/starikovskaya.pdf

»
5 лет назад, # |
Rev. 3   Проголосовать: нравится -6 Проголосовать: не нравится

Time complexity can't be reduced, but you can actually reduce used memory. While calculating dp[i][j] you just need dp[i-1][j] and dp[i][j-1], so you can remeber only last row and row of dp you are currently calculating. EDIT: Nevermind, haven't noticed you've got 500 strings in array, so that definitely won't pass.

»
5 лет назад, # |
  Проголосовать: нравится +12 Проголосовать: не нравится

There are two problems here which are easy to confuse:

For the Longest common subsequence problem, $$$\mathcal{O}(mn)$$$ is pretty much the best you can do (apparently you can get $$$\mathcal{O}(n^2/\log n)$$$ if $$$m=n$$$, see here).

For the Longest common substring problem, you can for instance use suffix arrays (as in the link given by DougNobrega) to achieve $$$\mathcal{O}(m+n)$$$.

Are you sure your problem is asking about subsequences rather than substrings?

»
5 лет назад, # |
  Проголосовать: нравится +20 Проголосовать: не нравится

Note that $$$|B|=m\leq 500$$$, so the answer is not to be very large.

Let $$$f_{i,j}$$$ denote if the length of longest common subsequence is $$$i$$$, what is the minimum position of $$$k$$$ that $$$LCS(A[1..k],B[1..j])$$$ is $$$i$$$. There are only $$$O(m^2)$$$ states, and the complexity is $$$O(26n+m^2)$$$.

In your problem there are $$$500$$$ queries, so the whole complexity is $$$O(26n+500m^2)$$$.

  • »
    »
    5 лет назад, # ^ |
      Проголосовать: нравится 0 Проголосовать: не нравится

    Can you explain it in detail? It's really like what I looking for.

    • »
      »
      »
      5 лет назад, # ^ |
        Проголосовать: нравится +10 Проголосовать: не нравится

      Precompute: $$$g[i][j]$$$ denotes the minimum position $$$k$$$ that $$$A[k]=j$$$ and $$$k\geq i$$$, this can be calculated in $$$O(26n)$$$.

      Start DP from $$$f[0][0]=0$$$.

      Transition Equation:

      1. $$$f[i][j]\rightarrow f[i][j+1]$$$

      2. $$$g[f[i][j]+1][B[j+1]]\rightarrow f[i+1][j+1]$$$

»
5 лет назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится

Do you have a link to the problem?

»
5 лет назад, # |
  Проголосовать: нравится +5 Проголосовать: не нравится

Use a bitset to maintain the differentiated array of each row in the trivial O(n * m) DP and you'll get an O(n * m / w) algorithm. (w = 32 or 64 usually)

(Sorry that I've got the description of this optimization in Chinese :(