Phon1209's blog

By Phon1209, 13 days ago, In English,

Problem statement: There are string $$$A$$$ and set of string $$$B$$$, length of $$$A <= 100000$$$, set $$$B$$$ have 500 string elements and $$$|B_i|<=500$$$. All string consisted of the lowercase alphabet only. The task is to find longest common subsequence of string $$$A$$$ to all string $$$B_i$$$.

 
 
 
 
  • Vote: I like it  
  • -3
  • Vote: I do not like it  

»
13 days ago, # |
  Vote: I like it -17 Vote: I do not like it

i don't know much about suffix array, but i think with this data structure you can solve this problem. Good Luck

Link about how calculate LCS with suffix array -> http://lpcs.math.msu.su/~pritykin/csr2008presentations/starikovskaya.pdf

  • »
    »
    13 days ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    I want to find longest common subsequence, not a substring. But thank you for helping me.

»
13 days ago, # |
Rev. 3   Vote: I like it -6 Vote: I do not like it

Time complexity can't be reduced, but you can actually reduce used memory. While calculating dp[i][j] you just need dp[i-1][j] and dp[i][j-1], so you can remeber only last row and row of dp you are currently calculating. EDIT: Nevermind, haven't noticed you've got 500 strings in array, so that definitely won't pass.

»
13 days ago, # |
  Vote: I like it +12 Vote: I do not like it

There are two problems here which are easy to confuse:

For the Longest common subsequence problem, $$$\mathcal{O}(mn)$$$ is pretty much the best you can do (apparently you can get $$$\mathcal{O}(n^2/\log n)$$$ if $$$m=n$$$, see here).

For the Longest common substring problem, you can for instance use suffix arrays (as in the link given by DougNobrega) to achieve $$$\mathcal{O}(m+n)$$$.

Are you sure your problem is asking about subsequences rather than substrings?

»
13 days ago, # |
  Vote: I like it +20 Vote: I do not like it

Note that $$$|B|=m\leq 500$$$, so the answer is not to be very large.

Let $$$f_{i,j}$$$ denote if the length of longest common subsequence is $$$i$$$, what is the minimum position of $$$k$$$ that $$$LCS(A[1..k],B[1..j])$$$ is $$$i$$$. There are only $$$O(m^2)$$$ states, and the complexity is $$$O(26n+m^2)$$$.

In your problem there are $$$500$$$ queries, so the whole complexity is $$$O(26n+500m^2)$$$.

  • »
    »
    12 days ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Can you explain it in detail? It's really like what I looking for.

    • »
      »
      »
      12 days ago, # ^ |
        Vote: I like it +10 Vote: I do not like it

      Precompute: $$$g[i][j]$$$ denotes the minimum position $$$k$$$ that $$$A[k]=j$$$ and $$$k\geq i$$$, this can be calculated in $$$O(26n)$$$.

      Start DP from $$$f[0][0]=0$$$.

      Transition Equation:

      1. $$$f[i][j]\rightarrow f[i][j+1]$$$

      2. $$$g[f[i][j]+1][B[j+1]]\rightarrow f[i+1][j+1]$$$

»
12 days ago, # |
  Vote: I like it 0 Vote: I do not like it

Do you have a link to the problem?

»
12 days ago, # |
  Vote: I like it +5 Vote: I do not like it

Use a bitset to maintain the differentiated array of each row in the trivial O(n * m) DP and you'll get an O(n * m / w) algorithm. (w = 32 or 64 usually)

(Sorry that I've got the description of this optimization in Chinese :(