veryverydarkgray's blog

By veryverydarkgray, 8 years ago, In English

The task is to minimize the length of resulting string after repeatedly removing a given substring. Note that after removal, new occurrences of the substring may be formed.


Given string: aaababa

String to remove: aba


  1. aaababa -> aaab

  2. aaababa -> aaba -> a

So in this case the answer is 1.

This looks like a standard problem but i am unable to solve it. An even tougher version of the problem appeared on: where instead of a single string, a set of strings can be removed. Sadly, an editorial is not available.

  • Vote: I like it
  • +10
  • Vote: I do not like it

8 years ago, # |
  Vote: I like it +2 Vote: I do not like it

I think the following will work:

Let q be the string to remove, and S be the original string.

Let canKill(i,j) be a function that says whether the entire substring from i to j of S can be removed.

To compute it, we will use a dynamic programming. Let s be the substring from i till j, and l be the length of that string (l = j - i). canKill(i,j) is true if and only if we can find a subsequence of s that is equal to q, such that all the characters that are not in that subsequence can be fully covered by a set of non-intersecting intervals, for each of which canKill is true.

We will use a dynamic programming for that. dx, y is whether we can find a subsequence of a prefix of s of length x, that is a prefix of q of length y. The transitions are

dx, y = dx - 1, y - 1 if qy - 1 = sx - 1

(we add another character from q to the end of the prefix)


dx, y = dx - z, y if canKill(i + x — z, i + x)

(we add another interval for which canKill is true to the end of the previx)

When you compute all the canKills, you can run another dynamic programming run, that will find the maximum subset of S that can be covered by non-intersecting intervals, for which canKill is true.

  • »
    8 years ago, # ^ |
    Rev. 3   Vote: I like it 0 Vote: I do not like it

    Thanks a lot!

    Initially i was unable to understand how you calculate canKill(i,j) from its corresponding dp value — dlen(s),len(q) . Suppose s1=aabbcc , s2=ababcc and q=abc. For both s1 and s2, q is a subsequence so dlen(s),len(q) is true for both but canKill is only true for s2.

    However, if we consider d[x][y] as true if and only if sx (prefix of s of length x) can be completely removed by using qy atmost once and the complete string q multiple times, then canKill(i,j) becomes equal to d[len(s)][len(q)] as it denotes whether the whole substring s can be completely removed by using string q multiple times.

    Suppose s=ababcabcc and q=abc. So, at this stage canKill(2,4), canKill(5,7), canKill(2,7) is known to be true.

    Now for given s the dp looks like (the transitions are same, only rows and columns are interchanged):

            a   b   a   b   c   a   b   c   c
    a       1   0   0   0   0   0   0   0   0
    ab      0   1   0   0   1   0   0   1   0
    abc     0   0   0   0   0   0   0   0   1

    Since dp[2][8] is true, so canKill(s) is also true. Is this what you meant or i misunderstood your approach?