Minimize string length by repeatedly removing a given substring

By veryverydarkgray, 10 years ago, In English

The task is to minimize the length of the resulting string after repeatedly removing a given substring. Note that after a removal, new occurrences of the substring may be formed.

Example:

Given string: aaababa

String to remove: aba

Possibilities:

aaababa -> aaab
aaababa -> aaba -> a

So in this case the answer is 1.
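
For checking small cases by hand, a brute-force search over every removal sequence can help. The sketch below (the function name `bruteForceMinLength` is hypothetical) explores every string reachable by deleting one occurrence of the pattern at a time and keeps the shortest length seen. It is exponential in the worst case, so it is only meant as a reference for tiny inputs like the example above.

```cpp
#include <algorithm>
#include <queue>
#include <set>
#include <string>

// Brute-force reference (hypothetical helper): breadth-first search over all
// strings reachable by deleting one occurrence of q at a time.
// Exponential in the worst case; only for checking small inputs.
int bruteForceMinLength(const std::string& s, const std::string& q) {
    std::set<std::string> seen = {s};
    std::queue<std::string> todo;
    todo.push(s);
    std::size_t best = s.size();
    while (!todo.empty()) {
        std::string cur = todo.front();
        todo.pop();
        best = std::min(best, cur.size());
        // Try deleting the occurrence of q that starts at every matching position.
        for (std::size_t pos = cur.find(q); pos != std::string::npos;
             pos = cur.find(q, pos + 1)) {
            std::string next = cur.substr(0, pos) + cur.substr(pos + q.size());
            if (seen.insert(next).second) todo.push(next);
        }
    }
    return static_cast<int>(best);
}
```

For the example, `bruteForceMinLength("aaababa", "aba")` returns 1: from `aaaba`-style intermediates it finds both `aaab` and `aaba -> a`.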

This looks like a standard problem, but I am unable to solve it. An even tougher version of the problem appeared at https://www.hackerrank.com/contests/apc/challenges/reducto, where instead of a single string, any string from a given set can be removed. Sadly, no editorial is available.

Comments (2)

AlexSkidanov

10 years ago

I think the following will work:

Let q be the string to remove, and S be the original string.

Let canKill(i, j) be a function that says whether the entire substring of S from i to j can be removed.

To compute it, we will use dynamic programming. Let s be the substring from i to j, and l be its length (l = j - i). canKill(i, j) is true if and only if we can find a subsequence of s equal to q such that all the characters not in that subsequence can be fully covered by a set of non-intersecting intervals, each of which has canKill true. (The characters of that subsequence are the ones deleted by the final removal; every earlier removal happens entirely inside one of the gaps around them, so each gap must be removable on its own.)

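We will use dynamic programming for that as well. $d_{x,y}$ is whether we can find a subsequence of the prefix of s of length x that equals the prefix of q of length y, with every other character of that prefix covered by intervals for which canKill is true. Starting from $d_{0,0} = \text{true}$, $d_{x,y}$ is true if either of the following transitions applies:

$d_{x,y} = d_{x-1,y-1}$ if $q_{y-1} = s_{x-1}$

(we add another character from q to the end of the prefix)

and

$d_{x,y} = d_{x-z,y}$ if canKill(i + x - z, i + x) for some $z \ge 1$

(we add another interval for which canKill is true to the end of the prefix)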
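
A minimal C++ sketch of this recurrence, assuming q is non-empty and using inclusive 0-indexed bounds (so the interval written canKill(i + x - z, i + x) above becomes `kill[i + x - z][i + x - 1]`). Intervals are processed by increasing length, so every interval used in the second transition is already known. The name `computeCanKill` is hypothetical, and the roughly O(n^4 |q|) running time is fine for illustration but not tuned.

```cpp
#include <string>
#include <vector>

// kill[i][j] == true iff s[i..j] (inclusive, 0-indexed) can be fully erased
// by repeatedly deleting occurrences of q. Assumes q is non-empty.
std::vector<std::vector<bool>> computeCanKill(const std::string& s, const std::string& q) {
    const int n = s.size(), m = q.size();
    std::vector<std::vector<bool>> kill(n, std::vector<bool>(n, false));
    for (int len = 1; len <= n; ++len) {
        for (int i = 0; i + len <= n; ++i) {
            // d[x][y]: the first x characters of s[i..] contain q[0..y-1] as a
            // subsequence, and every other character of that prefix lies in a
            // strictly shorter killable interval (kill[i][i+len-1] is still false).
            std::vector<std::vector<bool>> d(len + 1, std::vector<bool>(m + 1, false));
            d[0][0] = true;
            for (int x = 1; x <= len; ++x) {
                for (int y = 0; y <= m; ++y) {
                    // Transition 1: s[i + x - 1] is matched with q[y - 1].
                    bool ok = y > 0 && s[i + x - 1] == q[y - 1] && d[x - 1][y - 1];
                    // Transition 2: the last z characters form a killable interval.
                    for (int z = 1; z <= x && !ok; ++z)
                        ok = kill[i + x - z][i + x - 1] && d[x - z][y];
                    d[x][y] = ok;
                }
            }
            kill[i][i + len - 1] = d[len][m];
        }
    }
    return kill;
}
```

On the strings from the reply below, `computeCanKill("ababcc", "abc")[0][5]` should be true and `computeCanKill("aabbcc", "abc")[0][5]` false.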

Once all the canKill values are computed, you can run another dynamic programming pass that finds the maximum subset of S that can be covered by non-intersecting intervals for which canKill is true; the answer is |S| minus the size of that subset.
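
The final pass could look like this, again as a hedged sketch: it takes a `kill` table with inclusive 0-indexed bounds (the name `minRemainingLength` is hypothetical) and computes, for every prefix, the most characters that disjoint killable intervals can cover.

```cpp
#include <algorithm>
#include <vector>

// Given kill[i][j] (true iff s[i..j], inclusive, can be erased), returns the
// minimum length left after erasing a set of non-intersecting killable intervals.
int minRemainingLength(const std::vector<std::vector<bool>>& kill) {
    const int n = kill.size();
    // best[p]: maximum number of the first p characters that can be covered
    // by non-intersecting killable intervals.
    std::vector<int> best(n + 1, 0);
    for (int p = 1; p <= n; ++p) {
        best[p] = best[p - 1];  // character p - 1 survives
        for (int start = 0; start < p; ++start)
            if (kill[start][p - 1])
                best[p] = std::max(best[p], best[start] + (p - start));
    }
    return n - best[n];
}
```

For S = aaababa and q = aba, the killable intervals are [2, 4], [4, 6] and [1, 6] (0-indexed, inclusive), so the function returns 7 - 6 = 1, matching the example in the post.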

veryverydarkgray

10 years ago

Thanks a lot!

Initially I was unable to understand how you calculate canKill(i, j) from its corresponding DP value $d_{len(s),len(q)}$. Suppose s1 = aabbcc, s2 = ababcc and q = abc. For both s1 and s2, q is a subsequence, so $d_{len(s),len(q)}$ is true for both, but canKill is true only for s2.

However, if we define d[x][y] to be true if and only if s_x (the prefix of s of length x) can be completely removed by using q_y (the prefix of q of length y) at most once and the complete string q any number of times, then canKill(i, j) equals d[len(s)][len(q)], since it denotes whether the whole substring s can be completely removed by using q multiple times.

Suppose s = ababcabcc and q = abc. At this stage canKill(2, 4), canKill(5, 7) and canKill(2, 7) are already known to be true.

For this s, the DP looks as follows (the transitions are the same; only rows and columns are interchanged):

        a   b   a   b   c   a   b   c   c
a       1   0   0   0   0   0   0   0   0
ab      0   1   0   0   1   0   0   1   0
abc     0   0   0   0   0   0   0   0   1

Since dp[2][8] is true, canKill(s) is also true. Is this what you meant, or did I misunderstand your approach?
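
The table above can be reproduced with a short sketch. The names `buildTable` and `killable` are hypothetical; it hard-codes the three intervals said to be already known as killable (0-indexed, inclusive bounds), fills the same DP, and transposes it so that rows are prefixes of q and columns are positions of s.

```cpp
#include <set>
#include <string>
#include <utility>
#include <vector>

// cell[y][x] is 1 iff s[0..x] can be erased using q[0..y] at most once plus
// any number of full copies of q, given the intervals in `killable`
// (inclusive (i, j) pairs assumed to be already known as removable).
std::vector<std::vector<int>> buildTable(const std::string& s, const std::string& q,
                                         const std::set<std::pair<int, int>>& killable) {
    const int n = s.size(), m = q.size();
    // D[x][y]: x = prefix length of s, y = prefix length of q.
    std::vector<std::vector<bool>> D(n + 1, std::vector<bool>(m + 1, false));
    D[0][0] = true;
    for (int x = 1; x <= n; ++x)
        for (int y = 0; y <= m; ++y) {
            // Match s[x - 1] with q[y - 1].
            bool ok = y > 0 && s[x - 1] == q[y - 1] && D[x - 1][y - 1];
            // Or the last z characters of the prefix form a known killable interval.
            for (int z = 1; z <= x && !ok; ++z)
                ok = killable.count({x - z, x - 1}) && D[x - z][y];
            D[x][y] = ok;
        }
    // Transpose, dropping the empty-prefix row and column.
    std::vector<std::vector<int>> cell(m, std::vector<int>(n));
    for (int y = 0; y < m; ++y)
        for (int x = 0; x < n; ++x)
            cell[y][x] = D[x + 1][y + 1];
    return cell;
}
```

Calling `buildTable("ababcabcc", "abc", {{2, 4}, {5, 7}, {2, 7}})` gives exactly the three rows shown, with the 1 in the last column of the `abc` row corresponding to dp[2][8].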
