harsh-apcr's blog

By harsh-apcr, history, 7 months ago, In English

Can someone explain how to solve Required Substring problem in CSES : https://cses.fi/problemset/task/1112

Your task is to calculate the number of strings of length n having a given pattern of length m as their substring. All strings consist of characters A–Z.

My approach :

I was thinking of some dp approach and instead of counting the required quantity directly, I'd count the complement and subtract from $$$26^n$$$

But, I am stuck with this approach, I thought of the following recursion

$$$dp(i, j) = # of strings of length i not containing s, whose suffix of length i is a prefix of s (i.e. s_1\ldots s_j)$$$

$$$dp(i, j) = \sum_{j'\in S} dp(i-1, j') + dp(i-1, j-1)$$$, where $$$S$$$ = set of all $$$j'$$$ such that $$$s_1 \ldots s_{j'}$$$ has a prefix of length $$$j-1$$$ which is also its suffix

and then finally compute : $$$dp(i) = 26*dp(i-1)-dp(i-1,m-1)$$$ (supposed to count strings of length i not containing s)

but I cannot figure out how to instantiate the base case $$$dp(i, 0)$$$ for $$$i > m$$$ or $$$dp(i, 1)$$$

Any help is appreciated, thank you

  • Vote: I like it
  • 0
  • Vote: I do not like it

»
7 months ago, # |
  Vote: I like it 0 Vote: I do not like it

Auto comment: topic has been updated by harsh-apcr (previous revision, new revision, compare).

»
7 months ago, # |
Rev. 2   Vote: I like it 0 Vote: I do not like it

I would go for something like this: Lets define Q(i) to be a string with the pattern as substring at the i-th position. Now all you have to do is to count the sum of variations possible for the i-th Q(i) if we only replace the characters not belonging to the pattern. It's more like a math problem if you ask me (combinatorics). Sorry if I had made a mistake or you didn't understand my suggestion.

  • »
    »
    6 months ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Sorry for late reply, how do you count Q(i) ? I don't see immediately see a direct way of counting it, could you explain ?

    • »
      »
      »
      6 months ago, # ^ |
      Rev. 2   Vote: I like it 0 Vote: I do not like it

      We need to count the number of strings which have the pattern as a substring. So the strings would have the given form: s1+PATTERN+s2 (+ concatenation). Now we just need to count the different ways we can make s1 and s2 using the letters A-Z, depending on their size. Now lets see how their size depends on Q(i).

      let:

      the indexes begin from 1

      m1=|s1| m2=|s2| (|string|=size of string)

      m = |PATTERN| n = |the whole string|

      then for each Q(i)

      => m1 = (i-1) +1 (from index 1 to the first index of the pattern)

      => m2 = n-m1-m

      Now lets see how many unique strings we can make with size k and 26 unique characters. For the first position we have 26 options, for the second 26 and for the i position 26 options. Then we multiple like such: 26*26*26... }k times = 26^k (you can think of it like picking a lock, the number of different combinations can be computed similarly)

      Now we just compute for each Q(i): (26^m1)*(26^m2) and add up the result.

      And that's our answer.

»
3 months ago, # |
  Vote: I like it 0 Vote: I do not like it

You could follow this dp idea of finding the complement and then the answer would be 26^n — complement. So you make a dp(i,j), meaning how many string from i to n doesn't have the pattern, knowing that the previous j-1 letters already matched with the pattern. The transition would be trying any of the 26 uppercase leetters as the new letter in the index i, but if j == m -1 (m is the size of the pattern), that means you can't choose the last letter of the pattern, cause that would result in us having the pattern on the string. When you reach i == n, that means you have a full string without the pattern, so you return 1. The difficulty lays on the fact that when pattern[j] doesn't match with the letter you chose to put on your string, you can't just say that j will be zero.

Example:

Imaging the pattern = ababc, and j = 4, meaning you already matched 'abab' and you know you can't choose 'c' cause that would make the whole pattern appear on the string, but if you choose 'a', now we have 'ababa', and the suffix 'aba' already matches with the three first letters of the pattern, so j will be 3. So you use the kmp algorithm to discover what would me the next j.