**The objective of this post is NOT understanding FFT or Karatsuba, or any algorithm that performs fast convolutions, it's about understanding one of its uses**

Hi Codeforces community! I wanted to write this blog entry to help coders who would like to see a nice example of convolutions used in a problem which, at first sight, has little to no relation with polynomial multiplication (of course it has, it's a convolution after all)

The problem is as follows: You are given two strings *S* and *P*, with lengths *n* and *m* respectively, where *m* ≤ *n*

For each substring *T* of *S* with length *m*, we have to find the one that maximizes the number of positions with the same character, i. e *T*[*i*] = *P*[*i*]

For the sake of simplicity, let us assume that our strings consists only of letters *a* and *b*. For example, if *S* = *baabab* and *P* = *bba*, let's calculate the "similarity" between *P* and each substring of *S* of length 3.

*s*(*baa*, *bba*) = 2

*s*(*aab*, *bba*) = 0

*s*(*aba*, *bba*) = 2

*s*(*bab*, *bba*) = 1

We can see that there are two substrings which maximize the similarity (any answer will be fine)

**A naive approach has a O(n·m) time complexity and uses O(n + m) memory**

This should be OK if 1 ≤ *n*, *m* ≤ 1000. What about 1 ≤ *n*, *m* ≤ 10^{5}? We have to find a faster way to apply for every substring *T* of *S*

First, let's try to change the formula above with a more mathematical approach. If we map each character {a,b}={0,1}, we can apply the next formula:

. This will give us the number of 1's (originally, *b*'s) that match.

Let *S*^{c} be the complement of *S* (0 becomes 1, and viceversa). To compute the number of 0's that match, we apply

**Now what? This doesn't improve the complexity at all**

Let's remember how the convolution *f* of two discrete functions *g* and *h* is defined:

**Now we're talking! This looks pretty similar to the formulas previously explained.**

Next we will try to change our formulas to a simple convolution. First we will say that our array *S* and *P* are two functions *g* and *h* whose domain is [0, *n* - 1] and range is {0,1}.

**But the formula is incorrect, this convolution, for each x applies the similarity formula, but reversing P!**

This is our first issue, which we can easily solve by reversing *P* ourselves. Let's call *P*_{r} the reversed version of *P*. We can see that

is equivalent to

Our second issue is, *m* ≤ *n*. We can easily solve this issue (again) by adding some trailing zerores to *P*_{r}, which is equivalent to add leading zeroes to *P* until *m* = *n*

**But, what is f(x) exactly, and how can this help us to improve the time complexity?**

We saw that , which is the same as taking a preffix of *S* and **(this is the tricky part)** a suffix of *P*, each of length *x* + 1

If we take *x* = *m* - 1, we can see that this applies the similarity function to the whole pattern *P* and the first substring of *S* which starts at 0. If we take *x* > *m* - 1 then we apply the function to *P* and the (*x* - *m* + 1) - *th* substring of *S*. And if we take *x* < *m* - 1 we apply the function to a substring which goes out of the bounds of *S*.

Let's check our example *S* = *baabab* and *P* = *bba*. First, let's map each character *S* = 100101 and *P* = 110.

Then,*P*_{r} = 011 and let's add some trailing zeroes: *P*_{r} = 011000.

Let's calculate the convolution *f*

*f*(0) = 0

*f*(1) = 1

*f*(2) = 1

*f*(3) = 0

*f*(4) = 1

*f*(5) = 1

Great! *f* computes the number of 1'*s* that match. What about the number of 0'*s*? Let's apply the same procedure to *S*^{c} and *P*^{c}

*S*^{c} = 011010 and *P*^{c} = 001

*P*_{r}^{c} = 100000

*f*^{c}(0) = 0

*f*^{c}(1) = 1

*f*^{c}(2) = 1

*f*^{c}(3) = 0

*f*^{c}(4) = 1

*f*^{c}(5) = 0

**Fine! Now, the answer should be** *max* *f*(*i*) + *f*^{c}(*i*) **where** *i* **is in** [*m* - 1, *n* - 1]

With this approach, we can solve the problem with a *O*(*nlogn*) time complexity using an algorithm like FFT to perform the convolution, or maybe Karatsuba algorithm

**And what if the alphabet size is not 2?**

For each character on the alphabet, we apply the convolution. Suppose *f*_{ai} is the convolution between *S* and *P* where, if *S*[*i*] = = *a*_{i} then *S*[*i*] = 1, else *S*[*i*] = 0 (the same applies for *P*), for every character *a*_{i} on the alphabet *a*

The answer should be *max* with *j* in [*m* - 1, *n* - 1]

This solution has a *O*(|*a*|·*nlogn*) time complexity, where |a| is the size of the alphabet

**And what now?**

Here are some problems which can be solved using this approach

Hope you like this post, and helps you solving some convolution problems!

Auto comment: topic has been updated by snacache (previous revision, new revision, compare).Thank you for this nice article, i really like convolution problems :)

Anyway, the paragraph that explains when the alphabet is not 2, i think it should be with j [m-1,n-1] without the complement part, since we are summing over all the alphabet.

You're totally right! Thanks, fixed!

Great!

Thank you for this article! I've seen a lot of string matching problems for which the solution was to utilize some sort of convolution, and I had no idea where that came in and how it applied. This cleared a lot of things for me. Thanks again!