Interesting problem involving convolutions

#	User	Rating
1	tourist	3690
2	jiangly	3647
3	Benq	3581
4	orzdevinwang	3570
5	Geothermal	3569
5	cnnfls_csy	3569
7	Radewoosh	3509
8	ecnerwala	3486
9	jqdai0815	3474
10	gyh20	3447

#	User	Contrib.
1	maomao90	174
2	awoo	165
3	adamant	161
4	TheScrasse	160
5	nor	158
6	maroonrk	156
7	-is-this-fft-	152
8	orz	146
9	SecondThread	145
9	pajenegod	145

**The objective of this post is NOT understanding FFT or Karatsuba, or any algorithm that performs fast convolutions, it's about understanding one of its uses**↵
↵
Hi Codeforces community! I wanted to write this blog entry to help coders who would like to see a nice example of convolutions used in a problem which, at first sight, has little to no relation with polynomial multiplication (of course it has, it's a convolution after all)↵
↵
The problem is as follows: You are given two strings $S$ and $P$, with lengths $n$ and $m$ respectively, where $m\leqn$↵
↵
For each substring $T$ of $S$ with length $m$, we have to find the one that maximizes the number of positions with the same character, i. e $T[i] = P[i]$↵
↵
For the sake of simplicity, let us assume that our strings consists only of letters $a$ and $b$. For example, if $S = baabab$ and $P = bba$, let's calculate the "similarity" between $P$ and each substring of $S$ of length 3.↵
↵
$s(baa,bba)=2$↵
↵
$s(aab,bba)=a$↵
↵
$s(aba,bba)=2$ ↵
↵
$s(bab,bba)=b$↵
↵
We can see that there are two substrings which maximize the similarity (any answer will be fine)↵
↵
**A naive approach has a $O(n\cdotm)$ time complexity and uses $O(n+m)$ memory**↵
↵
This should be OK if $1\leqn,m\leq1000$. What about $1\leqn,m\leq10^5$? We have to find a faster way to apply $\sum_{i=0}^{m} T[i] == P[i]$ for every substring $T$ of $S$↵
↵
First, let's try to change the formula above with a more mathematical approach. If we map each character {a,b}={0,1}, we can apply the next formula:↵
↵
$\sum_{i=0}^{m}{T[i] \cdot P[i]}$. This will give us the number of $1$'s (originally, $b$'s) that match.↵
↵
Let $S^c$ be the complement of $S$ (0 becomes 1, and viceversa). To compute the number of $0$'s that match, we apply ↵
↵
$\sum_{i=0}^{m}{T^c[i] \cdot P^c[i]}$↵
↵
**Now what? This doesn't improve the complexity at all**↵
↵
Let's remember how the convolution $f$ of two discrete functions $g$ and $h$ is defined:↵
↵
$f(x) = \sum_{i=0}^{x} {g(i) \cdot h(x-i)}$↵
↵
**Now we're talking! This looks pretty similar to the formulas previously explained.**↵
↵
Next we will try to change our formulas to a simple convolution. First we will say that our array $S$ and $P$ are two functions $g$ and $h$ whose domain is $[0,n-1]$ and range is {0,1}.↵
↵
$f(x) = \sum_{i=0}^{x} {S[i] \cdot P[x-i]}$↵
↵
**But the formula is incorrect, this convolution, for each $x$ applies the similarity formula, but reversing $P$!**↵
↵
This is our first issue, which we can easily solve by reversing $P$ ourselves. Let's call $P_r$ the reversed version of $P$. We can see that↵
↵
$f(x) = \sum_{i=0}^{x} {S[i] \cdot P_{r}[x-i]}$ is equivalent to $f(x) = \sum_{i=0}^{x} {S[i] \cdot P[i]}$↵
↵
Our second issue is, $m\leqn$. We can easily solve this issue (again) by adding some trailing zerores to $P_r$, which is equivalent to add leading zeroes to $P$ until $m = n$↵
↵
**But, what is f(x) exactly, and how can this help us to improve the time complexity?**↵
↵
We saw that $f(x) = \sum_{i=0}^{x} {S[i] \cdot P[i]}$, which is the same as taking a preffix of $S$ and **(this is the tricky part)** a suffix of $P$, each of length $x+1$↵
↵
If we take $x = m-1$, we can see that this applies the similarity function to the whole pattern $P$ and the first substring of $S$ which starts at $0$. If we take $x>m-1$ then we apply the function to $P$ and the $(x-m+1)-th$ substring of $S$. And if we take $x<m-1$ we apply the function to a substring which goes out of the bounds of $S$.↵
↵
Let's check our example $S = baabab$ and $P = bba$. First, let's map each character $S = 100101$ and $P = 110$.↵
↵
Then,$P_r=011$ and let's add some trailing zeroes: $P_r=011000$.↵
↵
Let's calculate the convolution $f$↵
↵
$f(0) = 0$↵
↵
$f(1) = 1$↵
↵
$f(2) = 1$↵
↵
$f(3) = 0$↵
↵
$f(4) = 1$↵
↵
$f(5) = 1$↵
↵
Great! $f$ computes the number of $1's$ that match. What about the number of $0's$? Let's apply the same procedure to $S^c$ and $P^c$↵
↵
$S^c = 011010$ and $P^c = 001$↵
↵
$P_{r}^{c} = 100000$↵
↵
$f^c(0) = 0$↵
↵
$f^c(1) = 1$↵
↵
$f^c(2) = 1$↵
↵
$f^c(3) = 0$↵
↵
$f^c(4) = 1$↵
↵
$f^c(5) = 0$↵
↵
**Fine! Now, the answer should be** $max$ $f(i)+f^c(i)$ **where** $i$ **is in** $[m-1,n-1]$↵
↵
With this approach, we can solve the problem with a $O(n log n)$ time complexity using an algorithm like FFT to perform the convolution, or maybe Karatsuba algorithm↵
↵
**And what if the alphabet size is not 2?**↵
↵
For each character on the alphabet, we apply the convolution. Suppose $f_{a_i}$ is the convolution between $S$ and $P$ where, if $S[i] == a_i$ then $S[i] = 1$, else $S[i] = 0$ (the same applies for $P$), for every character $a_i$ on the alphabet $a$↵
↵
The answer should be $max$ $\sum_{i=0}^{|a|}{f_{a_i}(j) + f^{c}_{a_i}(j)}$ with $j$ in $[m-1,n-1]$↵
↵
This solution has a $O(|a|\cdot n log n)$ time complexity, where |a| is the size of the alphabet↵
↵
**And what now?**↵
↵
Here are some problems which can be solved using this approach↵
↵
[ADAMATCH](http://www.spoj.com/problems/ADAMATCH/)↵
↵
[MAXMATCH](http://www.spoj.com/problems/MAXMATCH/)↵
↵
[K inversions](https://naipc16.kattis.com/problems/kinversions)↵
↵
Hope you like this post, and helps you solving some convolution problems!↵

Rev.	By	When	Δ	Comment
en6	snacache	2018-05-11 05:08:43	4
en5	snacache	2018-05-11 00:23:31	17	Tiny change: 'f_{a_i}(j) + f^{c}_{a_i}(j)}' -> 'f_{a_i}(j)}'
en4	snacache	2018-05-10 23:54:16	62
en3	snacache	2018-05-10 23:49:52	2787	Tiny change: '$ memory.<\b>\n\nThis' -> '$ memory.</b>\n\nThis' (published)
en2	snacache	2018-05-10 21:17:51	2109	Tiny change: 'y, where $m\leqn$' -> 'y, where $n\beqm$'
en1	snacache	2018-05-10 19:07:47	444	Initial revision (saved to drafts)

History