Solution for "Problem G. Automaton" from GP of Dolgoprudny

#	User	Rating
1	tourist	3690
2	jiangly	3647
3	Benq	3581
4	orzdevinwang	3570
5	Geothermal	3569
5	cnnfls_csy	3569
7	Radewoosh	3509
8	ecnerwala	3486
9	jqdai0815	3474
10	gyh20	3447

#	User	Contrib.
1	maomao90	174
2	awoo	164
3	adamant	162
4	TheScrasse	160
5	nor	158
6	maroonrk	156
7	-is-this-fft-	152
8	orz	146
9	pajenegod	145
9	SecondThread	145

I finally solved problem G of GP of Dolgoprudny (also problem 83 in New Year Contest 2018). Though there are discussions on Codeforces, they all look very abstract to me. Hence, I decided to write down a full editorial.

First, we need some definitions. For given string $$$s$$$, we write $$$\mathrm{occ}_s(p) = {i : s[i - |p| + 1: i] = p}$$$. We write $$$x \equiv y$$$ if $$$\mathrm{occ}_s(x) = \mathrm{occ}_s(y)$$$. The "$$$\equiv$$$" relation groups all non-empty substrings of $$$s$$$ into equivalence classes. We write the equivalence class where $$$x$$$ is in as $$$[x]^R$$$. We also write $$$\overleftarrow{x}$$$ for the unique longest string in $$$[x]^R$$$.

The number of states in the automaton of $$$s$$$ equals to the number of equivalence classes plus $$$1$$$. We will then compute the number of equivalence classes. By the linearity of expectation, we can compute for each string $$$p$$$, the number of strings $$$s$$$ which contains an equivalence class $$$[x]^R$$$ where $$$\overleftarrow{x} = p$$$.

For $$$\overleftarrow{x} = p$$$ to hold, two conditions should be satisfied:

$$$p$$$ occurs as a substring in $$$s$$$;
$$$[ap]^R \neq [p]^R$$$ for any character $$$a$$$.

We first compute the number of strings $$$s$$$ satisfying condition 1 solely. Then we subtract the number of strings $$$s$$$ where $$$[ap]^R = [p]^R$$$ for all $$$a$$$.

Condition 1

$$$p$$$ may end in string $$$s$$$ in positions $$$P = {|p|, |p|+1, \dots, n}$$$. Using the principle of Inclusion and Exclusion (i.e., PIE), we can alternatively compute the sum

$$$\sum_{A \neq \emptyset, A \subseteq P} (-1)^{|A|} \mathrm{count}(p, A)$$$

where $$$\mathrm{count}(p, A)$$$ the number of strings $$$s$$$ where $$$p$$$ ends in positions $$$A$$$.

If we know the string $$$p$$$ in advance, this can be done by a $$$O(n^2)$$$ DP. But how if the number of possible strings $$$p$$$ can be large?

Let's introduce another notation $$$\mathrm{per}(s) = {k : s[1 : |s| - k + 1] = s[k + 1 : |s|]}$$$ for a string $$$s$$$. We notice that the above sum is determined only by the set $$$\mathrm{per}(p)$$$, not the string $$$p$$$ itself. The number of different $$$\mathrm{per}(p)$$$ is around 1 thousand when $$$|p|=40$$$. Thus, we can instead enumerate different $$$\mathrm{per}(p)$$$ and compute the desired sum.

Condition 2

For given string $$$s$$$ and pattern $$$ap$$$, we subtract from the result $$$\mathrm{occ}_s(ap) \neq \emptyset$$$ and $$$\mathrm{occ}_s(ap) = \mathrm{occ}_s(p)$$$. By PIE, the second condition boils down to the sum

$$$\sum_{s \in \Sigma^n} \sum_{X \subseteq \mathrm{occ}_s(ap), X \neq \emptyset} \sum_{Y \subseteq \mathrm{occ}_s(p), X \cap Y = \emptyset} (-1)^{|Y|}.$$$

Noticing that $$$\mathrm{occ}(ap) \subseteq \mathrm{occ}(p)$$$, we can sum over $$$U = X \cup Y$$$. The sum equals to

$$$ \begin{matrix}{rl} & \sum_{s \in \Sigma^n} \sum_{U \subseteq \mathrm{occ}_s(p)} \sum_{X \subseteq U \cap \mathrm{occ}_s(ap), X \neq \emptyset} (-1)^{|U| — |X|} \\ = & -\sum_{s \in \Sigma^n} \sum_{U \subseteq \mathrm{occ}_s(p)} (-1)^{|U|} [U \cap \mathrm{occ}_s(ap) \neq \emptyset] \\ = & -\sum_{s \in \Sigma^n} \sum_{U \subseteq \mathrm{occ}_s(p)} (-1)^{|U|} (1 — [U \cap \mathrm{occ}_s(ap) = \emptyset]) \\ = & -\sum_{s \in \Sigma^n} \sum_{U \subseteq \mathrm{occ}_s(p)} (-1)^{|U|} + \sum_{s \in \Sigma^n} \sum_{U \subseteq \mathrm{occ}_s(p)} \sum_{V \subseteq U \cap \mathrm{occ}_s(ap)}(-1)^{|U| + |V|} \\ = & -\sum_{U \subseteq [n]} (-1)^{|U|} \sum_{s \in \Sigma^n} [U \subseteq \mathrm{occ}_s(p)] + \sum_{U \subseteq [n]} \sum_{V \subseteq U}(-1)^{|U| + |V|} \sum_{s \in \Sigma^n} [U \subseteq \mathrm{occ}_s(p) \wedge V \subseteq \mathrm{occ}_s(ap)]\\ \end{matrix} $$$

Both parts can be computed using similar DP mentioned in the previous section. The only thing to notice is that we need not the set $$$\mathrm{per}(p)$$$, but the pair $$$(\mathrm{per}(p), \mathrm{per}(ap))$$$.

ftiasch's blog

Condition 1

Condition 2