### ftiasch's blog

By ftiasch, history, 18 months ago,

I finally solved problem G of GP of Dolgoprudny (also problem 83 in New Year Contest 2018). Though there are discussions on Codeforces, they all look very abstract to me. Hence, I decided to write down a full editorial.

First, we need some definitions. For given string $s$, we write $\mathrm{occ}_s(p) = {i : s[i - |p| + 1: i] = p}$. We write $x \equiv y$ if $\mathrm{occ}_s(x) = \mathrm{occ}_s(y)$. The "$\equiv$" relation groups all non-empty substrings of $s$ into equivalence classes. We write the equivalence class where $x$ is in as $[x]^R$. We also write $\overleftarrow{x}$ for the unique longest string in $[x]^R$.

The number of states in the automaton of $s$ equals to the number of equivalence classes plus $1$. We will then compute the number of equivalence classes. By the linearity of expectation, we can compute for each string $p$, the number of strings $s$ which contains an equivalence class $[x]^R$ where $\overleftarrow{x} = p$.

For $\overleftarrow{x} = p$ to hold, two conditions should be satisfied:

1. $p$ occurs as a substring in $s$;
2. $[ap]^R \neq [p]^R$ for any character $a$.

We first compute the number of strings $s$ satisfying condition 1 solely. Then we subtract the number of strings $s$ where $[ap]^R = [p]^R$ for all $a$.

## Condition 1

$p$ may end in string $s$ in positions $P = {|p|, |p|+1, \dots, n}$. Using the principle of Inclusion and Exclusion (i.e., PIE), we can alternatively compute the sum

$\sum_{A \neq \emptyset, A \subseteq P} (-1)^{|A|} \mathrm{count}(p, A)$

where $\mathrm{count}(p, A)$ the number of strings $s$ where $p$ ends in positions $A$.

If we know the string $p$ in advance, this can be done by a $O(n^2)$ DP. But how if the number of possible strings $p$ can be large?

Let's introduce another notation $\mathrm{per}(s) = {k : s[1 : |s| - k + 1] = s[k + 1 : |s|]}$ for a string $s$. We notice that the above sum is determined only by the set $\mathrm{per}(p)$, not the string $p$ itself. The number of different $\mathrm{per}(p)$ is around 1 thousand when $|p|=40$. Thus, we can instead enumerate different $\mathrm{per}(p)$ and compute the desired sum.

## Condition 2

For given string $s$ and pattern $ap$, we subtract from the result $\mathrm{occ}_s(ap) \neq \emptyset$ and $\mathrm{occ}_s(ap) = \mathrm{occ}_s(p)$. By PIE, the second condition boils down to the sum

$\sum_{s \in \Sigma^n} \sum_{X \subseteq \mathrm{occ}_s(ap), X \neq \emptyset} \sum_{Y \subseteq \mathrm{occ}_s(p), X \cap Y = \emptyset} (-1)^{|Y|}.$

Noticing that $\mathrm{occ}(ap) \subseteq \mathrm{occ}(p)$, we can sum over $U = X \cup Y$. The sum equals to

$\begin{matrix}{rl} & \sum_{s \in \Sigma^n} \sum_{U \subseteq \mathrm{occ}_s(p)} \sum_{X \subseteq U \cap \mathrm{occ}_s(ap), X \neq \emptyset} (-1)^{|U| — |X|} \\ = & -\sum_{s \in \Sigma^n} \sum_{U \subseteq \mathrm{occ}_s(p)} (-1)^{|U|} [U \cap \mathrm{occ}_s(ap) \neq \emptyset] \\ = & -\sum_{s \in \Sigma^n} \sum_{U \subseteq \mathrm{occ}_s(p)} (-1)^{|U|} (1 — [U \cap \mathrm{occ}_s(ap) = \emptyset]) \\ = & -\sum_{s \in \Sigma^n} \sum_{U \subseteq \mathrm{occ}_s(p)} (-1)^{|U|} + \sum_{s \in \Sigma^n} \sum_{U \subseteq \mathrm{occ}_s(p)} \sum_{V \subseteq U \cap \mathrm{occ}_s(ap)}(-1)^{|U| + |V|} \\ = & -\sum_{U \subseteq [n]} (-1)^{|U|} \sum_{s \in \Sigma^n} [U \subseteq \mathrm{occ}_s(p)] + \sum_{U \subseteq [n]} \sum_{V \subseteq U}(-1)^{|U| + |V|} \sum_{s \in \Sigma^n} [U \subseteq \mathrm{occ}_s(p) \wedge V \subseteq \mathrm{occ}_s(ap)]\\ \end{matrix}$

Both parts can be computed using similar DP mentioned in the previous section. The only thing to notice is that we need not the set $\mathrm{per}(p)$, but the pair $(\mathrm{per}(p), \mathrm{per}(ap))$.

• +76

By ftiasch, 6 years ago,

I find it is unable to create new statement while editing the old one remains available.

Can somebody help?

• +8

By ftiasch, 8 years ago,

http://main.edu.pl/en/archive/pa/2011/naj

problem: find length of shortest period after removing exactly one character from the given string.

the period of string s is the shortest string t which s is a prefix of tk for sufficiently large k, where tk denotes t + t + ... + t (k times).

i have already got some idea of the algorithm, is there any algorithm running in O(N) time?

• +17

By ftiasch, 8 years ago,

The problems are all interesting and HARD!