Interesting situation about a counting problem: Solved the problem, I don't know how

#	User	Rating
1	ecnerwala	3648
2	Benq	3580
3	orzdevinwang	3570
4	cnnfls_csy	3569
5	Geothermal	3568
6	tourist	3565
7	maroonrk	3530
8	Radewoosh	3520
9	Um_nik	3481
10	jiangly	3467

#	User	Contrib.
1	maomao90	174
2	adamant	164
2	awoo	164
4	TheScrasse	160
5	nor	159
6	maroonrk	156
7	-is-this-fft-	150
8	SecondThread	147
9	orz	146
10	pajenegod	145

The first stage of the National Olympiad in Informatics in Romania is taking place this Saturday, so this week there were hosted some practice contests on the Romanian OJ kilonova.

I participated in one of them and one of the problems basically sounds like this: "You are given 3 numbers $$$A,B,C(A,B,C \le 3000)$$$, count the number of strings that have $$$A$$$ letters a in them, $$$B$$$ letters b and $$$C$$$ letters c and also don't have abc as a substring".

As I have some skill issue when it comes to combinatorics and dynamic programming, I did not manage to solve the problem for a full score but, my interest for this problem came back to life while talking to Matteo.Verz, who told me that he managed to solve the problem (we both tested and his solution gets a full score) in linear time. I was quite surprised because the constraints allowed solutions with a quadratic time complexity and all of the solutions that I heard about were also quadratic. After explaining me his thought process and showing his code, I realized that his solution uses the same idea that I actually tried in the contest, but with a very small change.

Now here comes the problem: neither me nor Matteo.Verz nor tibinyte nor anyone that I tried to discuss this problem with understands why this solution works, so we decided to ask the amazing community of codeforces hoping to get an answer. There are 2 almost identical solutions for this: here and here.

Let's consider the first solution. There $$$dp_i$$$ is the number of strings that contain exactly $$$i$$$ abc substrings. Now here come some questions:

If this is the case, why don't we need to subtract all values of $$$dp_{i + 1}$$$, $$$dp_{i + 2}$$$ and so on from $$$dp_i$$$ to compute it correctly?

The formula used for initializing $$$dp_i$$$ is quite strange as it counts some strings multiple times, like when having $$$1$$$ abc group and $$$1$$$ unused character from each a, b and c, it will count the string abcabc twice, but the solution is still giving the correct result.

Comments (1)

Write comment?

RedstoneGamer22

13 months ago, # |

+65

Let's write

$$$\displaystyle f(x) = \frac{(A + B + C - 2x)!}{(A - x)!(B - x)!(C - x)!x!}$$$

a function which counts the number of strings with $$$A - x$$$ a's, $$$B - x$$$ b's, $$$C - x$$$ c's and $$$x$$$ special characters "?".

Now, if we replace each apparition of "?" with "abc", this counts the strings with at least $$$x$$$ substrings equal to "abc", with one caveat: it counts strings with $$$y \geq x$$$ multiple times. How many times? Well, if a string has $$$y$$$ "abc"s and we choose to replace $$$x$$$ of them with "?", then it counts that string $$$y \choose x$$$ times.

Thus, if the actual count of strings with exactly $$$x$$$ substrings equal to "abc" is $$$s(x)$$$, and for any $$$x > min(A,B,C)$$$ by definition $$$s(x) = 0$$$, we have

$$$\displaystyle f(x) = \sum_{y=x}^{\infty} {y \choose x} * s(x)$$$

Now, the answer is

$$$\displaystyle S = \sum_{i = 0}^{min(A, B, C)} (-1)^i f(i)$$$

The factor for $$$s(x)$$$ in $$$S$$$, $$$0 \leq x \leq min(A,B,C)$$$ is

$$$\displaystyle C_x = \sum_{y=0}^{x} (-1)^y {x \choose y}$$$

For any $$$x \geq 1$$$, it is known that $$$C_x = 0$$$. Finally,

$$$S = s(0) + \sum_{x=1}^{min(A,B,C)} C_x * s(x) = s(0)$$$

→ Reply

AlexLorintz's blog