On reducing the times of Moebius transform for subset convolution

#	User	Rating
1	tourist	3880
2	jiangly	3669
3	ecnerwala	3654
4	Benq	3627
5	orzdevinwang	3612
6	Geothermal	3569
6	cnnfls_csy	3569
8	jqdai0815	3532
9	Radewoosh	3522
10	gyh20	3447

#	User	Contrib.
1	awoo	161
1	maomao90	161
3	adamant	156
4	maroonrk	153
5	-is-this-fft-	148
5	atcoder_official	148
5	SecondThread	148
8	Petr	147
9	nor	144
10	TheScrasse	142

Hi, Codeforces! I want to introduce a simple way to understand the subset convolution and a simple optimization from scratch.

Actually, I do not know this trick is well-known or just out-of-date.

If there are any mistakes, I am happy to fix them (if I am able to) and thanks for pointing out.

We know that subset convolution is equivalent to truncated multivariate polynomial multiplication like

$$$ A(x_1,\dots ,x_n)B(x_1,\dots ,x_n)\bmod (x_1^2,\dots ,x_n^2) $$$

simply written

$$$ A(\mathbf{x})B(\mathbf{x})\bmod{\mathbf{x}^2} $$$

we know that we could compute

$$$ A(\mathbf{x})B(\mathbf{x})\bmod{\left(\mathbf{x}^2-\mathbf{x}\right)} $$$

by using the FFT-like way (Computing $$$A(\mathbf{x})\bmod{(\lbrace x_1,x_1-1\rbrace \times \cdots \times\lbrace x_n,x_n-1\rbrace)}$$$ and using Chinese remainder algorithm to restore the result. This is called the Zeta transform and the Moebius transform respectively). But the coefficients are somehow “mixed”, we can not tell the coefficients of $$$\mathbf{x}^2$$$ and $$$\mathbf{x}$$$. An idea is that we could introduce a new variable $$$t$$$ and computing $$$A(\mathbf{x})B(\mathbf{x})\bmod{\left(\mathbf{x}^2-\mathbf{x}t\right)}$$$, simply let $$$t\gets 0$$$ to get the result of subset convolution. Here is the main idea of the whole algorithm we may familar with:

First, we define $$$s$$$ for each term of $$$A(\mathbf{x}),B(\mathbf{x})$$$:

$$$ s\left(c x_1^{d_1}x_2^{d_2}\cdots x_n^{d_n}\right):=\sum_{k=1}^nd_k $$$

and

$$$ A_k(\mathbf{x})\text{ such that }s(y)=k\text{ for any term }y\text{ of }A_k(\mathbf{x}) $$$

then we compute

$$$ A(\mathbf{x})B(\mathbf{x})\bmod{(\mathbf{x}^2-\mathbf{x}t)}=\left(\sum_i A_i(\mathbf{x})\right)\left(\sum_i B_i(\mathbf{x})\right)\bmod{(\mathbf{x}^2-\mathbf{x}t)} $$$

For the $$$O(n^2)$$$ times of computation of form $$$A_i(\mathbf{x})B_j(\mathbf{x})\bmod{(\mathbf{x}^2-\mathbf{x}t)}$$$, we have an invariant that $$$s(y)+\deg_t(y)=i+j$$$ for any term $$$y$$$ of $$$A_i(\mathbf{x})B_j(\mathbf{x})\bmod{(\mathbf{x}^2-\mathbf{x}t)}$$$. Even we only do the computation of $$$A_i(\mathbf{x})B_j(\mathbf{x})\bmod{(\mathbf{x}^2-\mathbf{x})}$$$, we could restore the result of $$$A_i(\mathbf{x})B_j(\mathbf{x})\bmod{(\mathbf{x}^2-\mathbf{x}t)}$$$, and we do not use these coefficients that with variable $$$t$$$.

Another observation is that $$$\deg_t(A_i(\mathbf{x})B_j(\mathbf{x})\bmod{(\mathbf{x}^2-\mathbf{x}t)})\leq (i+j)/2$$$. For example, if we have $$$\left(\sum_{i+j=4}A_i(\mathbf{x})B_j(\mathbf{x})\right)+\left(\sum_{i+j=9}A_i(\mathbf{x})B_j(\mathbf{x})\right)\bmod{(\mathbf{x}^2-\mathbf{x}t)}$$$, it is still possible for us to tell apart. With this simple trick, we could reduce the times of Moebius transform for about a half.

This is a submission for the optimized code. I did not implement it carefully.

upd: submission reinplemented.