#	User	Rating
1	ecnerwala	3649
2	Benq	3581
3	orzdevinwang	3570
4	Geothermal	3569
4	cnnfls_csy	3569
6	tourist	3565
7	maroonrk	3531
8	Radewoosh	3521
9	Um_nik	3482
10	jiangly	3468

#	User	Contrib.
1	maomao90	174
2	awoo	164
3	adamant	161
4	TheScrasse	159
5	nor	158
6	maroonrk	156
7	-is-this-fft-	152
8	SecondThread	147
9	orz	146
10	pajenegod	145

sidhant's blog

Tutorial on Zeta Transform, Mobius Transform and Subset Sum Convolution

By sidhant, 4 years ago, In English

Pre-requisite: Go through the Zeta/SOS DP/Yate's DP blog here

Source: This blog post is an aggregation of the explanation done by arjunarul in this video, a comment by rajat1603 here and the paper on Fast Subset Convolution

Notation:

set $$$s$$$ and mask $$$s$$$ are used interchangeably meaning the same thing.
$$$a \setminus b$$$ would mean set subtraction, i.e subtracting set $$$b$$$ from set $$$a$$$.
$$$|s|$$$ refers to the cardinality, i.e the size of the set $$$s$$$.
$$$\sum_{s' \subseteq s} f(s')$$$ refers to summing function $$$f$$$ over all possible subsets (aka submasks) $$$s'$$$ of $$$s$$$.

Aim: Given functions $$$f$$$ and $$$g$$$ both from $$$[0, 2^n)$$$ to integers. Can be represented as arrays $$$f[]$$$ and $$$g[]$$$ respectively in code. We want to compute the following transformations fast:

Zeta Transform (aka SOS DP/Yate's DP): $$$z(f(s)) = \sum_{s' \subseteq s} f(s')$$$, $$$\forall s \in [0, 2^n)$$$ in $$$O(N*2^N)$$$
Mobius Transform (i.e inclusion exclusion sum over subset): $$$\mu(f(s)) = \sum_{s' \subseteq s} (-1)^{|s \setminus s'|} f(s')$$$, $$$\forall s \in [0, 2^n)$$$ in $$$O(N*2^N)$$$. Here the term $$$(-1)^{|s \setminus s'|}$$$ just looks intimidating but simply means whether we should add the term or subtract the term in Inclusion-Exclusion Logic.
Subset Sum Convolution: $$$f \circ g(s) = \sum_{s' \subseteq s} f(s')g(s \setminus s')$$$, $$$\forall s \in [0, 2^n)$$$ in $$$O(N^2*2^N)$$$. In simpler words, take all possible ways to partition set $$$s$$$ into two disjoint partitions and sum over product of $$$f$$$ on one partition and $$$g$$$ on the complement of that partition.

Firstly, Zeta Transform is explained already here already. Just go through it first and then come back here.

Motivation: As an exercise and motivation for this blog post, you can try to come up with fast algorithms for Mobius transform and Subset Sum convolution yourself. Maybe deriving Mobius transform yourself is possible, but for Subset Sum convolution it seems highly unlikely, on the contrary if you can then you had the potential to publish the research paper mentioned in the source.

Now, let us define a relatively easy transform, i.e

Odd-Negation Transform: $$$\sigma(f(s)) = (-1)^{|s|}*f(s)$$$. This can be computed trivially in $$$O(2^{N})$$$ (assuming __builtin_popcount is $$$O(1)$$$). The name "Odd-Negation" gives the meaning of the transform, i.e if the subset $$$s$$$ is of odd cardinality then this tranform negates it, otherwise it does nothing.

Now we will show 3 important theorems (each with proof, implementation and intuition), which are as follows:

Theorem 1: $$$\sigma z \sigma(f(s)) = \mu(f(s))$$$, $$$\forall s \in [0, 2^n)$$$

Proof: For any given $$$s$$$

$$$ \begin{align} \sigma z \sigma(f(s)) &= (-1)^{|s|} * \sum_{s' \subseteq s} \sigma f(s')\\ &= (-1)^{|s|} * \sum_{s' \subseteq s} (-1)^{|s'|} f(s') \end{align} $$$

Now, notice two cases for parity of $$$|s|$$$

Case 1: $$$|s|$$$ is even. Then $$$(-1)^{|s|} = 1$$$.

And $$$(-1)^{|s'|} = (-1)^{|s \setminus s'|}$$$ because $$$|s'| \bmod 2 = |s \setminus s'| \bmod 2$$$, when $$$|s|$$$ is even.

So,

$$$ \begin{align} \sigma z \sigma(f(s)) &= \sum_{s' \subseteq s} (-1)^{|s \setminus s'|} f(s')\\ &= \mu(f(s)) \end{align} $$$

Case 2: $$$|s|$$$ is odd. Then $$$(-1)^{|s|} = -1$$$.

And $$$(-1)^{|s'|} = -(-1)^{|s \setminus s'|}$$$ because $$$|s'| \bmod 2 \neq |s \setminus s'| \bmod 2$$$ when $$$|s|$$$ is odd.

So,

$$$ \begin{align} \sigma z \sigma(f(s)) &= (-1) * \sum_{s' \subseteq s} -(-1)^{|s \setminus s'|} f(s')\\ &= \sum_{s' \subseteq s} (-1)^{|s \setminus s'|} f(s')\\ &= \mu(f(s)) \end{align} $$$

QED.

Theorem 1 implies that mobius transform can be done by simply applying the 3 transforms one after the other like the following and it will be $$$O(N*2^N)$$$:

Implementation

// Apply odd-negation transform
for(int mask = 0; mask < (1 << N); mask++) {
    if((__builtin_popcount(mask) % 2) == 1) {
        f[mask] *= -1;
    }
}

// Apply zeta transform
for(int i = 0; i < N; i++) {
    for(int mask = 0; mask < (1 << N); mask++) {
        if((mask & (1 << i)) != 0) {
            f[mask] += f[mask ^ (1 << i)];
        }
    }
}

// Apply odd-negation transform
for(int mask = 0; mask < (1 << N); mask++) {
    if((__builtin_popcount(mask) % 2) == 1) {
        f[mask] *= -1;
    }
}
for(int mask = 0; mask < (1 << N); mask++)  mu[mask] = f[mask];

Intuition: The first $$$\sigma$$$ transform, just negates the odd masks. Then we do zeta over it, so each element stores sum of even submasks minus sum of odd submasks. Now if the $$$s$$$ being evaluated is even, then this is correct, otherwise this is inverted, since odds should be the ones being added in and evens being subtracted. Therefore, we applied the $$$\sigma$$$ transform again.

Now a somewhat, not so important theorem:

Theorem 2: $$$z^{-1}(f(s) = \mu(f(s))$$$, $$$\forall s \in [0, 2^n)$$$ i.e Inverse SOS DP/Inverse Zeta transform is equivalent to Mobius transform, i.e Zeta Transform and Mobius Transform are inversers of each other $$$z(\mu(f(s)) = f(s) = \mu(z(f(s))$$$.

The is not immediately obvious. But once someone thinks more about how to do Inverse SOS, i.e given a $$$z(f)$$$, how to obtain $$$f$$$ fast. We realise we need to do an inclusion-exclusion logic on the subsets, i.e a Mobius transform.

We will skip the proof for this, although it can be viewed from here if anyone is interested. (No Proof and Intuition section)

The interesting thing that comes out of this is that for mobius/inverse zeta/inverse SOS we have a shorter implementation which works out as follows:

Implementation

for(int i = 0; i < N; i++) {
    for(int mask = 0; mask < (1 << N); mask++) {
        if((mask & (1 << i)) != 0) {
            f[mask] -= f[mask ^ (1 << i)]
        }
    }
}
for(int mask = 0; mask < (1 << N); mask++)  zinv[mask] = mu[mask] = f[mask]

Here in this implementation, after the $$$i^{th}$$$ iteration, $$$f[s]$$$ will denote $$$\sum_{s' \subseteq F(i, s)} (-1)^{|s \setminus s'|} f(s')$$$, where $$$F(i, s)$$$ denotes the set of all subsets of $$$s$$$, which only differ in the lease significant $$$i$$$ bits from $$$s$$$.

That is, for a given $$$s'$$$, $$$s' \in F(i, s)$$$ IFF $$$s' \subseteq s$$$ AND ($$$s'$$$ & $$$s$$$) >> $$$i$$$ $$$=$$$ $$$s$$$ >> $$$i$$$ (i.e all bits excluding the least significant $$$i$$$ bits, match in $$$s$$$ and $$$s'$$$)

When $$$i = N$$$, we observe $$$F(N, s) = $$$ the set of all subsets of $$$s$$$ and thus we have arrived at Mobius Transform.

Another interesting observation here is that: If we generalise the statement f[mask] (+/-)= f[mask ^ (1 << i)] to f[mask] = operation(f[mask], f[mask ^ (1 << i)]), then if the operation here applied multiple times yields the same thing (Ex. Add, Max, Gcd) then it is equivalent to SOS style combine, i.e

$$$f(s) = \text{operation}_{s' \subseteq s} f(s')$$$

, otherwise, it may NOT behave as SOS style combine (Ex.Subtraction)

Now the next is subset sum convolution.

Before that, let us define the following:

$$$ \hat{f}(i, s) = \begin{cases} f(s),& \text{if } |s| = i\\ 0, & \text{otherwise} \end{cases} $$$ $$$ \hat{g}(i, s) = \begin{cases} g(s),& \text{if } |s| = i\\ 0, & \text{otherwise} \end{cases} $$$

These just means that $$$\hat{f}(k, \dots)$$$ and $$$\hat{g}(k, \dots)$$$ will be concentrating only on those sets/masks which have cardinality size/number of bits on $$$= k$$$.

Theorem 3: $$$f \circ g(s) = z^{-1}(\sum_{i = 0}^{|s|} z(\hat{f}(i, s)) * z(\hat{g}(|s| - i, s)))$$$, $$$\forall s \in [0, 2^n)$$$

Proof:

Let $$$p(k, s)$$$ be defined as follows:

$$$ \begin{align} p(k, s) = \sum_{i = 0}^{k} \sum_{\substack{a \subseteq s \\ |a| = i}} \sum_{\substack{b \subseteq s \\ |b| = k - i \\ a \cup b = s}} f(a)g(b) \end{align} $$$

Then $$$p(|s|, s) = f \circ g(s)$$$

Proof:

$$$ \begin{align} p(|s|, s) &= \sum_{i = 0}^{|s|} \sum_{\substack{a \subseteq s \\ |a| = i}} \sum_{\substack{b \subseteq s \\ |b| = |s| - i \\ a \cup b = s}} f(a)g(b)\\ &= \sum_{i = 0}^{|s|} \sum_{\substack{a \subseteq s \\ |a| = i}} f(a)g(s \setminus a) \text{ [Because only }b = s \setminus a\text{ is a valid entity of the inner summation]} \\ &= \sum_{a \subseteq s} f(a)g(s \setminus a)\\ &= f \circ g(s) \end{align} $$$

Let $$$h(k, s)$$$ denote the following summation, i.e

$$$ \begin{align} h(k, s) &= \sum_{i = 0}^{k} z(\hat{f}(i, s) * z(\hat{g}(k - i, s)\\ &= \sum_{i = 0}^{k} (\sum_{\substack{a \subseteq s \\ |a| = i}} f(a)) * (\sum_{\substack{b \subseteq s \\ |b| = k - i}} g(b))\\ &= \sum_{i = 0}^{k} \sum_{\substack{a \subseteq s \\ |a| = i}} \sum_{\substack{b \subseteq s \\ |b| = k - i}} f(a)g(b)\\ &= \sum_{i = 0}^{k} \sum_{s' \subseteq s} \sum_{\substack{a \subseteq s' \\ |a| = i}} \sum_{\substack{b \subseteq s' \\ |b| = k - i \\ a \cup b = s'}} f(a)g(b)\\ &= \sum_{s' \subseteq s} \sum_{i = 0}^{k} \sum_{\substack{a \subseteq s' \\ |a| = i}} \sum_{\substack{b \subseteq s' \\ |b| = k - i \\ a \cup b = s'}} f(a)g(b)\\ &= \sum_{s' \subseteq s} p(k, s') \end{align} $$$

So, the RHS of Theorem 3 looks like this:

$$$ \begin{align} &= z^{-1}(h(|s|, s))\\ &= z^{-1}(\sum_{s' \subseteq s} p(|s|, s'))\\ &= p(|s|, s)\\ &= f \circ g(s) \end{align} $$$

QED.

Theorem 3 implies that subset sum convolution can be done by applying SOS and Inverse SOS DP $$$N$$$ times, for each cardinality size (Therefore complexity is $$$O(N^2*2^N)$$$), as follows:

Implementation

// Make fhat[][] = {0} and ghat[][] = {0}
for(int mask = 0; mask < (1 << N); mask++) {
    fhat[__builtin_popcount(mask)][mask] = f[mask];
    ghat[__builtin_popcount(mask)][mask] = g[mask];
}

// Apply zeta transform on fhat[][] and ghat[][]
for(int i = 0; i < N; i++) {
    for(int j = 0; j < N; j++) {
        for(int mask = 0; mask < (1 << N); mask++) {
            if((mask & (1 << j)) != 0) {
                fhat[i][mask] += fhat[i][mask ^ (1 << j)];
                ghat[i][mask] += ghat[i][mask ^ (1 << j)];
            }
        }
    }
}

// Do the convolution and store into h[][] = {0}
for(int mask = 0; mask < (1 << N); mask++) {
    for(int i = 0; i < N; i++) {
        for(int j = 0; j <= i; j++) {
            h[i][mask] += fhat[j][mask] * ghat[i - j][mask];
        } 
    }
}

// Apply inverse SOS dp on h[][]
for(int i = 0; i < N; i++) {
    for(int j = 0; j < N; j++) {
        for(int mask = 0; mask < (1 << N); mask++) {
            if((mask & (1 << j)) != 0) {
                h[i][mask] -= h[i][mask ^ (1 << j)];
            }
        }
    }
}
for(int mask = 0; mask < (1 << N); mask++)  fog[mask] = h[__builtin_popcount(mask)][mask];

Intuition: The expression

$$$ \begin{align} &= \sum_{i = 0}^{|s|} z(\hat{f}(i, s)) * z(\hat{g}(|s| - i, s))\\ &= \sum_{i = 0}^{|s|} (\sum_{\substack{a \subseteq s \\ |a| = i}} f(a)) (\sum_{\substack{b \subseteq s \\ |b| = |s| - i}} g(b))\\ \end{align} $$$

stores the sum of where $$$a$$$ is a subset of $$$s$$$, $$$b$$$ is a subset of $$$s$$$ and $$$|a| + |b| = |s|$$$. If we reframe this summation as the union of $$$a$$$ and $$$b$$$ being equal to $$$s'$$$ where $$$s'$$$ is a subset of $$$s$$$. This way, we can restate the summation as

$$$ \begin{align} &= \sum_{s' \subseteq s} \sum_{\substack{a, b \subseteq s' \\ a \cup b = s' \\ |a| + |b| = |s|}} f(a)g(b) \end{align} $$$

If we see this closely, this is Inverse SOSable (can be seen because of the summation on all possible subsets of $$$s$$$). Once we do Inverse SOS, i.e apply $$$z^{-1}$$$, we get

$$$ \begin{align} &= \sum_{\substack{a, b \subseteq s \\ a \cup b = s \\ |a| + |b| = |s|}} f(a)g(b) = f \circ g(s) \end{align} $$$

Problems to Practice:

Problems mentioned in SOS blog, here and for Subset Sum Convolution, I only know of this as of now 914G - Sum the Fibonacci. But the technique seems super nice and I hope new problems do come up in nearby future.

Full text and comments »

dynamic programming, bitmask, sum of subsets, convolution

+168

sidhant
4 years ago
5

Invitation to CodeChef January Cook-Off 2018!

By sidhant, history, 6 years ago, In English

After the recently concluded January Long Challenge, I would like to invite you for some more interesting challenges to keep your coding schedule always packed!

Participate in the January Cook-Off 2018 and enjoy Chef’s tricky tests. Joining me on the panel are:

Problem Setter and Editorialist: sidhant (Sidhant Bansal), arjunarul (Arjun Arul)
Problem Tester and Admin: kingofnumbers (Hasan Jaddouh), mgch (Misha Chorniy)
Russian Translator: CherryTree (Sergey Kulik)
Mandarin Translator: huzecong (Hu Zecong)
Vietnamese Translator: VNOI Team

I hope you will enjoy solving them. Please give your feedback on the problem set in the comments below, after the contest.

Contest Details:

Time: 21st January 2018 (2130 hrs — 0000 hrs Indian Standard Time — +5:30 GMT) — Check your timezone.

Contest link: https://www.codechef.com/COOK90

Registration: You just need to have a CodeChef handle to participate. For all those, who are interested and do not have a CodeChef handle, are requested to register in order to participate.

Prizes:

Top 10 performers in Global and Indian category will get CodeChef laddus, with which the winners can claim cool CodeChef goodies. Know more here: https://www.codechef.com/laddu. (For those who have not yet got their previous winning, please send an email to [email protected])

According to me, the problems are very interesting and I am sure that you will have fun while solving these problems :)

Good Luck! Hope to see you participating!!

Full text and comments »

sidhant
6 years ago
33

Tutorial on FFT/NTT — The tough made simple. ( Part 2 )

By sidhant, history, 7 years ago, In English

Welcome to Part 2

Firstly, I am assuming that you have been through the Part 1 of this blog.

Okay so in hindsight I now see the drawbacks there were in my explanation of the roots of unity and how the divide and conquer works in FFT.

So in this blog I would be aiming to give you a visual intuition of what really FFT exploits over the trivial classical DFT. Also I would be covering up NTT and sharing a nice trick that I got to know about while learning NTT.

Visual Intuition of FFT $\text{[math]}$

The part of converting the polynomial from coefficient form to point value form in $\text{[math]}$ instead of the trivial N² would be elaborated upon in this section as I find the dry mathematical explanation to be rigorous but not intuitive. In the notation below I have referred to n complex n^th roots of unity as powers of w_n¹ = e^{2π·i / n}

Assume that you have a 8 terms (7 degree) polynomial you need to convert from coefficient form to point value form. So firstly you choose the powers of the 8^th complex root of unity as the x_i's that you will be plugging in the polynomial. Initially this might seem random but hopefully by the end of this topic it would be clear to you.

Okay so now we can reduce our polynomial A(x) as A_even(x²) + x·A_odd(x²) where A_even(y) is a 4 terms (3 degree) polynomial in y and A_odd(y) is also a 4 terms (3 degree) polynomial in y.
Now solving the original problem of converting A(x) from coefficient form to point value form for each of the power of the 8^th root of unity is exactly equivalent to solving this new problem of converting A_even(x²) and A_odd(x²) for powers the 8^th root of unity.
So really this doesn't help us much. But the trick is that when we square (because x²) these powers of the 8^th root of unity, they come out to be same pairwise, i.e. the 0^th and 4^th powers of the 8^th root of unity when squared give the same result, similarly, the 1^st and 5^th powers of 8^th root of unity when squared give the same result and so on.
Also note that these results actually turn out to be the powers of the 4^th root of unity.

Okay, so I gave the dry mathematical proof for this in the previous part, but that time I did not have a visual understanding of it. Well now I do so refer to the animations and diagrams below to get the idea but first read through the below subtopic of Rotation in Imaginary Plane.

Rotation in Imaginary Plane $\text{[math]}$

Okay so as a rule of thumb remember that if you have a complex number lets say Z and you multiply it with e^iθ then it basically rotates it by θ angle in anti clockwise direction with respect to the origin.
So lets say you have 3^rd power of 8^th root of unity, i.e. w_n³ = e^{i·2π·(3 / 8)} and you wish to exponentiate it. Then currently the θ is $\text{[math]}$ .
So if you square it you are actually multiplying the θ by 2 so you are basically rotating it by θ degrees in anti clockwise direction with respect to the origin, i.e. it will now be make an angle of $\text{[math]}$ with the positive x axis in the anti clockwise direction.
Raising it by power x is (w_n³)^x = e^{i·2π·(3 / 8)·x} = e^{i·2π·(3 / 8)}·e^{i·2π·(3 / 8)·(x - 1)} which is equivalent to rotating e^{i·2π·(3 / 8)} by $\text{[math]}$ in anti clockwise direction with respect to positive x axis.

squaring

So given this understanding we can now easily exponentiate roots of unity. So here is the key, when you mark the 8th roots of unity on a graph it is regular octagon whose vertices are making angle $\text{[math]}$ degree with respect to positive x axis.
Now when you square each of these, the angles double as shown above, so they become 0 * 2, 45 * 2, 90 * 2, 135 * 2, 180 * 2, 225 * 2, 270 * 2, 315 * 2 = 0, 90, 180, 270, 360, 450, 540, 630 = $\text{[math]}$ which are basically the points of the 4^th root of unity. Below is a beautiful animation showing the same (You would need to open the link below and play the animation from the top left hand side)

Animation here

Now you can clearly grasp the complexity of T(n) = T(n / 2) (For A_even) + T(n / 2) (For A_odd) + O(2·n)(For combining the results as A_even(x²) + x·A_odd(x²)). So $\text{[math]}$

NTT (Number Theoretic Transform) $\text{[math]}$

The objective of NTT is to multiply 2 polynomials such that the coefficient of the resultant polynomials are calculated under a particular modulo. The benefit of NTT is that there are no precision errors as all the calculations are done in integers. A major drawback of NTT is that generally we are only able to do NTT with a prime modulo of the form 2^k·c + 1, where k and c are arbitrary constants. So for doing it for a random mod we would need to use CRT (Chinese Remainder Theorem).

Firstly, n^th roots of unity under a primitive field, i.e. $\text{[math]}$ are defined as $\text{[math]}$ , where P is only considered as prime for simplicity.
Here we are assuming that P = 2^k·c + 1, where c, k are positive integers and P is prime.

Note $\text{[math]}$ All the calculation done below are done under modulo P including the operations in the exponents.
Example $\text{[math]}$ r^P + 5 = r⁵. The modulo has been intentionally skipped sometimes as it makes the notation way too ugly.

So we first find a r such that $\text{[math]}$ goes through all the numbers from 1 to P - 1 when x goes from 1 to P - 1. Note that $\text{[math]}$ is already a fact using Fermat's Little Theorem which states that $\text{[math]}$ if gcd(a, P) = 1
After finding one such r, the (2^k)^th root of unity under the modulo field of P will be r^c.
The powers will be as $\text{[math]}$
Notice that as w_nⁿ = 1 for complex roots similarly here $\text{[math]}$

Lemma 1 $\text{[math]}$ Here $\text{[math]}$ when 1 ≤ x < 2^k. this is because r is itself defined as the (P - 1)^th root of unity, i.e. any positive power of r less than P - 1 will not be equal to $\text{[math]}$ , and as P - 1 = 2^k·c, therefore any power of r where c is multiplied by anything less than 2^k would not give $\text{[math]}$ .

Lemma 2 $\text{[math]}$ Another key observation is that (r^c)^α = (r^c)^β where α ≠ β and both lie between [0, 2^k) can never be true, this observation can be proven by contradiction, assuming that r^c·α = r^c·β under $\text{[math]}$ then $\text{[math]}$ which can be generalised to (r^c)^α + γ = (r^c)^β + γ where γ is an integer. But we already know for a fact that $\text{[math]}$ , so lets put γ = 2^k - α, so now $\text{[math]}$ where β + γ ≠ 2^k because α ≠ β. This is contradictory to the result mentioned in Lemma 1, hence proved.

So, r^c is now the (2^k)^th root of unity under the modulo field of P. Now the only difference between NTT and FFT will be that the n^th root of unity changes from w to r^c, where r is found by hit and trial method.

Trick to handle more prime modulos $\text{[math]}$

Instead of breaking the sub-problems as powers of 2 (into two halves), we can generalise it to powers of any number.
Then we can basically solve NTT for a prime of the form B^k·c + 1, obviously if B = 2, then it gives the best time complexity and it would be slower when we opt for different and bigger B's. But it would still work which is the key.
Okay so how ?
Let's take an example for 3
So we have a polynomial of lets say 3³ = 27 terms, i.e. it is a 26 degree polynomial.
Now A(x) = a₀ + a₁·x¹ + a₂·x² + ... + a₂₆·x²⁶
A(x) = (a₀ + a₃·x³ + ... + a₂₄·x²⁴) + (a₁·x + a₄·x⁴ + ... + a₂₅·x²⁵) + (a₂·x² + a₅·x⁵ + ... + a₂₆·x²⁶) = A_0mod3(x³) + x·A_1mod3(x³) + x²·A_2mod3(x³)

Now the neat trick is that the complex roots not only coincide in one another when squared but also when exponentiated to any other power, in this case when cubed.

Below is an animation showing a regular 9-gon denoting the 9^th roots of unity and when cubed, forming an equilateral triangle denoting the 3^rd roots of unity.

Animation here

So basically $\text{[math]}$
Here it is O(2·3·N) as on each step of conquer for the N terms we need to do 3 multiplications i.e. A_0mod3(x³)·1, A_1mod3(x³)·x, A_2mod3(x³)·x² and 3 additions, i.e. adding all the terms together.

For a general B the time complexity will be $\text{[math]}$

Note $\text{[math]}$ The underlying idea behind using roots of unity in any kind of transform is that the roots of unity under any field (complex numbers or under modulo P) form in themselves a cyclic group over multiplication, i.e if any 2 elements of these roots of unity are operated using multiplication then it is guaranteed that their product will be element of this set as well. Also this set has an identity element. So the property of these roots of unity of coinciding into one another when exponentiated is satisfied because they are forming a cyclic group over multiplication and also because the size of this group is of the form B^k

Problems for Practice $\text{[math]}$

To be added.

Thanks to Baba and polingy for assisting me in understanding the concepts mentioned in this blog and to NibNalin for proof reading the blog.

References and resources used $\text{[math]}$ Introduction to Algorithms (aka CLRS), Better Explained, Apfloat blog on NTT, Desmos Graphing Calculator

Any feedback would be appreciated. Please notify me about any typos or errors in the comments or in PM.

Full text and comments »

fft, ntt, tutorial, polynomials

+116

sidhant
7 years ago
19

Tutorial on FFT/NTT — The tough made simple. ( Part 1 )

By sidhant, history, 8 years ago, In English

Aim — To multiply 2 n-degree polynomials in $\text{[math]}$ instead of the trivial O(n²)

I have poked around a lot of resources to understand FFT (fast fourier transform), but the math behind it would intimidate me and I would never really try to learn it. Finally last week I learned it from some pdfs and CLRS by building up an intuition of what is actually happening in the algorithm. Using this article I intend to clarify the concept to myself and bring all that I read under one article which would be simple to understand and help others struggling with fft.

Let’s get started $\text{[math]}$
$\text{[math]}$
Here A(x) and B(x) are polynomials of degree n - 1. Now we want to retrieve C(x) in $\text{[math]}$

So our methodology would be this $\text{[math]}$

Convert A(x) and B(x) from coefficient form to point value form. (FFT)
Now do the O(n) convolution in point value form to obtain C(x) in point value form, i.e. basically C(x) = A(x) * B(x) in point value form.
Now convert C(x) from point value from to coefficient form (Inverse FFT).

Q) What is point value form ?
Ans) Well, a polynomial A(x) of degree n can be represented in its point value form like this $\text{[math]}$ A(x) = {(x₀, y₀), (x₁, y₁), (x₂, y₂), ..., (x_n - 1, y_n - 1)} , where y_k = A(x_k) and all the x_k are distinct.
So basically the first element of the pair is the value of x for which we computed the function and second value in the pair is the value which is computed i.e A(x_k).
Also the point value form and coefficient form have a mapping i.e. for each point value form there is exactly one coefficient representation, if for k degree polynomial, k + 1 point value forms have been used at least.
Reason is simple, the point value form has n variables i.e, a₀, a₁, ..., a_n - 1 and n equations i.e. y₀ = A(x₀), y₁ = A(x₁), ..., y_n - 1 = A(x_n - 1) so only one solution is there.
Now using matrix multiplication the conversion from coefficient form to point value form for the polynomial $\text{[math]}$ can be shown like this $\text{[math]}$

$\text{[math]}$ $\text{[math]}$ $\text{[math]}$

And the inverse, that is the conversion from point value form to coefficient form for the same polynomial can be shown as this $\text{[math]}$

$\text{[math]}$ $\text{[math]}$ $\text{[math]}$

Now, let's assume A(x) = x² + x + 1 = {(1, 3), (2, 7), (3, 13)} and B(x) = x² - 3 = {(1, - 2), (2, 1), (3, 6)}, where degree of A(x) and B(x) = 2
Now as C(x) = A(x) * B(x) = x⁴ + x³ - 2x² - 3x - 3
C(1) = A(1) * B(1) = 3 * - 2 = - 6, C(2) = A(2) * B(2) = 7 * 1 = 7, C(3) = A(3) * B(3) = 13 * 6 = 78

So C(x) = {(1, - 6), (2, 7), (3, 78)} where degree of C(x) = degree of A(x) + degree of B(x) = 4
But we know that a polynomial of degree n - 1 requires n point value pairs, so 3 pairs of C(x) are not sufficient for determining C(x) uniquely as it is a polynomial of degree 4.
Therefore we need to calculate A(x) and B(x), for 2n point value pairs instead of n point value pairs so that C(x)’s point value form contains 2n pairs which would be sufficient to uniquely determine C(x) which would have a degree of 2(n - 1).

Now if we had performed this algorithm naively it would have gone on like this $\text{[math]}$

Note — This is NOT the actual FFT algorithm but I would say that understanding this would layout framework to the real thing.
Note — This is actually DFT algorithm, ie. Discrete fourier transform.

We construct the point value form of A(x) and B(x) using x₀, x₁, ..., x_2n - 1 which can be made using random distinct integers. So point value form of A(x) = {(x₀, α₀), (x₁, α₁), (x₂, α₂), ..., (x_2n - 1, α_2n - 1)} and B(x) = {(x0, β₀), (x1, β₁), (x2, β₂), ..., (x2n - 1, β_2n - 1)} - (1) Note — The x₀, x₁, ..., x_2n - 1 should be same for A(x) and B(x). This conversion takes O(n²).
As C(x) = A(x) * B(x), then what would have been the point-value form of C(x) ?
If we plug in x₀ to all 3 equations then we see that $\text{[math]}$
C(x₀) = A(x₀) * B(x₀)
C(x₀) = α₀ * β₀
So C(x) in point value form will be C(x) = {(x₀, α₀ * β₀), (x₁, α₁ * β₁), (x₂, α₂ * β₂), ..., (x_2n - 1, α_2n - 1 * β_2n - 1)}
This is the convolution, and it’s time complexity is O(n)
Now converting C(x) back from point value form to coefficient form can be represented by using the equation 2. Here calculating the inverse of the matrix requires LU decomposition or Lagrange’s Formula. I won’t be going into depth on how to do the inverse, as this wont be required in the REAL FFT. But we get to understand that using Lagrange’s Formula we would’ve been able to do this step in O(n²).

Note — Here the algorithm was performed wherein we used x₀, x₁, ..., x_2n - 1 as ordinary real numbers, the FFT on the other hand uses roots of unity instead and we are able to optimize the O(n²) conversions from coefficient to point value form and vice versa to $\text{[math]}$ because of the special mathematical properties of roots of unity which allows us to use the divide and conquer approach. I would recommend to stop here and re-read the article till here until the algorithm is crystal clear as this is the raw concept of FFT.

A math primer on complex numbers and roots of unity would be a must now.

Q) What is a complex number ?
Answer — Quoting Wikipedia, “A complex number is a number that can be expressed in the form a + bi, where a and b are real numbers and i is the imaginary unit, that satisfies the equation i² = - 1. In this expression, a is the real part and b is the imaginary part of the complex number.” The argument of a complex number is equal to the magnitude of the vector from origin (0, 0) to (a, b), therefore arg(z) = a² + b² where z = a + bi.

Q) What are the roots of unity ?
Answer — An nth root of unity, where n is a positive integer (i.e. n = 1, 2, 3, ...), is a number z satisfying the equation zⁿ = 1.
In the image above, n = 2, n = 3, n = 4, from LEFT to RIGHT.
Intuitively, we can see that the nth root of unity lies on the circle of radius 1 unit (as its argument is equal to 1) and they are symmetrically placed ie. they are the vertices of a n — sided regular polygon.

The n complex nth roots of unity can be represented as e^{2πik / n} for k = 0, 1, ..., n - 1
Also $\text{[math]}$ Graphically see the roots of unity in a circle then this is quite intuitive.

If n = 4, then the 4 roots of unity would’ve been e^{2πi * 0 / n}, e^{2πi * 1 / n}, e^{2πi * 2 / n}, e^{2πi * 3 / n} = (e^2πi / n)⁰, (e^2πi / n)¹, (e^{2πi / n /})², (e^2πi / n)³ where n should be substituted by 4.
Now we notice that all the roots are actually power of e^2πi / n. So we can now represent the n complex nth roots of unity by w_n⁰, w_n¹, w_n², ..., w_n^n - 1, where w_n = e^2πi / n

Now let us prove some lemmas before proceeding further $\text{[math]}$

Note — Please try to prove these lemmas yourself before you look up at the solution :)

Lemma 1 — For any integer n ≥ 0, k ≥ 0 and d ≥ 0, w_dn^dk = w_n^k

Proof — w_dn^dk = (e^{2πi / dn})^dk = (e^2πi / n)^k = w_n^k

Lemma 2 — For any even integer n > 0, w_n^n / 2 = w₂ = - 1

Proof — w_n^n / 2 = w_{2 * (n / 2)}^n / 2 = w_d * 2^d * 1 where d = n / 2

w_d * 2^d * 1 = w₂¹ — (Using Lemma 1)

w₂¹ = e^iπ = cos(π) + i * sin(π) = - 1 + 0 = - 1

Lemma 3 — If n > 0 is even, then the squares of the n complex nth roots of unity are the (n/2) complex (n/2)th roots of unity, formally (w_n^k)² = (w_n^{k + n / 2})² = w_n / 2^k

Proof — By using lemma 1 we have (w_n^k)² = w_{2 * (n / 2)}^2k = w_n / 2^k, for any non-negative integer k. Note that if we square all the complex nth roots of unity, then we obtain each (n/2)th root of unity exactly twice since,

$\text{[math]}$ (Proved above)

Also, (w_n^{k + n / 2})² = w_n^2k + n = e^{2πi * k' / n}, where k' = 2k + n

e^{2πi * k' / n} = e^{2πi * (2k + n) / n} = e^{2πi * (2k / n + 1)} = e^{(2πi * 2k / n) + (2πi)} = e^{2πi * 2k / n} * e^2πi = w_n^2k * (cos(2π) + i * sin(2π))

$\text{[math]}$ (Proved above)

Therefore, (w_n^k)² = (w_n^{k + n / 2})² = w_n / 2^k

Lemma 4 — For any integer n ≥ 0, k ≥ 0, w_n^{k + n / 2} = - w_n^k

Proof — w_n^{k + n / 2} = e^{2πi * (k + n / 2) / n} = e^{2πi * (k / n + 1 / 2)} = e^{(2πi * k / n) + (πi)} = e^{2πi * k / n} * e^πi = w_n^k * (cos(π) + i * sin(π)) = w_n^k * ( - 1) = - w_n^k

1. The FFT — Converting from coefficient form to point value form

Note — Let us assume that we have to multiply 2 n — degree polynomials, when n is a power of 2. If n is not a power of 2, then make it a power of 2 by padding the polynomial's higher degree coefficients with zeroes.
Now we will see how is A(x) converted from coefficient form to point value form in $\text{[math]}$ using the special properties of n complex nth roots of unity.

y_k = A(x_k)
$\text{[math]}$

Let us define $\text{[math]}$

A^even(x) = a₀ + a₂ * x + a₄ * x² + ... + a_n - 2 * x^{n / 2 - 1}, A^odd(x) = a₁ + a₃ * x + a₅ * x² + ... + a_n - 1 * x^{n / 2 - 1}

Here, A^even(x) contains all even-indexed coefficients of A(x) and A^odd(x) contains all odd-indexed coefficients of A(x).

It follows that A(x) = A^even(x²) + x * A^odd(x²)

So now the problem of evaluating A(x) at the n complex nth roots of unity, ie. at w_n⁰, w_n¹, ..., w_n^n - 1 reduces to $\text{[math]}$

Evaluating the n/2 degree polynomials A^even(x²) and A^odd(x²). As A(x) requires w_n⁰, w_n¹, ..., w_n^n - 1 as the points on which the function is evaluated.

Therefore A(x²) would’ve required (w_n⁰)², (w_n¹)², ..., (w_n^n - 1)².

Extending this logic to A^even(x²) and A^odd(x²) we can say that the A^even(x²) and A^odd(x²) would require (w_n⁰)², (w_n¹)², ..., (w_n^{n / 2 - 1})² ≡ w_n / 2⁰, w_n / 2¹, ..., w_n / 2^{n / 2 - 1} as the points on which they should be evaluated.

Here we can clearly see that evaluating A^even(x²) and A^odd(x²) at w_n / 2⁰, w_n / 2¹, ..., w_n / 2^{n / 2 - 1} is recursively solving the exact same form as that of the original problem, i.e. evaluating A(x) at w_n⁰, w_n¹, ..., w_n^n - 1. (The division part in the divide and conquer algorithm)
Combining these results using the equation A(x) = A^even(x²) + x * A^odd(x²). (The conquer part in the divide and conquer algorithm).

Now, A(w_n^k) = A^even(w_n^2k) + w_n^k * A^odd(w_n^2k), if k < n / 2, quite straightforward

And if k ≥ n / 2, then A(w_n^k) = A^even(w_n / 2^{k - n / 2}) - w_n^{k - n / 2} * A^odd(w_n / 2^{k - n / 2})

Proof — A(w_n^k) = A^even(w_n^2k) + w_n^k * A^odd(w_n^2k) = A^even(w_n / 2^k) + w_n^k * A^odd(w_n / 2^k) using (w_n^k)² = w_n / 2^k

A(w_n^k) = A^even(w_n / 2^k) - w_n^{k - n / 2} * A^odd(w_n / 2^k) using w_n^{k' + n / 2} = - w_n^k' i.e. (Lemma 4), where k' = k - n / 2.

So the pseudocode (Taken from CLRS) for FFT would be like this $\text{[math]}$

1.RECURSIVE-FFT(a)
2. n = a.length()
3. If n = = 1 then return a //Base Case
4. w_n = e^2πi / n
5. w = 1
6. a^even = (a₀, a₂, ..., a_n - 2)
7. a^odd = (a₁, a₃, ..., a_n - 1)
8. y^even = RECURSIVE - FFT(a^even)
9. y^odd = RECURSIVE - FFT(a^odd)
10. For k = 0 to n / 2 - 1
11.     y_k = y_k^even + w * y_k^odd
12.     y_{k + n / 2} = y_k^even - w * y_k^odd
13.     w * = w_n
14. return y;

2. The Multiplication OR Convolution

This is simply this $\text{[math]}$
1.a = RECURSIVE-FFT(a), b = RECURSIVE-FFT(b) //Doing the fft.
2.For k = 0 to n - 1
3. c(k) = a(k) * b(k) //Doing the convolution in O(n)

3. The Inverse FFT

Now we have to recover c(x) from point value form to coefficient form and we are done. Well, here I am back after like 8 months, sorry for the trouble. So the whole FFT process can be show like the matrix $\text{[math]}$

$\text{[math]}$ $\text{[math]}$ $\text{[math]}$

The square matrix on the left is the Vandermonde Matrix (V_n), where the (k, j) entry of V_n is w_n^kj
Now for finding the inverse we can write the above equation as $\text{[math]}$

$\text{[math]}$ $\text{[math]}$ $\text{[math]}$

Now if we can find V_n^- 1 and figure out the symmetry in it like in case of FFT which enables us to solve it in NlogN then we can pretty much do the inverse FFT like the FFT. Given below are Lemma 5 and Lemma 6, where in Lemma 6 shows what V_n^- 1 is by using Lemma 5 as a result.

Lemma 5 — For n ≥ 1 and nonzero integer k not a multiple of n, $\text{[math]}$ = 0

Proof — $\text{[math]}$ Sum of a G.P of n terms.
$\text{[math]}$

We required that k is not a multiple of n because w_n^k = 1 only when k is a multiple of n, so to ensure that the denominator is not 0 we required this constraint.

Lemma 6 — For j, k = 0, 1, ..., n - 1, the (j, k) entry of V_n^- 1 is w_n^- kj / n

Proof — We show that V_n^- 1 * V_n = I_n, the n * n identity matrix. Consider the (j, j') entry of V_n^- 1 * V_n and let it be denoted by [V_n^- 1 * V_n]_jj'
So now $\text{[math]}$
$\text{[math]}$

Now if j' = j then w_n^{k(j' - j)} = w_n⁰ = 1 so the summation becomes 1, otherwise it is 0 in accordance with Lemma 5 given above. Note here that the constraints fore Lemma 5 are satisfied here as n ≥ 1 and j' - j cannot be a multiple of n as j' ≠ j in this case and the maximum and minimum possible value of j' - j is (n - 1) and - (n - 1) respectively.

So now we have it proven that the (j, k) entry of V_n^- 1 is w_n^- kj / n.

Therefore, $\text{[math]}$
The above equation is similar to the FFT equation $\text{[math]}$

The only differences are that a and y are swapped, we have replaced w_n by w_n^- 1 and divided each element of the result by n
Therefore as rightly said by Adamant that for inverse FFT instead of the roots we use the conjugate of the roots and divide the results by n.

That is it folks. The inverse FFT might seem a bit hazy in terms of its implementation but it is just similar to the actual FFT with those slight changes and I have shown as to how we come up with those slight changes. In near future I would be writing a follow up article covering the implementation and problems related to FFT.

Part 2 is here

References used — Introduction to Algorithms(By CLRS) and Wikipedia

Feedback would be appreciated. Also please notify in the comments about any typos and formatting errors :)

Full text and comments »

fft, math, divide and conquer, polynomials, fast multiplication

+394

sidhant
8 years ago
64

sidhant's blog

Contest Details:

Welcome to Part 2

Visual Intuition of FFT

NTT (Number Theoretic Transform)

Trick to handle more prime modulos

Problems for Practice

Aim — To multiply 2 n-degree polynomials in instead of the trivial O(n2)

Now if we had performed this algorithm naively it would have gone on like this

A math primer on complex numbers and roots of unity would be a must now.

1. The FFT — Converting from coefficient form to point value form

2. The Multiplication OR Convolution

3. The Inverse FFT

Visual Intuition of FFT $\text{[math]}$

NTT (Number Theoretic Transform) $\text{[math]}$

Trick to handle more prime modulos $\text{[math]}$

Problems for Practice $\text{[math]}$

Aim — To multiply 2 n-degree polynomials in $\text{[math]}$ instead of the trivial O(n²)

Now if we had performed this algorithm naively it would have gone on like this $\text{[math]}$