### sidhant's blog

By sidhant, 17 months ago,

Pre-requisite: Go through the Zeta/SOS DP/Yate's DP blog here

Source: This blog post is an aggregation of the explanation done by arjunarul in this video, a comment by rajat1603 here and the paper on Fast Subset Convolution

Notation:

1. set $s$ and mask $s$ are used interchangeably meaning the same thing.

2. $a \setminus b$ would mean set subtraction, i.e subtracting set $b$ from set $a$.

3. $|s|$ refers to the cardinality, i.e the size of the set $s$.

4. $\sum_{s' \subseteq s} f(s')$ refers to summing function $f$ over all possible subsets (aka submasks) $s'$ of $s$.

Aim: Given functions $f$ and $g$ both from $[0, 2^n)$ to integers. Can be represented as arrays $f[]$ and $g[]$ respectively in code. We want to compute the following transformations fast:

1. Zeta Transform (aka SOS DP/Yate's DP): $z(f(s)) = \sum_{s' \subseteq s} f(s')$, $\forall s \in [0, 2^n)$ in $O(N*2^N)$

2. Mobius Transform (i.e inclusion exclusion sum over subset): $\mu(f(s)) = \sum_{s' \subseteq s} (-1)^{|s \setminus s'|} f(s')$, $\forall s \in [0, 2^n)$ in $O(N*2^N)$. Here the term $(-1)^{|s \setminus s'|}$ just looks intimidating but simply means whether we should add the term or subtract the term in Inclusion-Exclusion Logic.

3. Subset Sum Convolution: $f \circ g(s) = \sum_{s' \subseteq s} f(s')g(s \setminus s')$, $\forall s \in [0, 2^n)$ in $O(N^2*2^N)$. In simpler words, take all possible ways to partition set $s$ into two disjoint partitions and sum over product of $f$ on one partition and $g$ on the complement of that partition.

Firstly, Zeta Transform is explained already here already. Just go through it first and then come back here.

Motivation: As an exercise and motivation for this blog post, you can try to come up with fast algorithms for Mobius transform and Subset Sum convolution yourself. Maybe deriving Mobius transform yourself is possible, but for Subset Sum convolution it seems highly unlikely, on the contrary if you can then you had the potential to publish the research paper mentioned in the source.

Now, let us define a relatively easy transform, i.e

Odd-Negation Transform: $\sigma(f(s)) = (-1)^{|s|}*f(s)$. This can be computed trivially in $O(2^{N})$ (assuming __builtin_popcount is $O(1)$). The name "Odd-Negation" gives the meaning of the transform, i.e if the subset $s$ is of odd cardinality then this tranform negates it, otherwise it does nothing.

Now we will show 3 important theorems (each with proof, implementation and intuition), which are as follows:

Theorem 1: $\sigma z \sigma(f(s)) = \mu(f(s))$, $\forall s \in [0, 2^n)$

Proof: For any given $s$

\begin{align} \sigma z \sigma(f(s)) &= (-1)^{|s|} * \sum_{s' \subseteq s} \sigma f(s')\\ &= (-1)^{|s|} * \sum_{s' \subseteq s} (-1)^{|s'|} f(s') \end{align}

Now, notice two cases for parity of $|s|$

Case 1: $|s|$ is even. Then $(-1)^{|s|} = 1$.

And $(-1)^{|s'|} = (-1)^{|s \setminus s'|}$ because $|s'| \bmod 2 = |s \setminus s'| \bmod 2$, when $|s|$ is even.

So,

\begin{align} \sigma z \sigma(f(s)) &= \sum_{s' \subseteq s} (-1)^{|s \setminus s'|} f(s')\\ &= \mu(f(s)) \end{align}

Case 2: $|s|$ is odd. Then $(-1)^{|s|} = -1$.

And $(-1)^{|s'|} = -(-1)^{|s \setminus s'|}$ because $|s'| \bmod 2 \neq |s \setminus s'| \bmod 2$ when $|s|$ is odd.

So,

\begin{align} \sigma z \sigma(f(s)) &= (-1) * \sum_{s' \subseteq s} -(-1)^{|s \setminus s'|} f(s')\\ &= \sum_{s' \subseteq s} (-1)^{|s \setminus s'|} f(s')\\ &= \mu(f(s)) \end{align}

QED.

Theorem 1 implies that mobius transform can be done by simply applying the 3 transforms one after the other like the following and it will be $O(N*2^N)$:

Implementation

// Apply odd-negation transform
if((__builtin_popcount(mask) % 2) == 1) {
}
}

// Apply zeta transform
for(int i = 0; i < N; i++) {
if((mask & (1 << i)) != 0) {
}
}
}

// Apply odd-negation transform
if((__builtin_popcount(mask) % 2) == 1) {
}
}


Intuition: The first $\sigma$ transform, just negates the odd masks. Then we do zeta over it, so each element stores sum of even submasks minus sum of odd submasks. Now if the $s$ being evaluated is even, then this is correct, otherwise this is inverted, since odds should be the ones being added in and evens being subtracted. Therefore, we applied the $\sigma$ transform again.

Now a somewhat, not so important theorem:

Theorem 2: $z^{-1}(f(s) = \mu(f(s))$, $\forall s \in [0, 2^n)$ i.e Inverse SOS DP/Inverse Zeta transform is equivalent to Mobius transform, i.e Zeta Transform and Mobius Transform are inversers of each other $z(\mu(f(s)) = f(s) = \mu(z(f(s))$.

The is not immediately obvious. But once someone thinks more about how to do Inverse SOS, i.e given a $z(f)$, how to obtain $f$ fast. We realise we need to do an inclusion-exclusion logic on the subsets, i.e a Mobius transform.

We will skip the proof for this, although it can be viewed from here if anyone is interested. (No Proof and Intuition section)

The interesting thing that comes out of this is that for mobius/inverse zeta/inverse SOS we have a shorter implementation which works out as follows:

Implementation

for(int i = 0; i < N; i++) {
if((mask & (1 << i)) != 0) {
}
}
}


Here in this implementation, after the $i^{th}$ iteration, $f[s]$ will denote $\sum_{s' \subseteq F(i, s)} (-1)^{|s \setminus s'|} f(s')$, where $F(i, s)$ denotes the set of all subsets of $s$, which only differ in the lease significant $i$ bits from $s$.

That is, for a given $s'$, $s' \in F(i, s)$ IFF $s' \subseteq s$ AND ($s'$ & $s$) >> $i$ $=$ $s$ >> $i$ (i.e all bits excluding the least significant $i$ bits, match in $s$ and $s'$)

When $i = N$, we observe $F(N, s) =$ the set of all subsets of $s$ and thus we have arrived at Mobius Transform.

Another interesting observation here is that: If we generalise the statement f[mask] (+/-)= f[mask ^ (1 << i)] to f[mask] = operation(f[mask], f[mask ^ (1 << i)]), then if the operation here applied multiple times yields the same thing (Ex. Add, Max, Gcd) then it is equivalent to SOS style combine, i.e

$f(s) = \text{operation}_{s' \subseteq s} f(s')$

, otherwise, it may NOT behave as SOS style combine (Ex.Subtraction)

Now the next is subset sum convolution.

Before that, let us define the following:

$\hat{f}(i, s) = \begin{cases} f(s),& \text{if } |s| = i\\ 0, & \text{otherwise} \end{cases}$
$\hat{g}(i, s) = \begin{cases} g(s),& \text{if } |s| = i\\ 0, & \text{otherwise} \end{cases}$

These just means that $\hat{f}(k, \dots)$ and $\hat{g}(k, \dots)$ will be concentrating only on those sets/masks which have cardinality size/number of bits on $= k$.

Theorem 3: $f \circ g(s) = z^{-1}(\sum_{i = 0}^{|s|} z(\hat{f}(i, s)) * z(\hat{g}(|s| - i, s)))$, $\forall s \in [0, 2^n)$

Proof:

Let $p(k, s)$ be defined as follows:

\begin{align} p(k, s) = \sum_{i = 0}^{k} \sum_{\substack{a \subseteq s \\ |a| = i}} \sum_{\substack{b \subseteq s \\ |b| = k - i \\ a \cup b = s}} f(a)g(b) \end{align}

Then $p(|s|, s) = f \circ g(s)$

Proof:

\begin{align} p(|s|, s) &= \sum_{i = 0}^{|s|} \sum_{\substack{a \subseteq s \\ |a| = i}} \sum_{\substack{b \subseteq s \\ |b| = |s| - i \\ a \cup b = s}} f(a)g(b)\\ &= \sum_{i = 0}^{|s|} \sum_{\substack{a \subseteq s \\ |a| = i}} f(a)g(s \setminus a) \text{ [Because only }b = s \setminus a\text{ is a valid entity of the inner summation]} \\ &= \sum_{a \subseteq s} f(a)g(s \setminus a)\\ &= f \circ g(s) \end{align}

Let $h(k, s)$ denote the following summation, i.e

\begin{align} h(k, s) &= \sum_{i = 0}^{k} z(\hat{f}(i, s) * z(\hat{g}(k - i, s)\\ &= \sum_{i = 0}^{k} (\sum_{\substack{a \subseteq s \\ |a| = i}} f(a)) * (\sum_{\substack{b \subseteq s \\ |b| = k - i}} g(b))\\ &= \sum_{i = 0}^{k} \sum_{\substack{a \subseteq s \\ |a| = i}} \sum_{\substack{b \subseteq s \\ |b| = k - i}} f(a)g(b)\\ &= \sum_{i = 0}^{k} \sum_{s' \subseteq s} \sum_{\substack{a \subseteq s' \\ |a| = i}} \sum_{\substack{b \subseteq s' \\ |b| = k - i \\ a \cup b = s'}} f(a)g(b)\\ &= \sum_{s' \subseteq s} \sum_{i = 0}^{k} \sum_{\substack{a \subseteq s' \\ |a| = i}} \sum_{\substack{b \subseteq s' \\ |b| = k - i \\ a \cup b = s'}} f(a)g(b)\\ &= \sum_{s' \subseteq s} p(k, s') \end{align}

So, the RHS of Theorem 3 looks like this:

\begin{align} &= z^{-1}(h(|s|, s))\\ &= z^{-1}(\sum_{s' \subseteq s} p(|s|, s'))\\ &= p(|s|, s)\\ &= f \circ g(s) \end{align}

QED.

Theorem 3 implies that subset sum convolution can be done by applying SOS and Inverse SOS DP $N$ times, for each cardinality size (Therefore complexity is $O(N^2*2^N)$), as follows:

Implementation

// Make fhat[][] = {0} and ghat[][] = {0}
}

// Apply zeta transform on fhat[][] and ghat[][]
for(int i = 0; i < N; i++) {
for(int j = 0; j < N; j++) {
if((mask & (1 << j)) != 0) {
}
}
}
}

// Do the convolution and store into h[][] = {0}
for(int i = 0; i < N; i++) {
for(int j = 0; j <= i; j++) {
}
}
}

// Apply inverse SOS dp on h[][]
for(int i = 0; i < N; i++) {
for(int j = 0; j < N; j++) {
if((mask & (1 << j)) != 0) {
}
}
}
}


Intuition: The expression

\begin{align} &= \sum_{i = 0}^{|s|} z(\hat{f}(i, s)) * z(\hat{g}(|s| - i, s))\\ &= \sum_{i = 0}^{|s|} (\sum_{\substack{a \subseteq s \\ |a| = i}} f(a)) (\sum_{\substack{b \subseteq s \\ |b| = |s| - i}} g(b))\\ \end{align}

stores the sum of where $a$ is a subset of $s$, $b$ is a subset of $s$ and $|a| + |b| = |s|$. If we reframe this summation as the union of $a$ and $b$ being equal to $s'$ where $s'$ is a subset of $s$. This way, we can restate the summation as

\begin{align} &= \sum_{s' \subseteq s} \sum_{\substack{a, b \subseteq s' \\ a \cup b = s' \\ |a| + |b| = |s|}} f(a)g(b) \end{align}

If we see this closely, this is Inverse SOSable (can be seen because of the summation on all possible subsets of $s$). Once we do Inverse SOS, i.e apply $z^{-1}$, we get

\begin{align} &= \sum_{\substack{a, b \subseteq s \\ a \cup b = s \\ |a| + |b| = |s|}} f(a)g(b) = f \circ g(s) \end{align}

Problems to Practice:

Problems mentioned in SOS blog, here and for Subset Sum Convolution, I only know of this as of now 914G - Sum the Fibonacci. But the technique seems super nice and I hope new problems do come up in nearby future.

• +168

By sidhant, history, 3 years ago,

After the recently concluded January Long Challenge, I would like to invite you for some more interesting challenges to keep your coding schedule always packed!

Participate in the January Cook-Off 2018 and enjoy Chef’s tricky tests. Joining me on the panel are:

• Problem Setter and Editorialist: sidhant (Sidhant Bansal), arjunarul (Arjun Arul)

• Russian Translator: CherryTree (Sergey Kulik)

• Mandarin Translator: huzecong (Hu Zecong)

• Vietnamese Translator: VNOI Team

I hope you will enjoy solving them. Please give your feedback on the problem set in the comments below, after the contest.

### Contest Details:

Time: 21st January 2018 (2130 hrs — 0000 hrs Indian Standard Time — +5:30 GMT) — Check your timezone.

Registration: You just need to have a CodeChef handle to participate. For all those, who are interested and do not have a CodeChef handle, are requested to register in order to participate.

Prizes:

• Top 10 performers in Global and Indian category will get CodeChef laddus, with which the winners can claim cool CodeChef goodies. Know more here: https://www.codechef.com/laddu. (For those who have not yet got their previous winning, please send an email to winners@codechef.com)

According to me, the problems are very interesting and I am sure that you will have fun while solving these problems :)

Good Luck! Hope to see you participating!!

• +48

By sidhant, history, 4 years ago,

# Welcome to Part 2

Firstly, I am assuming that you have been through the Part 1 of this blog.

Okay so in hindsight I now see the drawbacks there were in my explanation of the roots of unity and how the divide and conquer works in FFT.

So in this blog I would be aiming to give you a visual intuition of what really FFT exploits over the trivial classical DFT. Also I would be covering up NTT and sharing a nice trick that I got to know about while learning NTT.

## Visual Intuition of FFT

The part of converting the polynomial from coefficient form to point value form in instead of the trivial N2 would be elaborated upon in this section as I find the dry mathematical explanation to be rigorous but not intuitive. In the notation below I have referred to n complex nth roots of unity as powers of wn1 = e2π·i / n

Assume that you have a 8 terms (7 degree) polynomial you need to convert from coefficient form to point value form. So firstly you choose the powers of the 8th complex root of unity as the xi's that you will be plugging in the polynomial. Initially this might seem random but hopefully by the end of this topic it would be clear to you.

Okay so now we can reduce our polynomial A(x) as Aeven(x2) + x·Aodd(x2) where Aeven(y) is a 4 terms (3 degree) polynomial in y and Aodd(y) is also a 4 terms (3 degree) polynomial in y.
Now solving the original problem of converting A(x) from coefficient form to point value form for each of the power of the 8th root of unity is exactly equivalent to solving this new problem of converting Aeven(x2) and Aodd(x2) for powers the 8th root of unity.
So really this doesn't help us much. But the trick is that when we square (because x2) these powers of the 8th root of unity, they come out to be same pairwise, i.e. the 0th and 4th powers of the 8th root of unity when squared give the same result, similarly, the 1st and 5th powers of 8th root of unity when squared give the same result and so on.
Also note that these results actually turn out to be the powers of the 4th root of unity.

Okay, so I gave the dry mathematical proof for this in the previous part, but that time I did not have a visual understanding of it. Well now I do so refer to the animations and diagrams below to get the idea but first read through the below subtopic of Rotation in Imaginary Plane.

Rotation in Imaginary Plane

Okay so as a rule of thumb remember that if you have a complex number lets say Z and you multiply it with eiθ then it basically rotates it by θ angle in anti clockwise direction with respect to the origin.
So lets say you have 3rd power of 8th root of unity, i.e. wn3 = ei·2π·(3 / 8) and you wish to exponentiate it. Then currently the θ is .
So if you square it you are actually multiplying the θ by 2 so you are basically rotating it by θ degrees in anti clockwise direction with respect to the origin, i.e. it will now be make an angle of with the positive x axis in the anti clockwise direction.
Raising it by power x is (wn3)x = ei·2π·(3 / 8)·x = ei·2π·(3 / 8)·ei·2π·(3 / 8)·(x - 1) which is equivalent to rotating ei·2π·(3 / 8) by in anti clockwise direction with respect to positive x axis.

So given this understanding we can now easily exponentiate roots of unity. So here is the key, when you mark the 8th roots of unity on a graph it is regular octagon whose vertices are making angle degree with respect to positive x axis.
Now when you square each of these, the angles double as shown above, so they become 0 * 2, 45 * 2, 90 * 2, 135 * 2, 180 * 2, 225 * 2, 270 * 2, 315 * 2 = 0, 90, 180, 270, 360, 450, 540, 630 = which are basically the points of the 4th root of unity. Below is a beautiful animation showing the same (You would need to open the link below and play the animation from the top left hand side)

Animation here

Now you can clearly grasp the complexity of T(n) = T(n / 2) (For Aeven) + T(n / 2) (For Aodd) + O(2·n)(For combining the results as Aeven(x2) + x·Aodd(x2)). So

## NTT (Number Theoretic Transform)

The objective of NTT is to multiply 2 polynomials such that the coefficient of the resultant polynomials are calculated under a particular modulo. The benefit of NTT is that there are no precision errors as all the calculations are done in integers. A major drawback of NTT is that generally we are only able to do NTT with a prime modulo of the form 2k·c + 1, where k and c are arbitrary constants. So for doing it for a random mod we would need to use CRT (Chinese Remainder Theorem).

Firstly, nth roots of unity under a primitive field, i.e. are defined as , where P is only considered as prime for simplicity.
Here we are assuming that P = 2k·c + 1, where c, k are positive integers and P is prime.

Note All the calculation done below are done under modulo P including the operations in the exponents.
Example rP + 5 = r5. The modulo has been intentionally skipped sometimes as it makes the notation way too ugly.

So we first find a r such that goes through all the numbers from 1 to P - 1 when x goes from 1 to P - 1. Note that is already a fact using Fermat's Little Theorem which states that if gcd(a, P) = 1
After finding one such r, the (2k)th root of unity under the modulo field of P will be rc.
The powers will be as
Notice that as wnn = 1 for complex roots similarly here

Lemma 1 Here when 1 ≤ x < 2k. this is because r is itself defined as the (P - 1)th root of unity, i.e. any positive power of r less than P - 1 will not be equal to , and as P - 1 = 2k·c, therefore any power of r where c is multiplied by anything less than 2k would not give .

Lemma 2 Another key observation is that (rc)α = (rc)β where α ≠ β and both lie between [0, 2k) can never be true, this observation can be proven by contradiction, assuming that rc·α = rc·β under then which can be generalised to (rc)α + γ = (rc)β + γ where γ is an integer. But we already know for a fact that , so lets put γ = 2k - α, so now where β + γ ≠ 2k because α ≠ β. This is contradictory to the result mentioned in Lemma 1, hence proved.

So, rc is now the (2k)th root of unity under the modulo field of P. Now the only difference between NTT and FFT will be that the nth root of unity changes from w to rc, where r is found by hit and trial method.

## Trick to handle more prime modulos

Instead of breaking the sub-problems as powers of 2 (into two halves), we can generalise it to powers of any number.
Then we can basically solve NTT for a prime of the form Bk·c + 1, obviously if B = 2, then it gives the best time complexity and it would be slower when we opt for different and bigger B's. But it would still work which is the key.
Okay so how ?
Let's take an example for 3
So we have a polynomial of lets say 33 = 27 terms, i.e. it is a 26 degree polynomial.
Now A(x) = a0 + a1·x1 + a2·x2 + ... + a26·x26
A(x) = (a0 + a3·x3 + ... + a24·x24) + (a1·x + a4·x4 + ... + a25·x25) + (a2·x2 + a5·x5 + ... + a26·x26) = A0mod3(x3) + x·A1mod3(x3) + x2·A2mod3(x3)

Now the neat trick is that the complex roots not only coincide in one another when squared but also when exponentiated to any other power, in this case when cubed.

Below is an animation showing a regular 9-gon denoting the 9th roots of unity and when cubed, forming an equilateral triangle denoting the 3rd roots of unity.

Animation here

So basically
Here it is O(2·3·N) as on each step of conquer for the N terms we need to do 3 multiplications i.e. A0mod3(x3)·1, A1mod3(x3x, A2mod3(x3x2 and 3 additions, i.e. adding all the terms together.

For a general B the time complexity will be

Note The underlying idea behind using roots of unity in any kind of transform is that the roots of unity under any field (complex numbers or under modulo P) form in themselves a cyclic group over multiplication, i.e if any 2 elements of these roots of unity are operated using multiplication then it is guaranteed that their product will be element of this set as well. Also this set has an identity element. So the property of these roots of unity of coinciding into one another when exponentiated is satisfied because they are forming a cyclic group over multiplication and also because the size of this group is of the form Bk

## Problems for Practice

Thanks to Baba and polingy for assisting me in understanding the concepts mentioned in this blog and to NibNalin for proof reading the blog.

References and resources used Introduction to Algorithms (aka CLRS), Better Explained, Apfloat blog on NTT, Desmos Graphing Calculator

Any feedback would be appreciated. Please notify me about any typos or errors in the comments or in PM.

• +116

By sidhant, history, 5 years ago,

#### Aim — To multiply 2n-degree polynomials in instead of the trivial O(n2)

I have poked around a lot of resources to understand FFT (fast fourier transform), but the math behind it would intimidate me and I would never really try to learn it. Finally last week I learned it from some pdfs and CLRS by building up an intuition of what is actually happening in the algorithm. Using this article I intend to clarify the concept to myself and bring all that I read under one article which would be simple to understand and help others struggling with fft.

Let’s get started

Here A(x) and B(x) are polynomials of degree n - 1. Now we want to retrieve C(x) in

So our methodology would be this

1. Convert A(x) and B(x) from coefficient form to point value form. (FFT)
2. Now do the O(n) convolution in point value form to obtain C(x) in point value form, i.e. basically C(x) = A(x) * B(x) in point value form.
3. Now convert C(x) from point value from to coefficient form (Inverse FFT).

Q) What is point value form ?
Ans) Well, a polynomial A(x) of degree n can be represented in its point value form like this A(x) = {(x0, y0), (x1, y1), (x2, y2), ..., (xn - 1, yn - 1)} , where yk = A(xk) and all the xk are distinct.
So basically the first element of the pair is the value of x for which we computed the function and second value in the pair is the value which is computed i.e A(xk).
Also the point value form and coefficient form have a mapping i.e. for each point value form there is exactly one coefficient representation, if for k degree polynomial, k + 1 point value forms have been used at least.
Reason is simple, the point value form has n variables i.e, a0, a1, ..., an - 1 and n equations i.e. y0 = A(x0), y1 = A(x1), ..., yn - 1 = A(xn - 1) so only one solution is there.
Now using matrix multiplication the conversion from coefficient form to point value form for the polynomial can be shown like this

And the inverse, that is the conversion from point value form to coefficient form for the same polynomial can be shown as this

Now, let's assume A(x) = x2 + x + 1 = {(1, 3), (2, 7), (3, 13)} and B(x) = x2 - 3 = {(1,  - 2), (2, 1), (3, 6)}, where degree of A(x) and B(x) = 2
Now as C(x) = A(x) * B(x) = x4 + x3 - 2x2 - 3x - 3
C(1) = A(1) * B(1) = 3 *  - 2 =  - 6, C(2) = A(2) * B(2) = 7 * 1 = 7, C(3) = A(3) * B(3) = 13 * 6 = 78

So C(x) = {(1,  - 6), (2, 7), (3, 78)} where degree of C(x) = degree of A(x) + degree of B(x) = 4
But we know that a polynomial of degree n - 1 requires n point value pairs, so 3 pairs of C(x) are not sufficient for determining C(x) uniquely as it is a polynomial of degree 4.
Therefore we need to calculate A(x) and B(x), for 2n point value pairs instead of n point value pairs so that C(x)’s point value form contains 2n pairs which would be sufficient to uniquely determine C(x) which would have a degree of 2(n - 1).

##### Now if we had performed this algorithm naively it would have gone on like this

Note — This is NOT the actual FFT algorithm but I would say that understanding this would layout framework to the real thing.
Note — This is actually DFT algorithm, ie. Discrete fourier transform.

1. We construct the point value form of A(x) and B(x) using x0, x1, ..., x2n - 1 which can be made using random distinct integers. So point value form of A(x) = {(x0, α0), (x1, α1), (x2, α2), ..., (x2n - 1, α2n - 1)} and B(x) = {(x0, β0), (x1, β1), (x2, β2), ..., (x2n - 1, β2n - 1)} - (1) Note — The x0, x1, ..., x2n - 1 should be same for A(x) and B(x). This conversion takes O(n2).
2. As C(x) = A(x) * B(x), then what would have been the point-value form of C(x) ?
If we plug in x0 to all 3 equations then we see that
C(x0) = A(x0) * B(x0)
C(x0) = α0 * β0
So C(x) in point value form will be C(x) = {(x0, α0 * β0), (x1, α1 * β1), (x2, α2 * β2), ..., (x2n - 1, α2n - 1 * β2n - 1)}
This is the convolution, and it’s time complexity is O(n)
3. Now converting C(x) back from point value form to coefficient form can be represented by using the equation 2. Here calculating the inverse of the matrix requires LU decomposition or Lagrange’s Formula. I won’t be going into depth on how to do the inverse, as this wont be required in the REAL FFT. But we get to understand that using Lagrange’s Formula we would’ve been able to do this step in O(n2).

Note — Here the algorithm was performed wherein we used x0, x1, ..., x2n - 1 as ordinary real numbers, the FFT on the other hand uses roots of unity instead and we are able to optimize the O(n2) conversions from coefficient to point value form and vice versa to because of the special mathematical properties of roots of unity which allows us to use the divide and conquer approach. I would recommend to stop here and re-read the article till here until the algorithm is crystal clear as this is the raw concept of FFT.

##### A math primer on complex numbers and roots of unity would be a must now.

Q) What is a complex number ?
Answer — Quoting Wikipedia, “A complex number is a number that can be expressed in the form a + bi, where a and b are real numbers and i is the imaginary unit, that satisfies the equation i2 =  - 1. In this expression, a is the real part and b is the imaginary part of the complex number.” The argument of a complex number is equal to the magnitude of the vector from origin (0, 0) to (a, b), therefore arg(z) = a2 + b2 where z = a + bi.

Q) What are the roots of unity ?
Answer — An nth root of unity, where n is a positive integer (i.e. n = 1,  2,  3, ...), is a number z satisfying the equation zn = 1.
In the image above, n = 2, n = 3, n = 4, from LEFT to RIGHT.
Intuitively, we can see that the nth root of unity lies on the circle of radius 1 unit (as its argument is equal to 1) and they are symmetrically placed ie. they are the vertices of a n — sided regular polygon.

The n complex nth roots of unity can be represented as eik / n for k = 0, 1, ..., n - 1
Also Graphically see the roots of unity in a circle then this is quite intuitive.

If n = 4, then the 4 roots of unity would’ve been ei * 0 / n, ei * 1 / n, ei * 2 / n, ei * 3 / n = (ei / n)0, (ei / n)1, (ei / n / )2, (ei / n)3 where n should be substituted by 4.
Now we notice that all the roots are actually power of ei / n. So we can now represent the n complex nth roots of unity by wn0, wn1, wn2, ..., wnn - 1, where wn = ei / n

Now let us prove some lemmas before proceeding further

Note — Please try to prove these lemmas yourself before you look up at the solution :)

Lemma 1 — For any integer n ≥ 0, k ≥ 0 and d ≥ 0, wdndk = wnk

Proof — wdndk = (ei / dn)dk = (ei / n)k = wnk

Lemma 2 — For any even integer n > 0, wnn / 2 = w2 =  - 1

Proof — wnn / 2 = w2 * (n / 2)n / 2 = wd * 2d * 1 where d = n / 2

wd * 2d * 1 = w21 — (Using Lemma 1)

w21 = eiπ = cos(π) + i * sin(π) =  - 1 + 0 =  - 1

Lemma 3 — If n > 0 is even, then the squares of the n complex nth roots of unity are the (n/2) complex (n/2)th roots of unity, formally (wnk)2 = (wnk + n / 2)2 = wn / 2k

Proof — By using lemma 1 we have (wnk)2 = w2 * (n / 2)2k = wn / 2k, for any non-negative integer k. Note that if we square all the complex nth roots of unity, then we obtain each (n/2)th root of unity exactly twice since,

(Proved above)

Also, (wnk + n / 2)2 = wn2k + n = ei * k' / n, where k' = 2k + n

ei * k' / n = ei * (2k + n) / n = ei * (2k / n + 1) = e(2πi * 2k / n) + (2πi) = ei * 2k / n * ei = wn2k * (cos(2π) + i * sin(2π))

(Proved above)

Therefore, (wnk)2 = (wnk + n / 2)2 = wn / 2k

Lemma 4 — For any integer n ≥ 0, k ≥ 0, wnk + n / 2 =  - wnk

Proof — wnk + n / 2 = ei * (k + n / 2) / n = ei * (k / n + 1 / 2) = e(2πi * k / n) + (πi) = ei * k / n * eπi = wnk * (cos(π) + i * sin(π)) = wnk * ( - 1) =  - wnk

#### 1. The FFT — Converting from coefficient form to point value form

Note — Let us assume that we have to multiply 2 n — degree polynomials, when n is a power of 2. If n is not a power of 2, then make it a power of 2 by padding the polynomial's higher degree coefficients with zeroes.
Now we will see how is A(x) converted from coefficient form to point value form in using the special properties of n complex nth roots of unity.

yk = A(xk)

Let us define

Aeven(x) = a0 + a2 * x + a4 * x2 + ... + an - 2 * xn / 2 - 1, Aodd(x) = a1 + a3 * x + a5 * x2 + ... + an - 1 * xn / 2 - 1

Here, Aeven(x) contains all even-indexed coefficients of A(x) and Aodd(x) contains all odd-indexed coefficients of A(x).

It follows that A(x) = Aeven(x2) + x * Aodd(x2)

So now the problem of evaluating A(x) at the n complex nth roots of unity, ie. at wn0, wn1, ..., wnn - 1 reduces to

1. Evaluating the n/2 degree polynomials Aeven(x2) and Aodd(x2). As A(x) requires wn0, wn1, ..., wnn - 1 as the points on which the function is evaluated.

Therefore A(x2) would’ve required (wn0)2, (wn1)2, ..., (wnn - 1)2.

Extending this logic to Aeven(x2) and Aodd(x2) we can say that the Aeven(x2) and Aodd(x2) would require (wn0)2, (wn1)2, ..., (wnn / 2 - 1)2 ≡ wn / 20, wn / 21, ..., wn / 2n / 2 - 1 as the points on which they should be evaluated.

Here we can clearly see that evaluating Aeven(x2) and Aodd(x2) at wn / 20, wn / 21, ..., wn / 2n / 2 - 1 is recursively solving the exact same form as that of the original problem, i.e. evaluating A(x) at wn0, wn1, ..., wnn - 1. (The division part in the divide and conquer algorithm)

2. Combining these results using the equation A(x) = Aeven(x2) + x * Aodd(x2). (The conquer part in the divide and conquer algorithm).

Now, A(wnk) = Aeven(wn2k) + wnk * Aodd(wn2k), if k < n / 2, quite straightforward

And if k ≥ n / 2, then A(wnk) = Aeven(wn / 2k - n / 2) - wnk - n / 2 * Aodd(wn / 2k - n / 2)

Proof — A(wnk) = Aeven(wn2k) + wnk * Aodd(wn2k) = Aeven(wn / 2k) + wnk * Aodd(wn / 2k) using (wnk)2 = wn / 2k

A(wnk) = Aeven(wn / 2k) - wnk - n / 2 * Aodd(wn / 2k) using wnk' + n / 2 =  - wnk' i.e. (Lemma 4), where k' = k - n / 2.

So the pseudocode (Taken from CLRS) for FFT would be like this

1.RECURSIVE-FFT(a)
2. n = a.length()
3. If n =  = 1 then return a //Base Case
4. wn = ei / n
5. w = 1
6. aeven = (a0, a2, ..., an - 2)
7. aodd = (a1, a3, ..., an - 1)
8. yeven = RECURSIVE - FFT(aeven)
9. yodd = RECURSIVE - FFT(aodd)
10. For k = 0 to n / 2 - 1
11.     yk = ykeven + w * ykodd
12.     yk + n / 2 = ykeven - w * ykodd
13.     w *  = wn
14. return y;

#### 2. The Multiplication OR Convolution

This is simply this
1.a = RECURSIVE-FFT(a), b = RECURSIVE-FFT(b) //Doing the fft.
2.For k = 0 to n - 1
3.    c(k) = a(k) * b(k) //Doing the convolution in O(n)

#### 3. The Inverse FFT

Now we have to recover c(x) from point value form to coefficient form and we are done. Well, here I am back after like 8 months, sorry for the trouble. So the whole FFT process can be show like the matrix

The square matrix on the left is the Vandermonde Matrix (Vn), where the (k, j) entry of Vn is wnkj
Now for finding the inverse we can write the above equation as

Now if we can find Vn - 1 and figure out the symmetry in it like in case of FFT which enables us to solve it in NlogN then we can pretty much do the inverse FFT like the FFT. Given below are Lemma 5 and Lemma 6, where in Lemma 6 shows what Vn - 1 is by using Lemma 5 as a result.

Lemma 5 — For n ≥ 1 and nonzero integer k not a multiple of n,  = 0

Proof — Sum of a G.P of n terms.

We required that k is not a multiple of n because wnk = 1 only when k is a multiple of n, so to ensure that the denominator is not 0 we required this constraint.

Lemma 6 — For j, k = 0, 1, ..., n - 1,  the (j, k) entry of Vn - 1 is wn - kj / n

Proof — We show that Vn - 1 * Vn = In, the n * n identity matrix. Consider the (j, j') entry of Vn - 1 * Vn and let it be denoted by [Vn - 1 * Vn]jj'
So now

Now if j' = j then wnk(j' - j) = wn0 = 1 so the summation becomes 1, otherwise it is 0 in accordance with Lemma 5 given above. Note here that the constraints fore Lemma 5 are satisfied here as n ≥ 1 and j' - j cannot be a multiple of n as j' ≠ j in this case and the maximum and minimum possible value of j' - j is (n - 1) and  - (n - 1) respectively.

So now we have it proven that the (j, k) entry of Vn - 1 is wn - kj / n.

Therefore,
The above equation is similar to the FFT equation

The only differences are that a and y are swapped, we have replaced wn by wn - 1 and divided each element of the result by n
Therefore as rightly said by Adamant that for inverse FFT instead of the roots we use the conjugate of the roots and divide the results by n.

That is it folks. The inverse FFT might seem a bit hazy in terms of its implementation but it is just similar to the actual FFT with those slight changes and I have shown as to how we come up with those slight changes. In near future I would be writing a follow up article covering the implementation and problems related to FFT.

Part 2 is here

References used — Introduction to Algorithms(By CLRS) and Wikipedia

Feedback would be appreciated. Also please notify in the comments about any typos and formatting errors :)