One of my favorite algorithms out there is FWHT, but sadly there are not many tutorials about it. So here's a simple attempt by me (feedback is appreciate).

**Note:** I will be explaining FWHT via DFT/FFT and as such good understanding of theory behind DFT/FFT is **required**. Also, I won't be going into Hadamard matrices, since I don't know much about them myself.

What is FWHT? Suppose you have two sets $$$ (1, 2, 2) $$$ and $$$(3, 4, 5)$$$ and want to perform all possible xor operations between these two sets, the resulting set being $$$(1, 1, 2, 4, 5, 6, 6, 7, 7)$$$. This is commonly called xor convolution. Naively you'd do this in $$$O(n^2)$$$ time, where $$$n=2^k$$$, $$$k$$$ = maximum number of bits (you maintain a frequency count for each possible value and loop on all possible pair). FWHT let's you do this in $$$O(n\log n)$$$ time via some ~~black magic~~ FFT tricks. We first build the frequency arrays upto $$$n=2^k$$$, apply FWHT to turn them into some mysterious point-value form, perform pointwise multiplication and finally perform inverse FWHT on the resulting array to recover resultant frequency array. But how does this work? And why should you know it?

Let's answer the first question. Though we will need to learn two facts about FFT first:

1) When we want to multiply $$$(1+2x)$$$ and $$$(3+4x)$$$, we first extend their size to 4, then transfer them to point-value form, then perform pointwise multiplication, and then finally perform inverse transform. Notice the first step, we extend the size because their product is of size 3 and so will require at least 3 distinct $$$(x, y)$$$ point-value pairs (we use 4 cause it's a power of two). What if we didn't do this size extention? You will see that after the last step you get the polynomial $$$(11+10x)$$$ instead of $$$(3+10x+8x^2)$$$. The explanation is that we used quadratic roots of unity $$$1, -1$$$. They don't have any square, instead for them, $$$x^2=1$$$, so the quadratic term "wraps around", contributing to the constant term. The lesson is that if you don't use sufficiently granular roots, the larger powers "wrap around".

2) We use FFT to multiply monovariable polynomials, what about multivariable polynomials? Everything is same as before, we just need a suitable point value form so that we can recover the polynomial back from it. For a monovariable polynomial of size $$$n$$$, we need $$$n$$$ distinct values (we use roots of unity in FFT for divide and conquer speeed up). In a multivariable polynomial, take $$$P(x, y)$$$ $$$= (1+2x+3y+4xy)$$$ for example, all we need is distinct value sets for each of the variables. Here, x is of degree 1 and y is of degree 1. So we can just use two distinct value sets for them. For example, we can use $$$10, 20$$$ and $$$30, 40$$$ for $$$x$$$ and $$$y$$$ respectively. Then the values $$$P(10, 30), P(10, 40), P(20, 30), P(20, 40)$$$ uniquely define the polynomial $$$P(x, y)$$$. We could've used $$$10, 20$$$ for both variables and that'd also work.

We incorporate this into multivariable FFT by by dividing the polynomials into polynomial of $$$x$$$ grouping by $$$y$$$'s power, like $$$P(x, y) = (1+2x) +y(3+4x) = Q_0(x) + yQ_1(x)$$$. First let's apply the FFT on the $$$Q_i$$$, to get $$$Q_0(1), Q_0(-1), Q_1(1), Q_1(-1)$$$. Then we just note that applying FFT on $$$Q_0(1)+yQ_1(1)$$$ and $$$Q_0(-1)+yQ_1(-1)$$$ will give us $$$P(1,1), P(1, -1)$$$ and $$$P(-1, 1), P(-1, -1)$$$. So the lesson is, multivariable FFT is just normal FFT with some grouping of values.

**Now to FWHT**

Like it was,previously said, FWHT will convert the frequency arrays to some magical form where performing pointwise multiplication somehow performs xor. Note the similarity with polynomial multiplication here. In polynomial multiplication, $$$ax^i$$$ and $$$bx^j$$$ multiply to merge into $$$abx^{i+j}$$$, we perform this merging for all possible pairs and group them according to $$$x$$$'s power. In FWHT, what we want is for $$$ax^i$$$ and $$$bx^j$$$ to merge into $$$abx^{i \oplus j}$$$ (considering the frequency array as a polynomial). We can do an interesting thing here, break variables into more varibles, utilizing a number's binary form! Let me explain. Suppose in a problem xor convolution requires 3 bits maximum, then terms like $$$ax^3$$$ and $$$bx^5$$$ are brokem into $$$ax_2^0x_1^1x_0^1$$$ and $$$bx_2^1x_1^0x_0^1$$$ (cause 3 is 011 and 5 is 101).

In this new form, there are variables representing each bit. So if we can somehow make multiplication behave well in single bit powers, then xor convolution will become easy! More formally, if we can somehow make it that $$$x_i^0 \cdot x_i^0=x_i^1\cdot x_i^1=x_i^0$$$ and $$$x_i^0 \cdot x_i^1 = x_i^1$$$ then $$$ax_2^0x_1^1x_0^1$$$ and $$$bx_2^1x_1^0x_0^1$$$ will multiply to form $$$abx_2^1x_1^1x_0^0$$$ ($$$3 \oplus 5 = 6$$$). This is last obstacle. But if you notice the equations, you will see that they are basically $$$x_i^p \cdot x_i^q= x_i^{(p+q) \pmod 2}$$$. And now you can see why I had talked so much about FFT with limited domain at the beginning. If we use the numbers $$$1, -1$$$ then as $$$x^2=1$$$, behavior's $$$x_i^p \cdot x_i^q= x_i^{(p+q) \pmod 2}$$$ has been achieved!

Now bringing it all together, we think of the terms $$$ax^i$$$ as being broken upto variables that represent the exponent's binary form. We will multiply this new polynomial using multivariable FFT. And to make sure that they behave like xor-ing, we use the domain $$$1,-1$$$ for each variable.(this wraps the $$$x^2$$$ term back to constants). So FWHT is now just a fancy FFT, huzzah!

Of course we don't need to do actual FFT algorithm for the variables, a polynomial $$$a+bx$$$ can be converted to point value form of $$$(a+b, a-b)$$$ by simply hardcoding it. And for inverting the transform, observe that if $$$u=a+b$$$ and $$$v=a-b$$$ then $$$a=\frac{u+v}{2}$$$ and $$$b=\frac{u-v}{2}$$$. This is why codes for FWHT perform division by 2 in inverting (after doing ordinary FWHT).

I am not providing any code since one can find many good implementations online.

But now to the second question I asked at the beginning, why should you learn it? The answer is simple really, I believe you should understand an algorithm to be able to modify it to specific needs. For example, I set a problem in Codechef's November Long challenge, where you will definitely need to modify the traditional FWHT (along with some more tricks). If you have found more problems requiring modifications on FWHT, please comment so. Also, constructive criticism is appreciated.

**EDIT:** Forgot to add two things. First of all there are some well known common modifications of FWHT: "Or convolution" and "And convolution". They are the same thing as xor convolution, except you do, 'or' or 'and'. The "or convolution" is almost the same except the behavior we want is $$$x_i^0 \cdot x_i^0 = x_i^0$$$ and $$$x_i^0 \cdot x_i^1 = x_i^1 \cdot x_i^0 = x_i^1 \cdot x_i^1 = x_i^1$$$. For this, obviously $$$1$$$ is a candidate, another ~~illegal~~ cunning candidate is $$$0$$$ (we'll use $$$0^0 = 1$$$). So we use the domain $$$1, 0$$$. "And convolution" is just "Or convolution" on complements, which in terms of implementation is just reversing the logic of "Or convolution".

And finally, I learned this interpretation of FWHT from this helpful blog. But I believe the explanation there is not elaborate. Thus my attempt to write this blog.