adamant's blog

By adamant, 13 months ago, In English

Hi everyone!

Consider the following problem. You're given two arrays $$$a_1, a_2, \dots, a_{n!}$$$ and $$$b_1, b_2, \dots, b_{n!}$$$.

You need to compute the array $$$c_1, c_2, \dots, c_{n!}$$$, such that

$$$ c_k = \sum\limits_{i \circ j = k} a_i b_j, $$$

where $$$i \circ j$$$ denotes the composition of the $$$i$$$-th and the $$$j$$$-th lexicographically smallest permutations of $$$n$$$ elements.

Generally, the typical approach to such problems is to move into a space in which the convolution corresponds to the component-wise product of two vectors, and then to transform back into the initial space. For example, with the summation rule being $$$i + j \equiv k \pmod n$$$, it is well known that the corresponding transform is exactly the discrete Fourier transform of the vectors $$$a_i$$$ and $$$b_j$$$.

But with more general groups, things may be much more complicated, and I will write a series of blog posts explaining how to deal with them.

Prerequisites

Difficulty: ★★★★★

One is expected to have reasonable intuition in group theory, to be well versed in linear algebra, and to be familiar with some adjacent definitions (e.g. what a ring, a field, a linear span, a vector space, a dot product or a Hermitian product is, etc.). Familiarity with the discrete Fourier transform and similar transforms used to compute convolutions is also essential.

Useful definitions from group theory and linear algebra


What kind of transform are we looking for?

For now, let's work in a more generic setting. Assume that we have an operation $$$\circ$$$, and we want to find the convolution

$$$ c_k = \sum\limits_{i \circ j = k} a_i b_j. $$$

Let's analyse what we are doing for different operations (a small Python sketch of the first case follows the list):

  • For $$$i \circ j = (i + j) \bmod n$$$, we use the fact that the DFT of $$$c$$$ is the point-wise product of DFTs of $$$a$$$ and $$$b$$$;
  • For $$$i \circ j = i \oplus j$$$, we use the fact that the WHT of $$$c$$$ is the point-wise product of WHTs of $$$a$$$ and $$$b$$$;
  • For $$$i \circ j = i | j$$$, we use the fact that sum over subset transform of $$$c$$$ is the point-wise product of such transforms of $$$a$$$ and $$$b$$$.
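
Here is a minimal Python sketch (purely illustrative, the names are mine) of the first bullet point: a brute-force cyclic convolution compared against the "transform, multiply point-wise, transform back" scheme using numpy's FFT.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
a = rng.integers(0, 10, n)
b = rng.integers(0, 10, n)

# Naive convolution: c[k] = sum of a[i] * b[j] over all pairs with (i + j) % n == k.
c_naive = np.zeros(n)
for i in range(n):
    for j in range(n):
        c_naive[(i + j) % n] += a[i] * b[j]

# The DFT of c is the point-wise product of the DFTs of a and b.
c_dft = np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)).real

assert np.allclose(c_naive, c_dft)
```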

One way to define a unifying framework here is to introduce a formal variable $$$x$$$ such that $$$x^i x^j = x^{i \circ j}$$$. You can read further about such an approach in this blog about subset convolution. Using such a formal variable, we can define the convolution as

$$$ C(x) = \sum\limits_{k \in G} c_k x^k = \sum\limits_{k \in G} x^k \sum\limits_{i \circ j = k} a_i b_j = \sum\limits_{i \in G} \sum\limits_{j \in G} a_i b_j x^{i \circ j} = A(x) B(x). $$$

Then, we find some sequence $$$x_1, x_2, \dots$$$ of objects such that the formula $$$x_t^i x_t^j = x_t^{i \circ j}$$$ holds for every $$$t$$$; then we can evaluate the "polynomials" $$$A(x)$$$ and $$$B(x)$$$ at $$$x_1, x_2, \dots$$$ and interpolate the polynomial $$$C(x)$$$ from $$$C(x_t) = A(x_t) B(x_t)$$$.

Examples

In the case of $$$i + j \pmod n$$$, the sequence $$$x_1, x_2, \dots$$$ consists of the roots of the polynomial $$$x^n - 1$$$, as

$$$ x_t^i x_t^j = x_t^{(i + j) \bmod n} $$$

for any root $$$x_t$$$ of $$$x^n-1$$$. For $$$i \oplus j$$$, the specific $$$x_t$$$ are the corner points of the hypercube $$$[-1, 1]^n$$$, so $$$x_t^i$$$ rewrites as

$$$ x_{t_1}^{i_1} x_{t_2}^{i_2} \dots x_{t_n}^{i_n}, $$$

where $$$x_{t_k}$$$ is the $$$k$$$-th coordinate of $$$x_t$$$ (that is, either $$$1$$$ or $$$-1$$$), and $$$i_1, \dots, i_n$$$ is the binary representation of $$$i$$$. In this way, it holds that

$$$ x_t^i x_t^j = x_{t_1}^{i_1} x_{t_2}^{i_2} \dots x_{t_n}^{i_n} x_{t_1}^{j_1} x_{t_2}^{j_2} \dots x_{t_n}^{j_n} = x_{t_1}^{i_1 \oplus j_1} x_{t_2}^{i_2 \oplus j_2} \dots x_{t_n}^{i_n \oplus j_n} = x_t^{i \oplus j}. $$$

And with $$$i | j$$$, it's very similar, except that the evaluation takes place at the corner points of $$$[0, 1]^n$$$ instead of $$$[-1, 1]^n$$$.
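
Here is a small Python sketch (again, purely illustrative) of the last two examples: for $$$n = 3$$$ bits it evaluates the "polynomials" at every corner point of $$$[-1, 1]^n$$$ (the XOR case) and of $$$[0, 1]^n$$$ (the OR case), and checks that the product of the evaluations of $$$A$$$ and $$$B$$$ equals the evaluation of the corresponding convolution $$$C$$$.

```python
import itertools
import numpy as np

bits = 3
N = 1 << bits
rng = np.random.default_rng(1)
a = rng.integers(0, 10, N)
b = rng.integers(0, 10, N)

def evaluate(coeffs, point):
    # "x^i" unfolds into the product of the coordinates of the point over the set bits of i.
    total = 0
    for i, coef in enumerate(coeffs):
        term = coef
        for k in range(bits):
            if (i >> k) & 1:
                term *= point[k]
        total += term
    return total

for op, corners in [(lambda i, j: i ^ j, (-1, 1)), (lambda i, j: i | j, (0, 1))]:
    # Brute-force convolution with respect to the current operation.
    c = np.zeros(N, dtype=np.int64)
    for i in range(N):
        for j in range(N):
            c[op(i, j)] += a[i] * b[j]
    # Evaluations multiply exactly as the formal variables should.
    for point in itertools.product(corners, repeat=bits):
        assert evaluate(a, point) * evaluate(b, point) == evaluate(c, point)
```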

Formalization

When talking about $$$x_1, x_2, \dots$$$ above, we called them "objects", and we required that we be able to raise them to the power $$$i$$$, where $$$i$$$ is an element of the set on which the operation $$$i \circ j$$$ is defined. To make things formal, we may interpret each $$$x_t$$$ as a function

$$$ x_t : G \to R, $$$

such that

$$$ x_t(i) \cdot x_t(j) = x_t(i \circ j), $$$

where $$$R$$$ is some ring (complex numbers or remainders modulo $$$m$$$ for the discrete Fourier transform). For the sake of notational convenience, we will continue to denote $$$x_t(i)$$$ as $$$x_t^i$$$ to treat it as an object with the operation $$$\circ$$$ in its powers.


Def. 1. Let $$$G_1$$$ and $$$G_2$$$ be two sets with binary operations $$$\circ_1$$$ and $$$\circ_2$$$ defined on them. A function $$$x : G_1 \to G_2$$$, such that

$$$ x(i) \circ_2 x(j) = x(i \circ_1 j) $$$

is called a homomorphism from $$$G_1$$$ to $$$G_2$$$.


In other words, what we're really looking for as the "objects" for evaluation and interpolation of such polynomials are homomorphisms from the set $$$G$$$ with operation $$$\circ$$$ into the multiplicative group of some ring.

Group representations

The functions $$$x_1, x_2, \dots$$$ defined above are very generic, as they do not impose any additional constraints on the set $$$G$$$ on which the operation $$$\circ$$$ is defined, nor on the ring $$$R$$$. To make the setting more meaningful, we will now have to restrict the definition a bit.

Def. 2. Let $$$V$$$ be a vector space. Then, its automorphism group $$$\operatorname{Aut} V$$$ is the group of all invertible linear maps of $$$V$$$ on itself.

Def. 3. A representation of a group $$$G$$$ in a vector space $$$V$$$ over $$$\mathbb C$$$ is a homomorphism $$$x : G \to \operatorname{Aut} V$$$.

Def. 4. The dimension $$$n$$$ of the vector space $$$V$$$ is called the degree of the representation $$$x$$$.

Note: Generally, the elements of $$$V$$$ can be identified (by picking a basis) with the elements of the vector space $$$\mathbb C^n$$$, and the elements of $$$\operatorname{Aut} V$$$ can be identified with the elements of the general linear group $$$GL(n, \mathbb C)$$$, that is, with invertible matrices of size $$$n \times n$$$. This is useful to keep in mind when dealing with implementations, but we will stick to the coordinate-free approach for now, to better understand the high-level underlying structure of the objects at hand.
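
Since the original problem is about permutations, here is a small Python sketch (mine, purely illustrative) of a degree-$$$3$$$ representation of $$$S_3$$$: every permutation is mapped to its $$$3 \times 3$$$ permutation matrix, and the homomorphism property from Def. 3 is checked directly.

```python
import itertools
import numpy as np

def perm_matrix(p):
    # Column i of the matrix is the basis vector e_{p(i)}.
    m = np.zeros((len(p), len(p)))
    for i, pi in enumerate(p):
        m[pi, i] = 1
    return m

def compose(p, q):
    # (p ∘ q)(i) = p(q(i)): first apply q, then p.
    return tuple(p[q[i]] for i in range(len(p)))

perms = list(itertools.permutations(range(3)))
for p in perms:
    for q in perms:
        # The defining property of a representation: x(p) x(q) = x(p ∘ q).
        assert np.array_equal(perm_matrix(p) @ perm_matrix(q), perm_matrix(compose(p, q)))
```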


Example. For the cyclic group $$$C_n$$$, we can define a parametrized family of representations on $$$GL(1, \mathbb C)$$$:

$$$ x_k(j) = x_k^j = e^{\frac{2 \pi ijk}{n}}, $$$

in which case it indeed holds that

$$$ x_k^a \cdot x_k^b = e^{\frac{2 \pi iak}{n}} e^{\frac{2 \pi ibk}{n}} = e^{\frac{2 \pi i(a+b)k}{n}} = x_k^{(a+b) \bmod n}. $$$

As you may recognize, this family of representations is used to compute the standard discrete Fourier transform.
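
As a quick sanity check (illustrative only), the family above indeed consists of homomorphisms from $$$C_n$$$ into the multiplicative group of $$$\mathbb C$$$:

```python
import numpy as np

n = 5
for k in range(n):
    x = lambda j: np.exp(2j * np.pi * j * k / n)  # the k-th representation of C_n
    for p in range(n):
        for q in range(n):
            assert np.isclose(x(p) * x(q), x((p + q) % n))
```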


Using the definitions above, we can say that we should find a sequence of representations $$$x_1, x_2, \dots$$$, compute $$$A(x_t)$$$ and $$$B(x_t)$$$ for each $$$t$$$, and from them obtain $$$C(x_t)$$$ as $$$C(x_t) = A(x_t) \cdot B(x_t)$$$. While this outlines the general idea, it's still unclear how to choose the set of representations that would allow us to invert the transform and recover the sequence $$$c$$$.

What representations are considered distinct?

One of the important things needed for interpolation of polynomials to be possible is that the points at which the interpolation is done are distinct. Let's think for a moment about how to determine whether two representations $$$x$$$ and $$$y$$$ are distinct. Let's say that $$$x$$$ acts on the vector space $$$V$$$ and $$$y$$$ acts on the vector space $$$W$$$. We can say that the representations $$$x$$$ and $$$y$$$ are similar if we can map their underlying spaces to each other in such a way that there is a correspondence between how $$$x$$$ acts on $$$V$$$ and how $$$y$$$ acts on $$$W$$$. Let's formalize it.



(Figure: an equivariant map $$$f$$$ should be consistent with how $$$x^g$$$ and $$$y^g$$$ act on $$$V$$$ and $$$W$$$.)

Def. 5. Let $$$x$$$ and $$$y$$$ be representations of $$$G$$$ in vector spaces $$$V$$$ and $$$W$$$ correspondingly. An equivariant map from $$$V$$$ to $$$W$$$ is a linear map $$$f : V \to W$$$ such that for any element of the group $$$g \in G$$$, and any element of the vector space $$$v \in V$$$, applying $$$x^g$$$ to $$$v$$$ and then moving to $$$W$$$ via $$$f$$$ is the same as first moving to $$$W$$$ via $$$f$$$, and then applying $$$y^g$$$, that is $$$f(x^g(v)) = y^g(f(v))$$$.

In other words, $$$f x^g = y^g f$$$ for any element $$$g \in G$$$, where concatenation denotes the composition of linear maps. If the equivariant map $$$f$$$ is bijective, it is called an isomorphism of the representations $$$x$$$ and $$$y$$$, and the representations themselves are called isomorphic. We will denote representation isomorphism as $$$x \sim y$$$.


Note: If you're familiar with linear algebra, you can recognize that the isomorphism of representations can also be defined as

$$$ x^g = f^{-1} y^g f $$$

that should hold for some invertible linear map $$$f : V \to W$$$ and all $$$g \in G$$$. From the linear algebra perspective, it corresponds to how linear maps change after a change of basis given by the transition matrix $$$f$$$. Note that if $$$x$$$ and $$$y$$$ are isomorphic in the sense defined above, then for any polynomial $$$A(x)$$$ it holds that $$$A(x) = f^{-1} A(y) f$$$, meaning that evaluating at two representations that are isomorphic to each other does not provide any more information than evaluating at only one of them.
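
As a sanity check of this note (my own sketch, not from the blog), the snippet below conjugates the permutation representation of $$$S_3$$$ by a random invertible map $$$f$$$, obtaining an isomorphic representation, and verifies that $$$A(x) = f^{-1} A(y) f$$$ for random coefficients $$$a_g$$$.

```python
import itertools
import numpy as np

def perm_matrix(p):
    m = np.zeros((len(p), len(p)))
    for i, pi in enumerate(p):
        m[pi, i] = 1
    return m

rng = np.random.default_rng(2)
perms = list(itertools.permutations(range(3)))
a = rng.standard_normal(len(perms))          # coefficients a_g of the "polynomial" A
f = rng.standard_normal((3, 3))              # a generic map is invertible with probability 1
f_inv = np.linalg.inv(f)

y = {p: perm_matrix(p) for p in perms}       # the permutation representation
x = {p: f_inv @ y[p] @ f for p in perms}     # an isomorphic representation

A_x = sum(a_g * x[p] for a_g, p in zip(a, perms))
A_y = sum(a_g * y[p] for a_g, p in zip(a, perms))
assert np.allclose(A_x, f_inv @ A_y @ f)
```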

Unitary representations

Consider a representation $$$x : G \to \operatorname{Aut} V$$$. From the previous section it follows that any representation obtained by changing the basis of $$$V$$$ is isomorphic to $$$x$$$. Of all the possible ways to pick the basis of $$$V$$$, there is one that benefits us greatly when we analyze it.

As a finite-dimensional space, $$$V$$$ can be equipped with an inner product $$$( \cdot, \cdot )$$$. The concept of inner product generalizes the notion of dot product in Euclidean spaces. Let's recall the definition:

Recall the definition

It turns out that when $$$G$$$ is finite, it is always possible to define an inner product $$$\langle \cdot, \cdot \rangle$$$ on $$$V$$$ in a way that is consistent with the representation, that is, for any $$$u, v \in V$$$ and any $$$g \in G$$$, it holds that $$$\langle u, v \rangle = \langle x^g u, x^g v \rangle$$$.

Proof

In matrix form this means that $$$v^* u = v^* (x^g)^* x^g u$$$, from which it follows that $$$(x^g)^* x^g = I$$$, hence $$$(x^g)^* = (x^g)^{-1}$$$ for each $$$g \in G$$$. This means that in the orthonormal basis associated with this inner product (which can be constructed via the Gram-Schmidt process), every $$$x^g$$$ is a unitary transformation (the analogue of orthogonal transformations for vector spaces over complex numbers).

The existence of such an inner product is crucial in establishing some facts below.
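
Here is a small Python sketch of this fact (mine; it uses the standard averaging construction of the invariant inner product, which may or may not be the exact argument hidden in the proof above): starting from a deliberately non-unitary representation of $$$S_3$$$, we build the matrix $$$M = \sum_g (x^g)^* x^g$$$ of the invariant inner product and check that after the corresponding change of basis every $$$x^g$$$ becomes unitary.

```python
import itertools
import numpy as np

def perm_matrix(p):
    m = np.zeros((len(p), len(p)))
    for i, pi in enumerate(p):
        m[pi, i] = 1
    return m

rng = np.random.default_rng(3)
perms = list(itertools.permutations(range(3)))
f = rng.standard_normal((3, 3))
# A representation that is isomorphic to the permutation one, but generally not unitary.
x = {p: np.linalg.inv(f) @ perm_matrix(p) @ f for p in perms}

# Matrix of the invariant inner product <u, v> = sum_g (x^g v)^* (x^g u).
M = sum(x[p].conj().T @ x[p] for p in perms)

# Change of basis S with M = S^* S (via Cholesky); in the new basis every x^g is unitary.
S = np.linalg.cholesky(M).conj().T
S_inv = np.linalg.inv(S)
for p in perms:
    z = S @ x[p] @ S_inv
    assert np.allclose(z.conj().T @ z, np.eye(3))
```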

Irreducible representations

With more typical convolutions, the representations were always $$$1$$$-dimensional. One important observation about potentially multi-dimensional representations is that they sometimes contain other, "smaller" representations in them.

Def. 6. A representation $$$x$$$ is called reducible if the space $$$V$$$ has a non-trivial subspace $$$W$$$ that is invariant under each $$$x^g$$$.

Def. 7. The representation $$$y$$$ obtained by restricting $$$x$$$ to an invariant subspace $$$W$$$ is called a subrepresentation of $$$x$$$.


Example. Consider the cyclic group $$$C_2$$$ and the representation $$$x$$$ defined as

$$$\begin{gather} x^0 = \begin{pmatrix}1 & 0 \\ 0 & 1\end{pmatrix}, & x^1 = \begin{pmatrix}0 & 1 \\ 1 & 0\end{pmatrix}. \\ \end{gather}$$$

It acts on the $$$2$$$-dimensional space $$$\mathbb C^2$$$. Then, $$$x^0$$$ applied to a vector $$$(x, y)$$$ doesn't change it, while $$$x^1$$$ swaps the coordinates, turning it into $$$(y, x)$$$. The transforms $$$x^0$$$ and $$$x^1$$$ have two common non-trivial invariant subspaces, spanned by the vectors $$$(1, 1)$$$ and $$$(1, -1)$$$ respectively. This allows us to reduce the whole representation on $$$GL(2, \mathbb C)$$$ to two subrepresentations on $$$GL(1, \mathbb C)$$$.

The first subrepresentation is $$$x^0 = 1$$$ and $$$x^1 = 1$$$, in the subspace spanned by $$$(1, 1)$$$, while the second is $$$x^0=1$$$ and $$$x^1 = -1$$$, in the subspace spanned by $$$(1, -1)$$$. In this way, we can reduce the initial $$$2$$$-dimensional representation to a pair of $$$1$$$-dimensional representations.
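
A quick numeric check of this example (illustrative only): in the basis formed by $$$(1, 1)$$$ and $$$(1, -1)$$$ both matrices become diagonal, and the diagonal entries are exactly the two $$$1$$$-dimensional subrepresentations.

```python
import numpy as np

x0 = np.array([[1, 0], [0, 1]])
x1 = np.array([[0, 1], [1, 0]])
f = np.array([[1, 1], [1, -1]])   # columns are the basis vectors (1, 1) and (1, -1)
f_inv = np.linalg.inv(f)

print(f_inv @ x0 @ f)             # [[1, 0], [0, 1]]:  x^0 = 1 in both subspaces
print(f_inv @ x1 @ f)             # [[1, 0], [0, -1]]: x^1 = 1 and x^1 = -1
```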


Def. 8. The direct sum of representations $$$x$$$ and $$$y$$$ in vector spaces $$$V$$$ and $$$W$$$ is a representation $$$z$$$ such that for any $$$g \in G$$$, the transformation $$$z^g$$$ is a direct sum of $$$x^g$$$ and $$$y^g$$$, acting on the direct sum of vector spaces $$$V \oplus W$$$.

Note: In other words, each element of the vector space $$$V \oplus W$$$ can be represented as a pair $$$(v, w)$$$ of an element $$$v \in V$$$ and an element $$$w \in W$$$. Then, $$$z^g$$$ applied to such vector would yield a vector $$$(x^g v, y^g w)$$$. In matrix form, we can rewrite in a block-diagonal manner:

$$$ z^g \begin{pmatrix} v \\ w\end{pmatrix} = \begin{pmatrix} x^g & 0 \\ 0 & y^g \end{pmatrix} \begin{pmatrix} v \\ w \end{pmatrix} = \begin{pmatrix}x^g v \\ y^g w\end{pmatrix}. $$$

Def. 9. A representation $$$x$$$ is called decomposable if it is isomorphic to a direct sum of non-trivial representations $$$x_1$$$ and $$$x_2$$$.

It can be proven that if $$$x$$$ is a reducible representation of a finite group, then it is also decomposable. In other words, for any representation $$$x$$$ of a finite group, there is a set of irreducible representations $$$x_1, \dots, x_k$$$ such that $$$x \sim x_1 \oplus \dots \oplus x_k$$$. This result is known in representation theory as Maschke's theorem.

Proof
Why are irreducible representations of interest?

Assume that the representation $$$x$$$ is the direct sum of $$$x_1$$$ and $$$x_2$$$, that is (in the basis corresponding to $$$V = V_1 \oplus V_2$$$)

$$$ x^g = \begin{pmatrix} x_1^g & 0 \\ 0 & x_2^g \end{pmatrix}, $$$

then the result of computing the "polynomial" $$$A(x)$$$ can be represented as

$$$ A(x) = \sum\limits_{g \in G} a_g x^g = \sum\limits_{g \in G} a_g \begin{pmatrix} x_1^g & 0 \\ 0 & x_2^g \end{pmatrix} = \begin{pmatrix} A(x_1) & 0 \\ 0 & A(x_2) \end{pmatrix}. $$$

Hence instead of the value $$$A(x)$$$ we may consider the pair of values $$$A(x_1)$$$ and $$$A(x_2)$$$. So it makes sense to only consider irreducible representations, as they carry the same amount of information about $$$G$$$ as their direct sum.
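
Continuing the $$$C_2$$$ example from above (again, an illustrative sketch), in the basis where $$$x$$$ splits into its two $$$1$$$-dimensional subrepresentations, $$$A(x) = a_0 x^0 + a_1 x^1$$$ becomes block-diagonal with blocks $$$A(x_1) = a_0 + a_1$$$ and $$$A(x_2) = a_0 - a_1$$$:

```python
import numpy as np

a0, a1 = 3.0, 5.0
x0 = np.array([[1.0, 0.0], [0.0, 1.0]])
x1 = np.array([[0.0, 1.0], [1.0, 0.0]])
f = np.array([[1.0, 1.0], [1.0, -1.0]])  # change of basis to (1, 1), (1, -1)

A = a0 * x0 + a1 * x1
A_split = np.linalg.inv(f) @ A @ f       # A(x) in the new basis
assert np.allclose(A_split, np.diag([a0 + a1, a0 - a1]))
```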

Fourier transform on finite groups

Let's make our first stop here and reflect on what we've learnt so far. We learnt how to compare representations with each other (by telling whether they are isomorphic), and we learnt that only irreducible representations are of interest, as any other representation can be expressed as a direct sum of irreducible ones. This already allows us to define the direct (but not the inverse) Fourier transform on finite groups.


Def. 10. Let

$$$ A(x) = \sum\limits_{g \in G} a_g x^g, $$$

where $$$x$$$ is a placeholder "variable". Then the Fourier transform of $$$A(x)$$$ is the sequence $$$A(x_1), A(x_2), \dots$$$, where $$$x_1, x_2, \dots$$$ are pairwise distinct (not isomorphic to each other) irreducible representations of $$$G$$$.
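
To tie this back to the original problem, here is a Python sketch (mine, running slightly ahead of the series) for $$$G = S_3$$$. It is a standard fact that $$$S_3$$$ has exactly three pairwise non-isomorphic irreducible representations: the trivial one, the sign one, and a $$$2$$$-dimensional one obtained by restricting the permutation representation to the plane $$$v_1 + v_2 + v_3 = 0$$$. The snippet evaluates $$$A$$$, $$$B$$$ and the naively computed convolution $$$C$$$ at all three and checks that $$$C(x_t) = A(x_t) B(x_t)$$$.

```python
import itertools
import numpy as np

def perm_matrix(p):
    m = np.zeros((len(p), len(p)))
    for i, pi in enumerate(p):
        m[pi, i] = 1
    return m

def sign(p):
    # Parity of the permutation via the number of inversions.
    inversions = sum(p[i] > p[j] for i in range(len(p)) for j in range(i + 1, len(p)))
    return (-1) ** inversions

def compose(p, q):
    # (p ∘ q)(i) = p(q(i)).
    return tuple(p[q[i]] for i in range(len(p)))

perms = list(itertools.permutations(range(3)))   # lexicographic order, as in the problem
index = {p: k for k, p in enumerate(perms)}

# The 2-dimensional irreducible representation: permutation matrices restricted
# to the invariant plane spanned by (1, -1, 0) and (0, 1, -1).
basis = np.array([[1.0, 0.0], [-1.0, 1.0], [0.0, -1.0]])
basis_pinv = np.linalg.pinv(basis)

def standard(p):
    return basis_pinv @ perm_matrix(p) @ basis

irreps = [
    lambda p: np.array([[1.0]]),                 # trivial
    lambda p: np.array([[float(sign(p))]]),      # sign
    standard,                                    # 2-dimensional
]

rng = np.random.default_rng(4)
a = rng.standard_normal(len(perms))
b = rng.standard_normal(len(perms))

# Brute-force convolution over S_3: c_k = sum over i ∘ j = k of a_i b_j.
c = np.zeros(len(perms))
for i, p in enumerate(perms):
    for j, q in enumerate(perms):
        c[index[compose(p, q)]] += a[i] * b[j]

# The transform of the convolution is the point-wise (matrix) product of the transforms.
for x in irreps:
    A = sum(a_g * x(p) for a_g, p in zip(a, perms))
    B = sum(b_g * x(p) for b_g, p in zip(b, perms))
    C = sum(c_g * x(p) for c_g, p in zip(c, perms))
    assert np.allclose(C, A @ B)
```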


Well, this partially deals with the evaluation part, but it still leaves us with a lot of questions. How many irreducible representations are there? Are there even finitely many of them? Is it possible to compute the transform efficiently for generic groups? And, of course, how the hell do we do the inverse transform from this? We will deal with these questions step by step over the next parts of the series, so stay tuned!

»
13 months ago, # |

Thanks for the nice blog! For anyone who wants to try and solve a problem based on what's (presumably) coming in the next parts of the blog series, I recommend solving problem F from this gym (disclaimer: I set that problem). It needs less theory and might be helpful in motivating the setup that's required for the general case.

»
13 months ago, # |

sir im not coming to codeforces to leanr math