MacMahon's master theorem

#	User	Rating
1	tourist	3880
2	jiangly	3669
3	ecnerwala	3654
4	Benq	3627
5	orzdevinwang	3612
6	Geothermal	3569
6	cnnfls_csy	3569
8	jqdai0815	3532
9	Radewoosh	3522
10	gyh20	3447

#	User	Contrib.
1	awoo	161
1	maomao90	161
3	adamant	156
4	maroonrk	153
5	-is-this-fft-	148
5	atcoder_official	148
5	SecondThread	148
8	Petr	147
9	nor	144
10	TheScrasse	142

Hi everyone!

Mandatory orz to Elegia whose blog introduced me to MMT as an elegant approach to prove Dixon's identity.

Today, I would like to write about MacMahon's master theorem (MMT). It is a nice result on the intersection of combinatorics and linear algebra that provides an easy proof to some particularly obscure combinatorial identities, such as the Dixon's identity:

$$$ \sum\limits_k (-1)^k \binom{a+b}{a+k} \binom{b+c}{b+k} \binom{c+a}{c+k} = \binom{a+b+c}{a,b,c}. $$$

Besides that, MMT has large implications in Quantum Physics, which we will hopefully discuss. For now, let's formulate MMT.

MMT. Let $$$\mathbf A = [a_{ij}]_{n\times n}$$$, $$$\mathbf X = \operatorname{diag}(x_1,\dots,x_n)$$$, $$$\mathbf t = (t_1,\dots,t_n)$$$, $$$\mathbf x = (x_1,\dots,x_n)$$$ and $$$\mathbf k = (k_1,\dots,k_n) \geq 0$$$. Then,

$$$ \boxed{[\mathbf t^\mathbf k] \prod\limits_{i=1}^n \left(\sum\limits_{j=1}^n a_{ij} t_j\right)^{k_i} = [\mathbf x^\mathbf k] \det(\mathbf I-\mathbf X \mathbf A)^{-1}} $$$

where $$$\mathbf t^\mathbf k$$$ stands for $$$t_1^{k_1} \dots t_n^{k_n}$$$, and $$$\mathbf x^\mathbf k$$$, correspondingly, for $$$x_1^{k_1} \dots x_n^{k_n}$$$, and $$$\mathbf I = [\delta_{ij}]_{n\times n}$$$ is the identity matrix.

With MMT, the Dixon's identity is proven as elegantly as this:

Proof

Let's now learn how to prove MMT itself.

Variable-less MMT

Now, let's look into MMT and try to understand what does it actually stand for? First of all, we can multiply both sides by $$$\mathbf x^\mathbf k$$$ and sum them up over all $$$\mathbf k \geq 0$$$. In this case, the RHS, by definition, will turn into $$$\det(\mathbf I - \mathbf X \mathbf A)^{-1}$$$, while the LHS will turn into

$$$ \sum\limits_{\mathbf k \geq 0} [\mathbf t^\mathbf k] \prod\limits_{i=1}^n\left(\sum\limits_{j=1}^n a_{ij} x_i t_j\right)^{k_i}. $$$

We can now do the substitution $$$\mathbf B = \mathbf X \mathbf A$$$ and use $$$b_{ij} = a_{ij} x_i$$$ to reformulate MMT in the equivalent form

$$$ \boxed{\sum\limits_{\mathbf k \geq 0} [\mathbf t^\mathbf k] \prod\limits_{i=1}^n \left(\sum\limits_{j=1}^n b_{ij} t_j\right)^{k_i} = \det(\mathbf I - \mathbf B)^{-1}} $$$

This form is more convenient for us to work with, as it doesn't depend on any variables in RHS.

LHS as the sum of permanents

Now, let's take a closer look on the individual summands of the LHS. What do they enumerate?

$$$ [\mathbf t^\mathbf k]\prod\limits_{i=1}^n \left(\sum\limits_{j=1}^n b_{ij} t_j\right)^{k_i} = \sum\limits_{(j_1,\dots,j_k)} \prod\limits_{s=1}^k b_{i_s j_s}, $$$

where $$$(i_1,\dots,i_k)$$$ is a tuple such that $$$1 \leq i_1 \leq \dots \leq i_k \leq n$$$ and each $$$i$$$ occurs exactly $$$k_i$$$ times in it. Correspondingly, $$$(j_1,\dots,j_k)$$$ are all possible rearrangements of such tuple. So, we go over all rearrangements of $$$(i_1,\dots,i_k)$$$ and sum up the product of $$$b_{i_s j_s}$$$ over $$$s$$$.

Doesn't it look familiar? If we consider all permutations of $$$(1,\dots,k)$$$ rather than direct rearrangements of $$$(i_1,\dots,i_k)$$$, this will just be a permanent of a $$$(k_1+\dots+k_n) \times (k_1+\dots+k_n)$$$ matrix, in which the element $$$b_{ij}$$$ occurs as a block of size $$$k_i \times k_j$$$.

This concept usually occurs in literature defined even for two distinct vectors $$$\mathbf k \geq 0$$$ and $$$\mathbf l \geq 0$$$, so that $$$b_{ij}$$$ occurs in a block of size $$$k_i \times l_j$$$. The resulting matrix is typically denoted as $$$\mathbf B^{(\mathbf k, \mathbf l)}$$$. So, e.g. for $$$n = 3$$$, $$$\mathbf k = (3, 2, 1)$$$ and $$$\mathbf l = (1, 2, 3)$$$, $$$\mathbf B^{(\mathbf k, \mathbf l)}$$$ would look like

$$$ \mathbf B = \begin{pmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \\ b_{31} & b_{32} & b_{33} \end{pmatrix} \mapsto \left( \begin{array}{c|c|c} \begin{matrix} b_{11} \\ b_{11} \\ b_{11} \end{matrix} & \begin{matrix} b_{12} & b_{12} \\ b_{12} & b_{12} \\ b_{12} & b_{12} \end{matrix} & \begin{matrix} b_{13} & b_{13} & b_{13} \\ b_{13} & b_{13} & b_{13} \\ b_{13} & b_{33} & b_{33} \end{matrix} \\ \hline \begin{matrix} b_{21} \\ b_{21} \end{matrix} & \begin{matrix} b_{22} & b_{22} \\ b_{22} & b_{22}\end{matrix} & \begin{matrix} b_{23} & b_{23} & b_{23} \\ b_{23} & b_{23} & b_{23} \end{matrix} \\ \hline \begin{matrix}b_{31}\end{matrix} & \begin{matrix} b_{32} & b_{32} \end{matrix} & \begin{matrix} b_{33} & b_{33} & b_{33} \end{matrix} \end{array} \right) = \mathbf B^{(\mathbf k, \mathbf l)}. $$$

Note that to go back to rearrangements of $$$(i_1,\dots,i_k)$$$ rather than permutations of $$$(1,\dots,k)$$$ we should divide the permanent by $$$k_1 ! \dots k_n!$$$, which is commonly denoted in literature as $$$\mathbf k!$$$. In this terms, MMT rewrites concisely as

$$$ \boxed{\sum\limits_{\mathbf k \geq 0} \frac{\operatorname{per}\mathbf B^{(\mathbf k, \mathbf k)}}{\mathbf k!} = \det(\mathbf I - \mathbf B)^{-1}} $$$

Or, in terms of the original matrix $$$\mathbf A$$$, as

$$$ \boxed{\sum\limits_{\mathbf k \geq 0} \operatorname{per}\mathbf A^{(\mathbf k, \mathbf k)} \frac{\mathbf x^\mathbf k}{\mathbf k!} = \det(\mathbf I - \mathbf X\mathbf A)^{-1}} $$$

Sums of permanents as traces of symmetric powers

Now, let's denote $$$|\mathbf k| = k_1+\dots+k_n$$$. We can show that

$$$ \sum\limits_{|\mathbf k| = k} \frac{\operatorname{per}\mathbf B^{(\mathbf k,\mathbf k)}}{\mathbf k!} = \operatorname{tr} \operatorname{S}^k(\mathbf B), $$$

where $$$\operatorname{S}^k(\mathbf B)$$$ is the so-called symmetric power of $$$\mathbf B$$$.

Let's properly define it to better understand what this means. Before doing so, we should define tensor products. Let $$$V$$$ and $$$W$$$ be vector spaces with bases $$$B_V$$$ and $$$B_W$$$ correspondingly. Then, their tensor product $$$V \otimes W$$$ is the vector space with associated bilinear operation $$$\otimes : V \times W \to V \otimes W$$$, such that the set $$$\{v \otimes w : v \in B_V, w \in B_W\}$$$ forms a basis of $$$V \otimes W$$$.

If we represent vectors in coordinate form, one possible way to represent tensor products is via Kronecker product operation applied to coordinate arrays. When tensor product is defined on vectors, it is also implicitly defined on linear maps. So, if $$$A : V_1 \to V_2$$$ and $$$B: W_1 \to W_2$$$ are linear maps, then their tensor product is the linear map $$$A \otimes B : V_1 \otimes W_1 \to V_2 \otimes W_2$$$, such that $$$(A \otimes B)(v \otimes w) = (Av)\otimes (Bw)$$$. If $$$A$$$ and $$$B$$$ are represented by matrices, $$$A \otimes B$$$ can be represented as their Kronecker product.

To define symmetric powers, we should consider $$$V \otimes V$$$, a tensor product of $$$V$$$ with itself. Assuming that $$$V$$$ has a basis $$$e_1,\dots,e_n$$$, we can say that $$$V \otimes V$$$ has a basis $$$\{e_i \otimes e_j : 1 \leq i, j \leq n\}$$$, so we can represent any $$$v \in V \otimes V$$$ as an array $$$[v_{ij}]_{n \times n}$$$ such that

$$$ v = \sum\limits_{i,j} v_{ij} (e_i \otimes e_j). $$$

Now, of all vectors $$$v \in V \otimes V$$$ we can consider a subset of symmetric vectors, that is those for which $$$v_{ij} = v_{ji}$$$ for any $$$(i,j)$$$. Such vectors form a subspace $$$\operatorname{S}^2(V)$$$ of $$$V \otimes V$$$, also called the symmetric square of $$$V$$$. As a subspace of $$$V$$$, symmetric square has a basis

$$$ \{e_{ij} = e_i \otimes e_j + e_j \otimes e_i : 1 \leq i \leq j \leq n\}. $$$

Finally, we can define the symmetric power $$$\operatorname{S}^k(V)$$$ of $$$V$$$ as a subspace of $$$V^{\otimes k} = \overbrace{V \otimes \dots \otimes V}^k$$$ formed by symmetric arrays. By this we mean that any element of $$$V^{\otimes k}$$$ can be represented as an array $$$[v_{i_1 \dots i_k}]_{n \times \dots \times n}$$$ of size $$$n^k$$$, such that

$$$ v = \sum\limits_{i_1,\dots,i_k} v_{i_1,\dots,i_k} (e_{i_1} \otimes \dots \otimes e_{i_k}). $$$

With this definition, the $$$k$$$-th symmetric power of $$$V$$$ is a subspace of $$$V^{\otimes k}$$$ formed of all vectors $$$v$$$ whose coordinates wouldn't change if we rearrange vectors $$$e_1,\dots,e_n$$$ in any way. We can also define its basis as

$$$ \left\{e_\mathbf i = \sum\limits_{(j_1,\dots,j_k)} (e_{j_1} \otimes \dots \otimes e_{j_k}) : 1 \leq i_1 \leq \dots \leq i_k \leq n\right\}, $$$

where $$$(j_1,\dots,j_k)$$$ go over all possible rearrangements of $$$(i_1,\dots,i_k)$$$. Looks familiar, doesn't it?

Now, what's even more important is that we can also define the symmetric power $$$\operatorname{S}^k(A)$$$ of any linear map $$$A : V \to V$$$ as the reduction of $$$A^{\otimes k}$$$ on $$$\operatorname{S}^k(V)$$$. Now, we can find its trace by going over all basis vectors in $$$\operatorname{S}^k(V)$$$ and summing up their contribution to the trace:

$$$ \operatorname{tr}\operatorname{S}^k(\mathbf B) = \sum\limits_{\mathbf i} \frac{e_\mathbf i \cdot \operatorname{S}^k(\mathbf B) e_\mathbf i}{e_\mathbf i \cdot e_\mathbf i}. $$$

Noting that $$$(v_1 \otimes \dots \otimes v_k) \cdot (w_1 \otimes \dots \otimes w_k) = (v_1 \cdot w_1) \dots (v_k \cdot w_k)$$$, we can see that $$$e_\mathbf i \cdot e_\mathbf i$$$ is the total number of rearrangements $$$(j_1,\dots,j_k)$$$ of $$$(i_1,\dots,i_k)$$$. This is because, when expanded, each rearrangement will only give $$$1$$$ with itself, while giving $$$0$$$ with any other. Conversely, $$$e_\mathbf i \cdot \operatorname{S}^k(\mathbf B) e_\mathbf i$$$ expands into summation over rearrangements $$$(l_1,\dots,l_k)$$$ and $$$(j_1,\dots,j_k)$$$ of $$$(i_1,\dots,i_k)$$$:

$$$ \sum\limits_{(l_1,\dots,l_k)} \sum\limits_{(j_1,\dots,j_k)} \prod\limits_{s=1}^k b_{l_s j_s}. $$$

When we divide this whole sum by $$$e_\mathbf i \cdot e_\mathbf i$$$, it is essentially equivalent to fixing $$$(l_1,\dots,l_k) = (i_1,\dots,i_k)$$$, making the sum into

$$$ \operatorname{tr}\operatorname{S}^k(\mathbf B) = \sum\limits_\mathbf i \sum\limits_{(j_1,\dots,j_k)} \prod\limits_{s=1}^k b_{i_s j_s} = \sum\limits_{|\mathbf k| = k} \frac{\operatorname{per}\mathbf B^{(\mathbf k,\mathbf k)}}{\mathbf k!}, $$$

which proves that the LHS of the MMT is the sum of $$$\operatorname{tr}\operatorname{S}^k(\mathbf B)$$$ over $$$k \geq 0$$$.

Inverse determinants as traces of symmetric powers

Now, let's take a closer look at $$$\det(\mathbf I - \mathbf B)^{-1}$$$. Let $$$\lambda_1,\dots,\lambda_n$$$ be eigenvalues of $$$\mathbf B$$$, then

$$$ \det(\mathbf I - \mathbf B)^{-1} = \prod\limits_{i=1}^n \frac{1}{1-\lambda_i} = \prod\limits_{i=1}^n \sum\limits_{k=0}^\infty \lambda_i^k = \sum\limits_{\mathbf k \geq 0} \lambda^\mathbf k. $$$

We may show that the eigenvalues of $$$\operatorname{S}^k(\mathbf B)$$$ are, in fact, all possible values of $$$\lambda^\mathbf k$$$ where $$$|\mathbf k|= k$$$. To show this, assume in the basis construction above that $$$e_1,\dots,e_n$$$ are all eigenvectors of $$$\mathbf B$$$ that also form the basis. In this case, $$$e_\mathbf i$$$ would also be an eigenvector of $$$\operatorname{S}^k(\mathbf B)$$$ with the eigenvalue $$$\lambda^\mathbf k$$$, where $$$\mathbf k = (k_1,\dots,k_n)$$$ and $$$k_i$$$ is the multiplicity of $$$i$$$ in $$$(i_1,\dots,i_k)$$$. On the other hand, the sum of all eigenvalues of a particular linear map is simply the trace of the linear map, hence

$$$ \sum\limits_{k=0}^\infty \operatorname{tr}\operatorname{S}^k(\mathbf B) = \det(\mathbf I - \mathbf B)^{-1}, $$$

concluding the proof of MacMahon's master theorem.

Applications to Quantum Physics

Consider the expression $$$\det(\mathbf I - \mathbf B)$$$ without taking its inverse:

$$$ \det(\mathbf I - \mathbf B) = \prod\limits_{i=1}^n (1-\lambda_i) = \sum\limits_{k=0}^\infty (-1)^k \sum\limits_{|\mathbf i|=k} \lambda^\mathbf i, $$$

where this time $$$\mathbf i$$$ iterates over tuples $$$(i_1,\dots,i_k)$$$ such that $$$1 \leq i_1 < \dots < i_k \leq n$$$ instead of $$$1 \leq i_1 \leq \dots \leq i_k \leq n$$$, thus ensuring that only pairwise distinct eigenvalues participate. Summed this way, eigenvalues also sum up to a trace of a certain power of $$$\mathbf B$$$. However, this time it's not symmetric power of $$$\mathbf B$$$, but instead exterior power, denoted $$$\Lambda^k(\mathbf B)$$$.

While symmetric power is a subspace of $$$V^{\otimes k}$$$ composed by symmetric tensors, exterior power is composed of antisymmetric tensors instead, so in its basis construction some rearrangements are multiplied by $$$-1$$$, if the parity of the rearrangement is odd. In this way,

$$$ \det(\mathbf I - \mathbf B) = \sum\limits_{k=0}^n (-1)^k \operatorname{tr} \Lambda^k(\mathbf B). $$$

Now, plugging $$$t \mathbf A$$$ instead of $$$\mathbf B$$$ into MMT, we get the following result:

$$$ \boxed{\frac{1}{\sum\limits_{k=0}^n (-1)^k \operatorname{tr} \Lambda^k(\mathbf A) t^k} = \sum\limits_{k=0}^\infty \operatorname{tr} \operatorname{S}^k(\mathbf A) t^k} $$$

which is also known as the boson-fermion correspondence, where bosons are commuting particles corresponding to symmetric powers, while fermions are skew-commuting particles corresponding to exterior powers.

I suppose one other way to look at it is to treat $$$\operatorname{tr}\operatorname{S}^k(\mathbf A)$$$ as the complete homogeneous symmetric polynomial $$$h_k(\lambda_1,\dots,\lambda_n)$$$, and $$$\operatorname{tr}\Lambda^k(\mathbf A)$$$ as the elementary symmetric polynomial $$$e_k(\lambda_1,\dots,\lambda_n)$$$, which then implies the correspondence between them.

Multivariate Lagrange inversion

MMT is, in fact, a linear case of a more general result, known as multivariate Lagrange implicit function theorem (LIFT).

Multivariate LIFT: Let $$$\mathbf x = (x_1,\dots,x_n)$$$ and let $$$R_1(\mathbf x), \dots, R_n(\mathbf x) \in \mathbf K[ [ \mathbf x ] ] $$$ be such that

$$$ R_j = x_j G_j(R_1,\dots,R_n), $$$

where $$$G_1(\mathbf u), \dots, G_n(\mathbf u) \in \mathbb K[ [\mathbf u] ]$$$ for $$$\mathbf u = (u_1,\dots,u_n)$$$. Then, for any $$$F(\mathbf u) \in \mathbb K[ [\mathbf u ] ]$$$ and $$$\mathbf a = (a_1,\dots,a_n)$$$ it holds that

$$$ \boxed{[\mathbf x^\mathbf a] F(R_1,\dots,R_n) = [\mathbf u^\mathbf a] F(\mathbf u) G^\mathbf a(\mathbf u) \det\left(\delta_{ij} - \frac{u_j}{G_i(\mathbf u)} \frac{\partial G_i(\mathbf u)}{\partial u_j}\right)} $$$

where the expression under the determinant is the $$$(i,j)$$$-th element of the matrix. For $$$G_i(\mathbf u) = \sum\limits_{j=1}^n a_{ij} u_j$$$ we get MMT.

I previously made a blog post about univariate LIFT, but this one here seems significantly harder to comprehend to me.

P.S. One more interesting related identity is (as found on Mathematics StackExchange):

$$$ \boxed{\sum\limits_{\mathbf k, \mathbf l \geq 0} \operatorname{per} \mathbf A^{(\mathbf k, \mathbf l)} \frac{\mathbf x^\mathbf k \mathbf y^\mathbf l}{\mathbf k! \mathbf l!} = e^{\mathbf x^\top \mathbf A \mathbf y}} $$$

Meaning that MMT essentially extracts diagonal elements from this bivariate genfunc. Nice to know, I guess?