Arrow product: How to enumerate directed graphs

#	User	Rating
1	tourist	3880
2	jiangly	3669
3	ecnerwala	3654
4	Benq	3627
5	orzdevinwang	3612
6	Geothermal	3569
6	cnnfls_csy	3569
8	jqdai0815	3532
9	Radewoosh	3522
10	gyh20	3447

#	User	Contrib.
1	awoo	161
2	maomao90	160
3	adamant	157
4	maroonrk	154
5	-is-this-fft-	148
5	atcoder_official	148
5	SecondThread	148
8	Petr	147
9	TheScrasse	145
10	nor	144

Hi everyone!

In this blog, I would like to present a somewhat novel concept in competitive programming, which is, while being relatively simple, allows us to enumerate directed graphs with specific type of strongly connected components. This allows us to easily and uniformly compute generating functions e.g. for acyclic digraphs or strongly connected digraphs. The formalism introduced here is also very intuitive at that.

I initially wanted to make a problem based on the content of this blog to appear in today's round, but after some discussions we decided to exclude it, as the contest already had a sufficient amount of difficult problems. You may try the original problem [problem:441297A] via the invitation link.

Prerequisites

You are expected to be familiar with exponential generating functions, and the exponential formula for connected structures, that is, that if $$$A(x)$$$ is the EGF for combinatorial structures $$$A$$$, then $$$\exp A(x)$$$ is the EGF corresponding to sets of isolated instances of $$$A$$$.

Directed acyclic graphs

To warm-up, consider the following problem:

You're given an integer $$$n$$$. You need to count the number of distinct directed acyclic graphs on $$$n$$$ vertices.

In other words, the problem asks you to find the $$$n$$$-th entry of the sequence A003024. One direct and somewhat well-known way to solve this problem is to use the inclusion-exclusion principle based on the number of the source vertices of the graph:

$$$ \boxed{a_n = \sum\limits_{i=1}^n (-1)^{i+1} \binom{n}{i} 2^{i(n-i)} a_{n-i}} $$$

In the formula above, we choose $$$i$$$ out of $$$n$$$ vertices to be the source vertices in $$$\binom{n}{i}$$$ ways, and then we choose how we draw the arcs from the source vertices to all the other in $$$2^{i(n-i)}$$$ ways, and we build a DAG on the remaining $$$n-i$$$ vertices. To avoid double-counting, we do inclusion-exclusion on the number of source vertices $$$i$$$.

Is this fft?

The formula above, used as it is, allows to compute $$$a_n$$$ in $$$O(n^2)$$$. But what if we want to do it faster? Let's rewrite it as

$$$ \frac{a_n}{n!} = \sum\limits_{i=1}^n 2^{i(n-i)} \frac{(-1)^{i+1}}{i!}\frac{a_{n-i}}{(n-i)!} $$$

It almost looks like a convolution, except for the nasty $$$2^{i(n-i)}$$$ part. Luckily, we have a trick up our sleeve to deal with it! Recall the following formula from the chirp Z-transform that allows to rewrite $$$ab$$$ in terms of $$$a+b$$$, $$$a$$$ and $$$b$$$:

$$$ ab = \binom{a+b}{2} - \binom{a}{2} - \binom{b}{2}. $$$

Applying this formula, we get

$$$ \large \frac{a_n}{n! 2^{\left(^n_2\right)}} = \sum\limits_{i=1}^n \frac{(-1)^{i+1}}{i!2^{\left(^{\,i}_2\right)}} \frac{a_{n-i}}{(n-i)! 2^{\left(^{n-i}_{~~2}\right)}}, $$$

and moving the sum to the right, we may write it in a bit more uniform way as

$$$ \large \sum\limits_{i=0}^n \frac{(-1)^{i+1}}{i!2^{\left(^{\,i}_2\right)}} \frac{a_{n-i}}{(n-i)! 2^{\left(^{n-i}_{~~2}\right)}} = \begin{cases} 1, & n = 0, \\0, & n \neq 0. \end{cases} $$$

In terms of generating functions, it rewrites in a much simpler form as

$$$ L(e^{-x}) \cdot L(D(x)) = 1, $$$

where

$$$ D(x) = \sum\limits_{n=0}^\infty a_n \frac{x^n}{n!} $$$

is the exponential generating function of the acyclic digraphs, and $$$L(\cdot)$$$ is a function defined on formal power series, which divides the $$$k$$$-th coefficient of the formal power series by $$$2^{\binom{k}{2}}$$$. In other words, we get the formula

$$$ \boxed{L(D(x)) = \frac{1}{L(e^{-x})}} $$$

and first $$$n$$$ coefficients of the right-hand side can be computed in $$$O(n \log n)$$$ with power series inversion.

Arrow product

Let's stop for a moment and think a bit more generically about what just happened. We found out that the product

$$$ L(A(x)) \cdot L(B(x)) $$$

corresponds to a special kind of convolution defined as

$$$ \boxed{c_k = \sum\limits_{i+j=k} 2^{ij} a_i b_j} $$$

The formula above is very similar to general convolution as sum of $$$a_i b_j$$$, which is known to enumerate the pairs of an element described by the generating function $$$A$$$ and an element described by the generating function $$$B$$$. And the additional $$$2^{ij}$$$ multiplier means that, in addition to picking an element of type $$$A$$$ on $$$i$$$ "atoms" and an element of type $$$B$$$ on $$$j$$$ atoms, we connect each pair of atoms of these elements in one of $$$2$$$ ways. This can be interpreted e.g as follows:

Element of the arrow product of digraphs (first interpretation)
Illustration from the paper Symbolic method and directed graph enumeration

We either draw, or do not draw a (directed) edge from the vertices of an object of type $$$A$$$ to the vertices of an object of type $$$B$$$;
We either draw, or do not draw an (undirected) edge between the vertices of an object of type $$$A$$$ and an object of type $$$B$$$;
We always draw a (directed) edge between each pair of vertices, and choose a direction in one of $$$2$$$ ways;
We always draw an (undirected) edge between each pair of vertices, and color it in one of $$$2$$$ colors.

A combinatorial structure built on top of structures of types $$$A$$$ and $$$B$$$ in accordance to one of the above interpretations is called the arrow product of the structures $$$A$$$ and $$$B$$$. This interpretation allows to think of the task above from a different perspective.

As a notational convenience, we will use the symbol $$$\curvearrowright$$$ for such product, so that

$$$ \boxed{A(x) \curvearrowright B(x) = \sum\limits_{k=0}^\infty x^k \sum\limits_{i+j=k}2^{ij} a_i b_j}, $$$

where $$$a_i = [x^i] A(x)$$$ and $$$b_j = [x^j] B(x)$$$.

Enumerating DAGs

Let's mark the sources of a digraph with special variable $$$u$$$. In other words, let $$$D(x, u)$$$ be a generating function such that the coefficient near $$$x^n u^k$$$ corresponds to the number $$$a_{n,k}$$$ of DAGs with $$$n$$$ vertices and $$$k$$$ source vertices, divided by $$$n!$$$ (so it is EGF in terms of the number of vertices, as they're considered pairwise distinct, and OGF in terms of the number of sources, as they're considered indistinguishable). With definition above in mind, we can write the following equation for $$$D(x, u)$$$:

$$$ \boxed{D(x, u+1) = e^{xu} \curvearrowright D(x)} $$$

In the formula above, $$$u+1$$$ after expansion can be interpreted in such way that now the coefficient near $$$x^n u^k$$$ corresponds to the number of DAGs that have $$$n$$$ vertices overall and $$$k$$$ of their sources are specifically marked. In other way, $$$(u+1)^t$$$ expands in a way that expanding it into $$$u$$$ corresponds to marking the vertex, while expanding it into $$$1$$$ corresponds to leaving it as is.

Then, we can represent the class of objects enumerated by $$$D(x, u+1)$$$ as the arrow product of a set of marked source vertices, corresponding to the generating function $$$e^{ux}$$$, and of a directed graph without any marked vertices, corresponding to $$$D(x)$$$. Please refer to the diagram below for better understanding, the vertices on the left part will become marked source vertices and enumerated by $$$u$$$, while the source vertices on the right part will remain unmarked.

$$$D(x,u+1)$$$ represented as an arrow product
Illustration from the paper Symbolic method and directed graph enumeration

Now, using the definition of $$$D(x, u)$$$ we should note that $$$D(x, 1) = D(x)$$$ and $$$D(x, 0) = 1$$$. Thus, substituting $$$u=-1$$$, we get

$$$ \boxed{e^{-x} \curvearrowright D(x) = 1} $$$

which is equivalent to the already known formula

$$$ L(D(x)) = \frac{1}{L(e^{-x})}. $$$

Enumerating digraphs with given type of SCCs

Now, thinking about this problem a bit more generically, we may thing of DAGs as directed graphs such that each of their strongly connected components is a single vertex. So, the class of all possible strongly connected components has an EGF $$$A(x) = x$$$. But the same reasoning applies to any other kind of digraphs with specified type of strongly connected components.

For examples we can use $$$A(x) = \log \frac{1}{1-x}$$$ for graphs whose SCCs are simple cycles, or we can put $$$A(x)$$$ to be an EGF of strongly connected graphs, to obtain EGF for all possible digraphs. Let $$$G(x, u)$$$ be generating function for digraphs having only strongly connected components enumerated by the EGF $$$A(x)$$$, such that the coefficient near $$$x^n u^k$$$ corresponds to the number of digraphs of type $$$G$$$ such that they have $$$n$$$ vertices and $$$k$$$ "source" strongly connected components (i.e. components without incoming edges).

$$$G(x,u+1)$$$ represented as the arrow product of $$$e^{A(x) u}$$$ and $$$G(x)$$$
Illustration from the paper Symbolic method and directed graph enumeration

Then, with the same logic, we can write that

$$$ \boxed{G(x, u+1) = e^{A(x) u} \curvearrowright G(x)} $$$

from which one may again use the substitution $$$u=-1$$$, and get

$$$ \boxed{e^{-A(x)} \curvearrowright G(x) = 1} $$$

Which leads us to the general formula

$$$ L(G(x)) = \frac{1}{L(e^{-A(x)})}. $$$

Finding EGF for strongly connected digraphs

The formula above can also be used to find $$$A(x)$$$ when $$$G(x)$$$ is known. If we use $$$A(x) = S(x)$$$, where $$$S(x)$$$ is the EGF of all strongly connected digraphs, then $$$G(x)$$$ would be the EGF of all possible digraphs. On the other hand, it is known that the number of digraphs on $$$n$$$ vertices is known to be $$$2^{n(n-1)}$$$ (for each of $$$n(n-1)$$$ ordered pairs we either draw a directed edge or not). In terms of $$$L(\cdot)$$$ it means

$$$ G(x) = L^{-1}(L^{-1}(e^x)), $$$

from which one gets that

$$$ L(e^{-S(x)}) = \frac{1}{L^{-1}(e^x)}, $$$

which in turn rewrites as

$$$ \boxed{S(x) = - \log L^{-1} \left(\frac{1}{L^{-1}(e^x)}\right)} $$$

which can also be computed in $$$O(n \log n)$$$ with formal power series inversion and logarithms.

Enumerating with edges

Now, what if we want a generating function $$$A(x, w)$$$ such that a coefficient near $$$x^n w^m$$$ corresponds to the number of digraphs of type $$$A$$$ that have $$$n$$$ vertices and $$$m$$$ edges? Again, we would divide this coefficient by $$$n!$$$, but not $$$m!$$$, as we generally consider vertices distinct (enumerated/labeled), while edges indistinguishable from each other. Proper way to define an arrow product in this case would be

$$$ \boxed{A(x, w) \curvearrowright_w B(x, w) = \sum\limits_{k=0}^\infty x^k \sum\limits_{i+j=k} (1+w)^{ij} a_i(w) b_j(w)} $$$

where $$$a_i(w) = [x^i]A(x, w)$$$ and $$$b_j(w) = [x^j] B(x, w)$$$. Here, $$$1+w$$$ semantically means that we decide to either draw (or refrain from it) an edge, which is either undirected, or is always directed from left to right (depending on the context). Then, expanding into $$$w$$$ means drawing an edge, while expanding into $$$1$$$ would mean not doing it.

We can also define $$$L_w(A(x, w))$$$ to be a function that divides $$$[x^k] A(x, w)$$$ by $$$(1+w)^{\binom{k}{2}}$$$, in which case we will get the familiar identity

$$$ L_w(A(x, w)) \cdot L_w(B(x, w)) = L_w(A(x, w) \curvearrowright_w B(x, w)). $$$

In this notion, the formula for SCCs still holds:

$$$ \boxed{e^{-A(x, w)} \curvearrowright_w G(x, w) = 1} $$$

assuming that $$$G(x, w)$$$ enumerates graphs whose strongly connected components only belong to $$$A(x, w)$$$.

Enumerating directed cacti for free

Consider the problem 104234L - Directed Vertex Cacti. It essentially asks us to find $$$[x^n w^m] G(x, w)$$$ of the generating function of graphs $$$G$$$ such that each of their strongly connected components is a simple cycle, and $$$w$$$ only enumerates edges outside the SCCs. What would be a generating function $$$A(x)$$$ for such SCCs? Nothing else than

$$$ \log\frac{1}{1-x}, $$$

as it is a well-known generating function for simple cycles. Then, using the formula above, we get

$$$ (1-x) \curvearrowright_w G(x, w) = 1, $$$

hence

$$$ L_w(G(x, w)) = \frac{1}{1-x} = \sum\limits_{n=0}^\infty x^n, $$$

where $$$L_w(\cdot)$$$ divides $$$[x^k]G(x, w)$$$ by $$$(1+w)^{\binom{k}{2}}$$$. Therefore, we get

$$$ G(x, w) = \sum\limits_{n=0}^\infty (1+w)^{\binom{n}{2}} x^n, $$$

meaning that $$$[x^n y^m] G(x, w) = \binom{\binom{n}{2}}{m}$$$.

Enumerating edge cacti

Looking for a challenge? I have some! Use this invitation link, and you can try solving [problem:441297A], much harder version of the problem above, where cactus is a graph where "each edge belongs to at most one cycle" rather than each vertex. As far as I know, there are no closed form solution, so you would really need to use some generatingfunctionology to solve it.

Comments (10)

Write comment?

early-morning-dreams

15 months ago, # |

+286

nice blog you got me at the prerequisites so i didnt have to read further

→ Reply

ftiasch

+39

wow it looks coooooool. thank you adamant.

adamant

15 months ago, # ^ |

← Rev. 3 →

Nice to know there is still somebody reading my blogs ;-;

You may try [problem:441297A] (goto invitation link to access) if you're looking for some applications of this :)

← Rev. 2 →

There are at most $$$10$$$ people who can understand all definitions in your articles, others either haven't yet studied math at a university or passed all the math courses by cheating (like me) because math is useless irl

NeverCompromise

-10

Comment has been deleted by the NSA.

Who is queen eva

stefanbalaz2

6 months ago, # ^ |

these things are learned on universities?

Dominater069

+18

nice blog, thanks for helping the community.

Electric-tric

+68

As a historical sidenote, it is amazing that techniques pretty much identical to the arrow product have been discovered already around 1970s by people working quasi-independently of each other (Liskovets, Robinson, Stanley, Gessel, Wright, ...), but then much got lost in the literature. For instance, Liskovets was publishing around 1969s in BSSR (Belorussian Soviet Republic) journals and very short notes (around 4 pages), with very little details. They were able to handle unlabelled cases with some more advanced techniques as well! Robinson published one of his main works, namely, how recurrences can be represented in form of GFs around 1970 paper in some conference proceedings which is sadly still not available online. So sadly much of these gems have been rediscovered only in 2019s, and were not so widespread in the combinatorial community (e.g. how to enumerate directed acyclic graphs and strongly connected digraphs). I am going to put online some of Liskovets' rare works that he kindly presented to me a couple of years ago https://sergey-dovgal.vercel.app/early-works-of-valery-a-liskovets, as well as 1970 Robinson's paper, although it will take some time until this project is completed.