This blog will be more about developing ideas about certain structures rather than actually coding up stuff. The ideas we'll be developing here have many applications, some of which involve:

- Number Theoretic Möbius Inversion
- Principle of Inclusion-Exclusion
- Directed Acyclic Graphs
- Subset Sum Convolution
- Finite Difference Calculus (and prefix sums)

Note that some of the terminology used below might be non-standard. Also, some parts of the blog make a reference to abstract algebra without much explanation; such parts are just for people who know some algebra, and can safely be skipped by those who don't know those concepts at all.

Prerequisites:

- Some knowledge about graphs.
- Matrix multiplication
- Basic combinatorics

## Definition of a poset and some intuition

All of these have one thing in common: they have an underlying "poset". Let's understand what a poset is.

Rigorously speaking, a poset is a partially ordered set, i.e., a set $$$S$$$ equipped with a relation $$$\le$$$ that is reflexive, antisymmetric, and transitive.

More precisely,

- $$$\le$$$ is reflexive means that $$$a \le a$$$ for all $$$a$$$ in $$$S$$$.
- $$$\le$$$ is antisymmetric means that if $$$a \le b$$$ and $$$b \le a$$$, then $$$a = b$$$
- $$$\le$$$ is transitive means that if $$$a \le b$$$ and $$$b \le c$$$, then $$$a \le c$$$

To get some intuition about posets, it's helpful to think of them in terms of directed acyclic graphs (DAGs), equipped with the relation of reachability; i.e., vertex $$$a$$$ is related to vertex $$$b$$$ iff there is a path from $$$a$$$ to $$$b$$$. In fact, for every poset there is at least one DAG that is equivalent to it in this sense.

To make this notion a bit clearer, we construct a "DAG" (we allow self-loops, so it is not precisely a DAG, but we can get rid of these self-loops) as follows: construct a vertex for each element of the set, and add an edge from $$$u$$$ to $$$v$$$ iff $$$u \le v$$$. For lack of a better term, we shall call this the "relational DAG" of the poset. Note that this terminology is not standard.

Remark: By the properties of a poset, this "DAG" has the property that if there is a path from $$$u$$$ to $$$v$$$, there is an edge between $$$u$$$ and $$$v$$$ (i.e., reachability and adjacency are equivalent in this graph). In other words, this graph is its own reflexive transitive closure. Also, a topological sort of this "DAG" (ignoring the self-loops) is called a linear extension of the poset (flattening out the "DAG" induces a natural linear order on the vertices, which are just the elements of the poset).

It is usually easier to look at this graph after removing all redundancies: delete the self-loops, and whenever $$$u < v < w$$$, remove the edge $$$(u, w)$$$ if it exists. The resulting graph is called the Hasse diagram of the poset, and is super useful for reasoning about posets.
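As a small illustrative sketch (the helper function and the example poset below are mine, not from the blog), the Hasse diagram can be computed from the relation by keeping only the "covering" edges:

```python
def hasse_edges(elements, le):
    """Keep only covering relations: u < v with no w such that u < w < v."""
    edges = []
    for u in elements:
        for v in elements:
            if u != v and le(u, v):
                # skip (u, v) if it is implied by transitivity through some w
                if not any(w not in (u, v) and le(u, w) and le(w, v)
                           for w in elements):
                    edges.append((u, v))
    return edges

# the divisibility poset on the divisors of 6
print(sorted(hasse_edges([1, 2, 3, 6], lambda a, b: b % a == 0)))
# → [(1, 2), (1, 3), (2, 6), (3, 6)]
```

Note that the edge $$$(1, 6)$$$ is dropped, since it is implied by $$$1 \mid 2 \mid 6$$$.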

Note that it is NOT necessary that for any two elements $$$(u, v)$$$, one of $$$u \le v$$$ and $$$v \le u$$$ must hold. For instance, consider the poset given by the set $$$S = \{a_1, a_2, a_3, a_4\}$$$ and the relation $$$\le$$$ given by $$$\{(a, a) \mid a \in S\} \cup \{(a_1, a_2), (a_1, a_3), (a_1, a_4), (a_2, a_4), (a_3, a_4)\}$$$. This satisfies the constraints that $$$\le$$$ must be reflexive, antisymmetric and transitive; however, neither of $$$a_2 \le a_3$$$ and $$$a_3 \le a_2$$$ hold.

For the remainder of this blog, it will help to think of $$$\le$$$ in terms of the Hasse diagram of the poset.

Note: when we say $$$a < b$$$, we mean that $$$a \le b$$$ and $$$a \ne b$$$.

## Some examples of posets

- Any of the sets $$$\mathbb{R}, \mathbb{N}, \mathbb{Z}, \mathbb{Q}$$$ equipped with the usual $$$\le$$$ is a poset.
- For any set $$$A$$$, the set of its subsets, equipped with the relation $$$\subseteq$$$ (or $$$\supseteq$$$) is a poset.
- The set $$$\mathbb{N}$$$ equipped with the relation $$$\mid$$$ (divisibility) is a poset.

## The incidence algebra of a poset

Note for the pedantic reader: In what follows, we only look at locally finite posets (i.e., posets in which the set of $$$z$$$ such that $$$x \le z \le y$$$ is finite for all $$$x, y$$$). In particular, we can use the following ideas with infinite DAGs too, since local finiteness keeps all the sums below finite, even though infinitely large (but "locally sparse") matrices may not make sense formally.

In particular, the following doesn't apply to the posets $$$(\mathbb{R}, \le), (\mathbb{Q}, \le)$$$ and so on.

Let $$$(S, \le)$$$ be some poset. Consider the adjacency matrix $$$M$$$ of its relational DAG. It has the property that if $$$x \not\leq y$$$, then the entry at $$$(x, y)$$$ is $$$0$$$, and otherwise it is $$$1$$$. We can also interpret this matrix $$$M$$$ as a function from $$$S \times S \to \{0, 1\}$$$.

We generalize this notion a bit. Rather than having $$$M(x, y) = 1$$$ whenever $$$x \le y$$$, we assign arbitrary complex numbers to $$$M(x, y)$$$ (which is equivalent to assigning arbitrary edge weights (possibly $$$0$$$) to edges in the relational DAG).

This leads to a set of functions $$$\alpha : S \times S \to \mathbb{C}$$$, with the property that $$$x \not\leq y \implies \alpha(x, y) = 0$$$. We call this the "incidence algebra" of the poset (for those who know what an algebra is, we'll see why this is an algebra). We won't distinguish between these functions and their corresponding matrices in what follows.

The main idea behind the whole approach is to look at these functions as matrices and extract useful information using matrix operations.

To define the "product" on this set of functions, we can simply use matrix multiplication to define its semantics. That is, for functions $$$\alpha, \beta$$$ in the incidence algebra, we define the product as $$$(\alpha\beta)(x, y) = \sum_{z \in S} \alpha(x, z) \beta(z, y)$$$, which is some kind of convolution.

We can see that the incidence algebra is closed under product (intuitively, think of it as aggregating some combination of edge weights over paths of length 2; if there is no path from $$$x$$$ to $$$y$$$, the set of such paths is empty anyway, so the answer is still 0, the identity of the aggregation function which is $$$+$$$).
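Here is a short sketch of the product as this matrix-style convolution, on the four-element example poset from above (the helper names are mine): the product of $$$\zeta$$$ with itself counts, for each pair, the number of $$$z$$$ with $$$x \le z \le y$$$, and the result is still zero outside the relation, illustrating closure.

```python
import itertools

S = ['a1', 'a2', 'a3', 'a4']
le_pairs = {(a, a) for a in S} | {('a1', 'a2'), ('a1', 'a3'), ('a1', 'a4'),
                                  ('a2', 'a4'), ('a3', 'a4')}

def product(alpha, beta):
    """(alpha beta)(x, y) = sum over z of alpha(x, z) * beta(z, y)."""
    return {(x, y): sum(alpha.get((x, z), 0) * beta.get((z, y), 0) for z in S)
            for x, y in itertools.product(S, S)}

zeta = {pair: 1 for pair in le_pairs}
zeta2 = product(zeta, zeta)

# closure: the product is still 0 whenever x is not <= y
assert all(v == 0 for (x, y), v in zeta2.items() if (x, y) not in le_pairs)
print(zeta2[('a1', 'a4')])   # → 4, the number of z with a1 <= z <= a4
```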

Similarly, we can define scalar multiplication and addition in terms of matrix operations, and see that the incidence algebra is closed under these operations too.

Important remark: Note that if $$$f$$$ is a function from $$$S$$$ to $$$\mathbb{C}$$$, then a vector with $$$f(x)$$$ at the position corresponding to $$$x$$$ also satisfies the usual matrix multiplication identities with the elements of the incidence algebra. The semantics of right-multiplication by such a column vector is analogous to aggregating over paths that start at a given vertex (and for left-multiplication by a row vector, paths that end at a given vertex).

Optional remark: For those who know what an algebra is, the product defined above is bilinear, which is what makes the incidence algebra an algebra over $$$\mathbb{C}$$$. A further generalization would be to use a different underlying field than $$$\mathbb{C}$$$ or another bilinear operator, but we won't look into this further.

## Some special members of the incidence algebra

Okay, now that we have defined this set and some operations on it, let's see some examples of members of this set, what information we can get out of these operations, and what kind of operations are "nice" in this context.

Firstly, the most obvious choice of a matrix is $$$O$$$, the zero matrix (which is clearly a member of the incidence algebra). This doesn't really give us any information though. It still has a role as the additive identity for addition in this algebra.

The second most obvious choice is $$$I$$$, the identity matrix. Note that this is a member of the incidence algebra due to reflexivity of $$$\le$$$. This is a multiplicative identity, and that's something we will need for Möbius inversion.

Thirdly, let's consider the adjacency matrix $$$M$$$ of the relational DAG. Let the corresponding function be $$$\zeta$$$. Clearly, this is also in the incidence algebra. We have the property that $$$\zeta(x, y) = 1$$$ if $$$x \le y$$$, and $$$0$$$ otherwise.

Spoiler alert: the inverse of this matrix is the Möbius function for the poset and corresponds to things like the number theoretic Möbius function in the divisibility poset and so on.

## The Möbius function

Finally, we define the Möbius function $$$\mu$$$ as follows (It might not feel intuitive at first, but bear with me for a while):

For $$$x \not\leq y$$$, $$$\mu(x, y) = 0$$$; $$$\mu(x, x) = 1 \, \forall x \in S$$$; and for $$$x < y$$$, inductively define

$$$\mu(x, y) = -\sum_{x \le z < y} \mu(x, z).$$$

Note that $$$\mu(x, y)$$$ is always an integer. Also, note that $$$\sum_{x \le z \le y} \mu(x, z) = 1$$$ if $$$x = y$$$ and $$$0$$$ otherwise (in the case $$$x < y$$$, this follows from the inductive definition; in the case when $$$x, y$$$ are not comparable, the sum is empty; and in the case when $$$x = y$$$, it consists of only one term equal to $$$1$$$).
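To make the inductive definition concrete, here is a sketch (helper names are mine) that computes $$$\mu$$$ on the four-element example poset from earlier and checks the identity above:

```python
S = [1, 2, 3, 4]    # stand-ins for a1, a2, a3, a4
le_pairs = {(a, a) for a in S} | {(1, 2), (1, 3), (1, 4), (2, 4), (3, 4)}
le = lambda x, y: (x, y) in le_pairs

def mu(x, y, _cache={}):
    if (x, y) not in _cache:
        if not le(x, y):
            _cache[(x, y)] = 0
        elif x == y:
            _cache[(x, y)] = 1
        else:
            # the inductive case: mu(x, y) = -sum of mu(x, z) over x <= z < y
            _cache[(x, y)] = -sum(mu(x, z) for z in S
                                  if le(x, z) and le(z, y) and z != y)
    return _cache[(x, y)]

# check that the sum of mu(x, z) over x <= z <= y is 1 iff x == y, else 0
for x in S:
    for y in S:
        total = sum(mu(x, z) for z in S if le(x, z) and le(z, y))
        assert total == (1 if x == y else 0)
print(mu(1, 4))   # → 1
```

Here $$$\mu(a_1, a_4) = -(\mu(a_1, a_1) + \mu(a_1, a_2) + \mu(a_1, a_3)) = -(1 - 1 - 1) = 1$$$.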

Hence we have, for all $$$x, y \in S$$$,

$$$\sum_{z \in S} \mu(x, z) \zeta(z, y) = \begin{cases} 1 & \text{if } x = y \\ 0 & \text{otherwise.} \end{cases}$$$

However, the expression on the left is just $$$(\mu \zeta)(x, y)$$$. Since this holds for all $$$x, y$$$, we have $$$\mu\zeta = I$$$.

As we mentioned earlier, this means that $$$\mu$$$ is the inverse of $$$\zeta$$$, so we must also have $$$\zeta\mu = I$$$ (and expanding this gives $$$\sum_{x \le z \le y} \mu(z, y)$$$ is $$$1$$$ if $$$x = y$$$ and $$$0$$$ otherwise).

## The Möbius inversion formula

The Möbius inversion formula in this setting just reduces to a simple identity corresponding to the multiplication of a matrix and a vector.

Formally, the Möbius inversion formula states that if $$$f, g$$$ are functions from $$$S$$$ to $$$\mathbb{C}$$$ such that $$$g(x) = \sum_{y \le x} f(y)$$$, then we have $$$f(x) = \sum_{y \le x} \mu(y, x) g(y)$$$. The converse also holds true.

For a proof, consider a vector $$$F$$$ with the element at position corresponding to $$$x$$$ being $$$f(x)$$$. Define $$$G$$$ similarly.

Now the condition merely means that $$$G = \zeta^\intercal F$$$. Left-multiplying this by $$$\mu^\intercal$$$, we get $$$\mu^\intercal G = \mu^\intercal \zeta^\intercal F = IF = F$$$. Expanding this gives the Möbius inversion formula. The converse can be proved by reversing these steps (left-multiplying by $$$\zeta^\intercal$$$ instead), since $$$\mu^\intercal$$$ and $$$\zeta^\intercal$$$ are inverses of each other.
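A quick numerical sanity check of the formula (the chosen poset, the values of $$$f$$$, and the helper names are all illustrative): on the divisibility poset of the divisors of $$$12$$$, build $$$g$$$ from an arbitrary $$$f$$$ and recover $$$f$$$ via $$$\mu$$$.

```python
S = [1, 2, 3, 4, 6, 12]          # divisors of 12
le = lambda x, y: y % x == 0     # divisibility as the partial order

def mu(x, y, _cache={}):
    """mu computed purely from the inductive definition."""
    if (x, y) not in _cache:
        if not le(x, y):
            _cache[(x, y)] = 0
        elif x == y:
            _cache[(x, y)] = 1
        else:
            _cache[(x, y)] = -sum(mu(x, z) for z in S
                                  if le(x, z) and le(z, y) and z != y)
    return _cache[(x, y)]

f = {1: 4, 2: -1, 3: 7, 4: 0, 6: 2, 12: 5}               # arbitrary values
g = {x: sum(f[y] for y in S if le(y, x)) for x in S}     # g(x) = sum_{y <= x} f(y)
f_back = {x: sum(mu(y, x) * g[y] for y in S if le(y, x)) for x in S}
assert f_back == f               # inversion recovers f exactly
print("recovered f")
```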

Note that this formula is just a trivial consequence of some matrix multiplication properties; so there's a lot more to the theory we have developed above than just this formula. However, we shall limit ourselves to only the applications of this formula to get to some interesting results.

## Some applications

### The generalized principle of inclusion-exclusion

Let's first talk about the contexts in which we use generalized inclusion-exclusion. We usually want to compute some function $$$f$$$ of a finite set $$$A$$$, but it is much easier to compute the sum of $$$f$$$ over all subsets of $$$A$$$.

That is, it is easy to compute $$$g(A) = \sum_{B \subseteq A} f(B)$$$, but hard to compute $$$f(A)$$$ itself.

The generalized principle of inclusion-exclusion states that $$$f(A) = \sum_{B \subseteq A} (-1)^{|A| - |B|} g(B)$$$.

For a proof, let's look at the poset corresponding to the power set of a finite set $$$F$$$ such that $$$A \subseteq F$$$, equipped with the $$$\subseteq$$$ relation. Clearly, using the Möbius inversion formula, we have $$$f(A) = \sum_{B \subseteq A} \mu(B, A) g(B)$$$.

We claim that $$$\mu(B, A) = (-1)^{|A| - |B|}$$$ for $$$B \subseteq A$$$. We'll do this by fixing $$$B$$$ and doing an induction on $$$|A|$$$.

For $$$|A| = |B|$$$, this is clear. Now suppose this is true for all $$$A$$$ with $$$|A| < k$$$, and consider the case $$$|A| = k$$$. Writing $$$m = |A| - |B| > 0$$$, we have (from the definition of the Möbius function and the inductive hypothesis) that

$$$\mu(B, A) = -\sum_{B \subseteq C \subsetneq A} \mu(B, C) = -\sum_{i = 0}^{m - 1} \binom{m}{i} (-1)^{i} = (-1)^{m} - \sum_{i = 0}^{m} \binom{m}{i} (-1)^{i} = (-1)^{m},$$$

where the last equality follows from the binomial theorem ($$$\sum_{i = 0}^{m} \binom{m}{i} (-1)^{i} = (1 - 1)^{m} = 0$$$), and the proof is complete.
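The claim can also be confirmed numerically; the sketch below (helper names are mine) compares the inductively computed $$$\mu$$$ with the closed form on every subset pair of a 4-element set.

```python
from itertools import chain, combinations

# all subsets of {0, 1, 2, 3}, as frozensets
S = [frozenset(c) for c in
     chain.from_iterable(combinations(range(4), r) for r in range(5))]

def mu(b, a, _cache={}):
    """mu computed purely from the inductive definition."""
    if (b, a) not in _cache:
        if not b <= a:                      # b is not a subset of a
            _cache[(b, a)] = 0
        elif b == a:
            _cache[(b, a)] = 1
        else:
            # b <= c < a means: b is a subset of c, c a proper subset of a
            _cache[(b, a)] = -sum(mu(b, c) for c in S if b <= c < a)
    return _cache[(b, a)]

# compare with the claimed closed form (-1)^(|A| - |B|) on every pair
assert all(mu(b, a) == (-1) ** (len(a) - len(b))
           for a in S for b in S if b <= a)
print("closed form verified")
```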

#### The usual principle of inclusion-exclusion as a special case

Now let's see how the usual inclusion-exclusion principle is just a special case of this generalized formula.

The inclusion-exclusion principle states that for sets $$$A_1, A_2, \ldots, A_n$$$, we have

$$$\left| \bigcup_{i = 1}^{n} A_i \right| = \sum_{\emptyset \ne T \subseteq [n]} (-1)^{|T| + 1} \left| \bigcap_{i \in T} A_i \right|.$$$

For a proof, consider the poset corresponding to the power set of $$$[n] := \{1, \ldots, n\}$$$, equipped with $$$\supseteq$$$.

Also define $$$U = A_1 \cup \ldots \cup A_n$$$.

For a set $$$S \subseteq [n]$$$, define

$$$f(S) =$$$ the number of elements that lie in $$$A_i$$$ for every $$$i \in S$$$, but in no other $$$A_j$$$.

$$$g(S) =$$$ the number of elements that lie in $$$A_i$$$ for every $$$i \in S$$$ (and possibly in other $$$A_j$$$-s too).

Then we have $$$g(S) = |\bigcap_{i \in S} A_i|$$$ for $$$S \ne \emptyset$$$, and $$$g(\emptyset) = |U|$$$. Also, we have $$$g(S) = \sum_{T \supseteq S} f(T)$$$. Using the Möbius inversion formula (in this poset, $$$\mu(S, T) = (-1)^{|T| - |S|}$$$ for $$$S \subseteq T$$$, by the same computation as before), we get

$$$f(S) = \sum_{T \supseteq S} (-1)^{|T| - |S|} g(T).$$$

Setting $$$S = \emptyset$$$ and noting that $$$f(\emptyset) = 0$$$ (every element of $$$U$$$ lies in some $$$A_i$$$) gives $$$|U| = \sum_{\emptyset \ne T \subseteq [n]} (-1)^{|T| + 1} \left| \bigcap_{i \in T} A_i \right|$$$, which completes the proof.
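For a concrete check of the formula (the three sets below are arbitrary illustrations):

```python
from itertools import chain, combinations

# three arbitrary illustrative sets
A = [set(range(0, 10)), set(range(5, 15)), set(range(8, 20))]
n = len(A)

# sum over all non-empty index sets T, with sign (-1)^(|T| + 1)
nonempty = chain.from_iterable(combinations(range(n), r) for r in range(1, n + 1))
pie = sum((-1) ** (len(T) + 1) * len(set.intersection(*(A[i] for i in T)))
          for T in nonempty)

assert pie == len(set.union(*A))   # matches the brute-force union size
print(pie)   # → 20
```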

### Subset sum convolution

I won't be explaining the ideas behind subset sum convolution in too much detail, but would refer you to this blog that covers the necessary material.

The poset in consideration is the power set of $$$[n]$$$, and its elements are in bijection with $$$n$$$-bit integers where the $$$i^\mathrm{th}$$$ bit is $$$1$$$ iff $$$i$$$ is in the set.

The transforms $$$z$$$ and $$$\mu$$$ from that blog are the $$$\zeta$$$ and $$$\mu$$$ functions here. (Note that a two-variable function in the incidence algebra is just a map that transforms one one-variable function into another: multiply the matrix corresponding to the two-variable function with the vector corresponding to the one-variable function.) The transform $$$\sigma$$$ corresponds to a diagonal matrix whose entry for a given element of the poset is $$$1$$$ if that set has an even number of elements, and $$$-1$$$ otherwise.

The rest of that blog can conveniently be studied in terms of the incidence algebra for this poset.
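As a concrete sketch (function names are mine), the $$$\zeta$$$ and $$$\mu$$$ transforms on bitmask arrays are the standard sum-over-subsets (SOS) loops, and they invert each other, exactly as $$$\mu\zeta = I$$$ predicts:

```python
def zeta_transform(f, n):
    """fhat[mask] = sum of f[sub] over all submasks sub of mask (SOS DP)."""
    f = f[:]
    for i in range(n):
        for mask in range(1 << n):
            if mask >> i & 1:
                f[mask] += f[mask ^ (1 << i)]
    return f

def mobius_transform(f, n):
    """Inverse of the zeta transform: the same loops, subtracting instead."""
    f = f[:]
    for i in range(n):
        for mask in range(1 << n):
            if mask >> i & 1:
                f[mask] -= f[mask ^ (1 << i)]
    return f

n = 3
f = [3, 1, 4, 1, 5, 9, 2, 6]
assert mobius_transform(zeta_transform(f, n), n) == f
print(zeta_transform(f, n)[(1 << n) - 1])   # → 31, the sum of all entries
```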

### The number-theoretic Möbius function

For this, we will need to cover a bit more ground, since the divisibility poset has nicer properties compared to other posets (it is what is called a lattice).

#### What is a lattice

A lattice is a special kind of a poset, where for each pair of elements $$$(x, y)$$$, there exist elements $$$x \land y$$$ (called the meet of $$$x, y$$$) and $$$x \lor y$$$ (called the join of $$$x, y$$$) such that

- $$$z \le x$$$ and $$$z \le y$$$ $$$\implies$$$ $$$z \le x \land y$$$
- $$$x \land y \le x$$$ and $$$x \land y \le y$$$
- $$$x \le z$$$ and $$$y \le z$$$ $$$\implies$$$ $$$x \lor y \le z$$$
- $$$x \le x \lor y$$$ and $$$y \le x \lor y$$$

Intuitively, this basically means that each pair of elements in the Hasse diagram has a lowest common ancestor and a greatest common descendant. For instance, the power set poset and the divisibility poset are both lattices (exercise: what are the meet and the join functions for these lattices?).

Note that inductively, every finite subset of a lattice has a meet and a join. Also, due to antisymmetry of $$$\le$$$, the meet and the join are unique for each pair $$$x, y$$$. Hence for finite lattices $$$L$$$, there is a unique "maximum" (say $$$\top_L$$$) and a unique "minimum" (say $$$\bot_L$$$).

#### A theorem for lattices

Theorem: For a lattice $$$(L, \le)$$$, and $$$\bot_L < a \in L$$$, $$$\sum_{x \lor a = \top_L} \mu(\bot_L, x) = 0$$$.

Essentially, this theorem states that the sum of $$$\mu(\bot_L, x)$$$ over all $$$x$$$ whose join with $$$a$$$ is $$$\top_L$$$ is $$$0$$$.

For a proof, we first note that, by the second Möbius identity (the product $$$\zeta\mu = I$$$), the indicator of the condition $$$x \lor a = \top_L$$$ can be written as $$$\sum_{x \lor a \le y \le \top_L} \mu(y, \top_L)$$$, so we have:

$$$\sum_{x \lor a = \top_L} \mu(\bot_L, x) = \sum_{x \in L} \mu(\bot_L, x) \sum_{x \lor a \le y \le \top_L} \mu(y, \top_L).$$$

Using the definition of $$$\lor$$$ (namely, $$$x \lor a \le y \iff x \le y$$$ and $$$a \le y$$$) and rearranging the sums reduces the sum to

$$$\sum_{a \le y \le \top_L} \mu(y, \top_L) \sum_{\bot_L \le x \le y} \mu(\bot_L, x).$$$

Using the first Möbius identity, the inner sum is $$$1$$$ if $$$y = \bot_L$$$ and $$$0$$$ otherwise; since every $$$y$$$ in the outer sum satisfies $$$y \ge a > \bot_L$$$, this reduces the expression to $$$0$$$, as needed.

Now let's try to compute $$$\mu(a, b)$$$ for $$$a, b$$$ in the divisibility poset. Note that if $$$a$$$ doesn't divide $$$b$$$, this is just $$$0$$$. So it suffices to consider the case where $$$b = ax$$$ for some natural number $$$x$$$.

Let's look at all positive integers $$$r$$$ such that $$$1 \mid r \mid x$$$. This forms a lattice $$$L$$$. Similarly, the set of positive integers $$$r'$$$ such that $$$a \mid r' \mid ax$$$ also forms a lattice, which is isomorphic to $$$L$$$. Also, since by the inductive definition $$$\mu(a, ar)$$$ depends only on the elements $$$s$$$ such that $$$a \mid s \mid ar$$$, we must have $$$\mu(a, ar) = \mu(1, r)$$$. So it suffices to compute $$$\mu(1, x)$$$ for all $$$x$$$.

We claim that $$$\mu(1, x) = (-1)^k$$$ if $$$x$$$ is square-free with $$$k$$$ distinct prime factors, and $$$\mu(1, x) = 0$$$ if $$$x$$$ is not square-free. To show this, we induct on $$$x$$$.

Note that $$$1 = \bot_L$$$ and $$$x = \top_L$$$. The claim is trivial for $$$x = 1$$$. Now suppose $$$p$$$ is a prime that divides $$$x$$$. Using the previously proved theorem, we get $$$\mu(1, x) = -\sum_{y \ne x, y \lor p = x} \mu(1, y)$$$.

If $$$p^2$$$ divides $$$x$$$, then the sum on the right is empty (as $$$\lor$$$ is just the LCM: taking the LCM with $$$p$$$ can raise the exponent of $$$p$$$ only to $$$1$$$, so $$$y \lor p = x$$$ would force $$$y = x$$$). Otherwise, the sum contains only one $$$y$$$, which equals $$$x / p$$$, so using the inductive hypothesis, we are done.
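This closed form is easy to cross-check against the inductive definition; the sketch below (function names are mine) does so for all $$$x \le 60$$$:

```python
def mu_poset(x):
    """mu(1, x) computed from the inductive definition on the divisors of x."""
    divs = [d for d in range(1, x + 1) if x % d == 0]   # in increasing order
    mu = {1: 1}
    for d in divs[1:]:
        # mu(1, d) = -sum of mu(1, e) over proper divisors e of d
        mu[d] = -sum(mu[e] for e in divs if d % e == 0 and e != d)
    return mu[x]

def mu_classic(x):
    """(-1)^k for square-free x with k prime factors, 0 otherwise."""
    k = 0
    for p in range(2, x + 1):
        if x % p == 0:
            x //= p
            if x % p == 0:      # p^2 divided the original x
                return 0
            k += 1
    return (-1) ** k

assert all(mu_poset(x) == mu_classic(x) for x in range(1, 61))
print([mu_poset(x) for x in range(1, 11)])   # → [1, -1, -1, 0, -1, 1, -1, 0, 0, 1]
```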

As a side-note, for "reasonable" functions that only depend on the ratio between their parameters, we can "compress" them into a function that takes a single parameter. The "product" in that case becomes Dirichlet convolution, which has tons of interesting properties. A relevant blog for that can be found here.

Also note that if we restrict ourselves to the divisibility posets of square-free numbers $$$n$$$, they are isomorphic to the power set poset of the set of prime divisors of $$$n$$$, so all power set posets are just special cases of divisibility posets!

### Finite Difference Calculus

In this case, the poset is particularly simple: it is $$$(\mathbb{N} \cup \{0\}, \le)$$$. The function $$$\mu$$$ takes the value $$$1$$$ at $$$(n, n)$$$, $$$-1$$$ at $$$(n, n + 1)$$$, and $$$0$$$ everywhere else. "Convolution" with $$$\mu$$$ is equivalent to the difference operator, and by Möbius inversion, "convolution" with $$$\zeta$$$ is the prefix sum operator.
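A minimal sketch of this correspondence (function names are mine), restricted to a finite prefix of the poset: the two operators invert each other, exactly as $$$\mu\zeta = \zeta\mu = I$$$ predicts.

```python
def difference(a):
    """Convolution with mu: out[n] = a[n] - a[n - 1], with a[-1] taken as 0."""
    return [x - y for x, y in zip(a, [0] + a[:-1])]

def prefix_sum(a):
    """Convolution with zeta: out[n] = a[0] + ... + a[n]."""
    out, s = [], 0
    for x in a:
        s += x
        out.append(s)
    return out

a = [3, 1, 4, 1, 5]
assert difference(prefix_sum(a)) == a    # mu inverts zeta
assert prefix_sum(difference(a)) == a    # and vice versa
print(prefix_sum(a))   # → [3, 4, 8, 9, 14]
```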

## Final notes and remarks

Firstly, the content of this blog might not be super-useful in competitive programming, but the intent behind this was to introduce a very standard technique in combinatorics that people don't realize is super-flexible.

Secondly, it might not be easy to understand in a first reading, so it's encouraged to work out the math through the blog, and understand why and how these things are tied together. It will also help to make some Hasse diagrams of posets, and look at concrete examples of operations we described here.

Finally, there might be errors and typos in the blog, as well as better ways to approach the content of the blog. If you find any, please let me know in the comments!

One question I've always had about this kind of generalized Mobius inversion/PIE: what techniques are there to effectively find a closed form for the Mobius function of some nontrivial poset? You can find terms computationally using the general formula, but this could be quadratic time or otherwise too slow for some reason. Are there any good general techniques? For instance, how can you find the Mobius function for the poset of partitions of $$$n$$$ labelled elements, compared by inclusion?

I've had this question for a while now as well. I usually tend to go for guessing and induction, but there is a technique that's quite useful when there is a nice substructure property.

Using these for the poset (actually lattice) of partitions, let $$$A$$$ and $$$B$$$ be two partitions, with $$$B$$$ being a refinement of $$$A$$$. Then it suffices to solve for the case when $$$A$$$ is the trivial partition (we can take the products for the Mobius function corresponding to each block that $$$A$$$ partitions the set into). Then using a simple induction, if there are $$$n$$$ blocks in $$$B$$$, the Mobius function in the case when $$$A$$$ is the trivial partition is $$$(-1)^{n - 1}(n - 1)!$$$.

There are some more results here but I'm not sure which ones make computing the Mobius function computationally feasible in the general case.

By the way, in this special case, there's an interesting view of this closed form: we're doing PIE on permutations which do not alter the partition. When doing PIE on permutations, each permutation should just be weighted by its sign. $$$(-1)^{n-1}$$$ is precisely the sign of an $$$n$$$-cycle, and $$$(n-1)!$$$ is precisely the number of possible $$$n$$$-cycles.

I was trying to understand the argument, but didn't understand what you meant by "doing PIE on permutations which do not alter the partition". How do you relate this to the Mobius function in this case, and what exactly does PIE on permutations mean? Thanks in advance!

Consider the problem of trying to count the number of sequences of $$$n$$$ distinct elements satisfying some condition, using PIE to cancel out sequences where some particular subsets of elements (a partition) are equal; the weights in this are precisely the Mobius function.

Then, we can count this using PIE on permutations which fix the sequence. For any sequence with possibly equal elements, the sum of the signs of permutations which fix the sequence is 1 iff all elements are distinct, and 0 otherwise. This is exactly the condition we wanted for our Mobius function, so grouping permutations back up by their cycle decomposition partition, we get the original desired Mobius function.

Good blog!

For Russian speakers there is, for example, this video where prof. Andrei Raigorodskii explains exactly this (but more briefly). Actually, there is probably a recording of almost every lecture of this course (basic combinatorics and number theory) since 2013 (though not all of them are on YouTube).