[Tutorial] Catalan Numbers and Catalan Convolution

Revision en4, by Dardy, 2021-02-08 14:00:21

Catalan Numbers is a well known sequence of integers that appear in combinatorics, there is a chance that you might have run into such counting problems and you might have even solved them with DP without realizing that they are a known sequence. Here are a few problems to get us started.

Binary Trees Count

Find the count of different binary trees with $$$n$$$ nodes, we'll denote to that by $$$C_n$$$. This can be solved by observing the substructure of the problem:

binary tree

From the root, the left subtree $$$a$$$ can be any binary tree of size $$$x$$$, the right subtree $$$b$$$ can be any binary tree of size $$$y$$$, such that $$$x+y=n-1$$$ (as we took one node as the root). With the base case at $$$C_0=1$$$ as the empty tree is a valid tree of size $$$0$$$. This directly gives us the answer by the following recurrence relation: (Segner's recurrence relation)

$$$ C_{n+1} = \displaystyle\sum_{i=0}^{n}{C_i C_{n-i}}, n \geq 0; C_0=1 $$$

The sequence of numbers $$$C_0, C_1, C_2, ...$$$ is called Catalan Numbers, and for reference they are: $$$1, 1, 2, 5, 14, 42, ...$$$.

This gives us a simple $$$O(n^2)$$$ implementation with DP, however can we do better? In fact we can, this simple recurrence formula gives out the same values:

$$$C_n = \displaystyle\frac{1}{n+1} \displaystyle{2n \choose n} = \displaystyle\frac{2n!}{n!(n+1)!}$$$

There are a lot of ways to prove this formula, perhaps the most common involves generating functions but I will attempt to prove it using another method. First off though, we will need to learn some different applications of Catalan numbers.

Remember, any time you manage to prove that a counting problem has a solution expressed by combining two identical subproblems with total size = $$$n-1$$$, it means this problem has the same solution as all other catalan problems.

Balanced Parentheses count

Find the count of balanced parentheses sequences consisting of $$$n$$$ pairs of parentheses (for example $$$(()(()))$$$ is a balanced sequence of $$$4$$$ pairs). A sequence of parentheses is balanced if each opening symbol has a corresponding closing symbol and they are properly nested, or in layman's terms: a closing bracket never appears before an opening one.

Notice that if you consider opening brackets as a "+1" and closing brackets as a "-1", this problem is bijective to a Dyck word of length $$$2n$$$. A Dyck word is a word of "+1"s and "-1" such that any prefix sum is never negative.

Now, to solve our problem we can think of it using this substructure: "$$$($$$ $$$A$$$ $$$)$$$ $$$B$$$", where $$$A$$$ and $$$B$$$ are themselves valid (possibly empty) parentheses sequences. Let's denote number of pairs of parentheses in $$$S$$$ by $$$|S|$$$. Since we used one pair it follows that: $$$|A| + |B| = n-1$$$. This will make us arrive at the same recurrence relation as before, which means the answer for size $$$n$$$ is $$$C_n$$$.

Let's however try to calculate an answer using another way: We will calculate all possible ways to arrange $$$n$$$ pairs of parentheses and then we will subtract the count of invalid sequences: Our answer is $$$A_n - B_n$$$ where $$$A_n$$$ is total number of ways, $$$B_n$$$ is the count of invalid ways.

For starters there are $$$2n$$$ total brackets, of which $$$n$$$ are open, $$$n$$$ are close. This means $$$A_n = {2n \choose n}$$$.

To determine $$$B_n$$$ let's define the following (notice I will use 1-based indexing here): For an invalid sequence $$$S$$$, let $$$k$$$ be the index of the earliest bracket that causes $$$S$$$ to be invalid, i.e. in $$$S[1..k]$$$: $$$x + 1 = y$$$ where $$$x$$$ is count of '$$$($$$', and $$$y$$$ is count of '$$$)$$$', in layman's terms: in the prefix up to index $$$k$$$, the count of closing brackets is exactly one more than the count of open brackets, and $$$k$$$ will be the earliest index at which this property is fulfilled.

invalid parenthesis squence

(In this example, $$$n=9, k=7, X=3, Y=4$$$, indeed $$$x + 1 = y$$$)

Let's define the following mapping from $$$S \rightarrow \hat{S}$$$: $$$\hat{S} = S[1..k] ^\frown \overline{S}[k+1..2n]$$$, $$$^\frown$$$ is a symbol for concatenation, $$$\overline{S}$$$ means the sequence $$$S$$$ but all open and closed brackets are swapped (inversed). Layman's speaking: Take sequence $$$S$$$, flip all brackets starting from $(k+1)$th bracket.

The new sequence $$$\hat{S}$$$ has an interesting property, $$$X = n - 1, Y = n + 1$$$. This is easily observed intuitively, but here is a proof: by seeing that the prefix had $$$x$$$ open brackets, and $$$y$$$ closing brackets, and the original sequence had $$$n$$$ open and $$$n$$$ closing brackets. The postfix has $$$n-x$$$ and $$$n-y$$$ open and closing brackets respectively. After the transformation and adding the brackets in the prefix to the count: $$$X = n-y+x, Y=n-x+y$$$ knowing $$$y=x+1$$$: $$$X = n-1, Y = n+1$$$.

S hat

(In this example, $$$X = 8, Y = 10$$$)

Now, the count of such $$$\hat{S}$$$ sequences is simple: It is a sequence of length $$$2n$$$ of which $$$n-1$$$ is open, $$$n+1$$$ is close. this means $$$B_n = {2n \choose n-1}$$$. But halt right there, who's to prove that the count of $$$\hat{S}$$$ is the same as the count of bad $$$S$$$ ? To do that we need to prove that $$$S \rightarrow \hat{S}$$$ is bijective, we could attempt that by proving that it is injective and surjective. Oh boy...

  • Bijective means a one-to-one correspondence between two sets. It means that every item in the first set maps to exactly one item in the second set with no items in the second set not being mapped to. This implies that any property that applies to one of the sets, applies to the other.
  • Injective means every item in the first set maps to exactly one item in the second set (no two items map to the same one), but the restriction on the second set not having unmapped items is not required.
  • Surjective means that every item in the second set is the result of mapping of some item in the first set. i.e. every item in the second set is reachable but not necessarily by distinct ways.

So it's clear, Bijective is basically Injective and Surjective combined. Let's attempt those proofs intuitively.

The mapping is clearly injective because a change in $$$S$$$ carries directly to $$$\hat{S}$$$, as $$$k$$$ must be the same, and by extension, the prefix is the same between the two sequences, and the postfix is just inversed. Similarly the mapping is surjective because if you start with any $$$\hat{S}$$$, you can find index $$$k$$$ exactly as we did before, reverse the postfix $$$\hat{S}[k+1..2n]$$$ to arrive at a valid $$$S$$$. So there are no unreachable $$$\hat{S}$$$. Thus the mapping is bijective. We conclude that truly: $$$B_n = {2n \choose n-1}$$$. Since $$$A_n - B_n$$$ was just another way for us to calculate the solution of this problem, which we already know to be $$$n$$$th catalan $$$C_n$$$ it is justifiable to write:

$$$C_n = A_n - B_n = \displaystyle{2n \choose n} - \displaystyle{2n \choose n-1} = \displaystyle{2n \choose n} - \displaystyle\frac{n}{n+1}\displaystyle{2n \choose n} = (1-\displaystyle\frac{n}{n+1})\displaystyle{2n \choose n} = \displaystyle\frac{1}{n+1}\displaystyle{2n \choose n}$$$

Which is the thing we've been trying to prove : D

Now to efficiently calculate $$$C_n$$$ you can manipulate the equation to find $$$C_n$$$ in terms of $$$C_{n-1}$$$ so you can use a building table to calculate it. To save you the hassle, it is: (although feel free to derive it using combination expansion, hint: expand the expression $$$\frac{C_n}{C_{n-1}}$$$)

$$$C_n = \displaystyle\frac{4n - 2}{n+1} C_{n-1}, C_0 = 1$$$

CSES — Bracket Sequences I

Balanced Parentheses count with prefix

Tags #math, #catalan, #combinatorics


  Rev. Lang. By When Δ Comment
en8 English Dardy 2021-02-10 15:08:10 2
en7 English Dardy 2021-02-08 22:15:24 35 split formula into multiple lines
en6 English Dardy 2021-02-08 22:11:31 1293 UPD 1: added a not very detailed proof of the formula of catalan convolution
en5 English Dardy 2021-02-08 21:35:48 3195 (published)
en4 English Dardy 2021-02-08 14:00:21 254
en3 English Dardy 2021-02-08 05:19:00 4856
en2 English Dardy 2021-02-07 16:42:03 2035 Tiny change: 'es Count\n==================\nFind the' -> 'es Count\n------------------\nFind the'
en1 English Dardy 2021-02-07 15:58:28 1103 Initial revision (saved to drafts)