### galen_colin's blog

By galen_colin, 15 months ago,

See the original post on Codeforces for a playlist of all similar tutorials and also more information about what this is.

# "Intro"

Timestamp: 00:00

This is a "hybrid tutorial" — it has both a blog and a video component. See my original post for more explanation of what this is and also a playlist of all past tutorials. You can view the full video here.

This tutorial is on centroid decomposition, a cool concept that has many uses in different problems. It's also another "decomposition" technique, one of many. I feel like this subject in particular doesn't have many great tutorials online that really capture the essence of what we do with the technique, so hopefully, I can improve that with this one.

# Notation

We'll use $0$-indexing for the whole tutorial, and the example problem will also be $0$-indexed. In addition, "vertex" and "node" mean the same thing and may be used interchangeably.

# Example problem

Timestamp: 02:04

If you've read other centroid decomposition tutorials, you've probably seen a problem called Xenia and Tree. While it's rather well-known now, it's a great example to use for an introductory tutorial because most of the implementation is simple.

I'll restate the problem here as well:

You have a tree of $n$ vertices, where each vertex is initially blue except for vertex $0$, which is red. There are two types of queries ($q$ in total):

• Set the color of a vertex $v$ to red

• Given a vertex $v$, compute the distance to the closest red vertex to $v$ (which could be itself)

There are other ways to solve this such as sqrt decomposition, but we'll do it with centroid decomposition.

# Prerequisite

Timestamp: 05:16

• Trees + traversal (dfs)

Not many this time. It's a rather lightweight topic overall, yet possibly difficult to understand.

# Reformulating the problem

Timestamp: 06:06

First, let's consider a natural yet inefficient solution, which we will later optimize with centroid decomposition. Root the tree arbitrarily, let's say, at vertex $0$. Then let's maintain, for each vertex $v$, the closest red node to it that's within its subtree, and call it $best[v]$.

Now there's a caveat here — this solution requires efficiently finding the distance between two vertices. I'll use the notation $dist(u, v)$ to denote the distance between vertex $u$ and $v$. We can compute this in $O(1)$ or $O(\log{n})$ with LCA, but this is not necessary to know for this tutorial. Just assume this function exists and can be computed efficiently. We'll assume for the rest of the tutorial that it takes $O(\log{n})$ because the optimization to $O(1)$ is unnecessary and harder to implement in my view.

Timestamp: 08:14

Back to the problem now. When we paint a node $v$ red, to maintain the information we want, we have to update the distances to all of $v$'s ancestors. This, in implementation, is simply setting $best[u] := min(best[u], dist(u, v))$ for all ancestors $u$ of $v$ (including $v$ itself). How long does this take per query? At worst, $v$ has $O(n)$ ancestors, so it can take $O(n\log{n})$ taking the complexity of $dist$ into account.

Reformulating queries

Timestamp: 15:29

But let's ignore the complexity, we can optimize it later with the actual centroid decomposition. Let's handle the queries now. To compute the closest red node to $v$, there are two cases:

• The closest node is in $v$'s subtree. In this case, we use the value $best[v]$ that we already have stored.

• The closest node is not in $v$'s subtree. Then it's in the subtree of some ancestor $u$ of $v$ but not in $v$'s subtree (and of course the distance is $best[u]$ since we want the closest red node in $u$'s subtree). Therefore the path from the closest node to $v$ goes through $u$, so we can represent its length as $best[u] + dist(u, v)$. What's most important here is that the path goes through $u$.

Of course, when computing a query, we don't know which case is optimal, so we have to try both of them and all possible $u$. Just like updates, this can also take $O(n\log{n})$ in the worse case. You may note that we don't have to recompute $dist(u, v)$ each time as it just increases by $1$, but this optimization won't work later so we'll ignore it.

Timestamp: 20:41

Now, you may say, what the hell even is this tutorial? I've given you an $O(n^2\log{n})$ solution when there's a stupid $O(n^2)$ brute-force that will work even faster! And you would be correct to think this. But we can improve it by noticing that a bottleneck is the height of the tree, that is, some vertices can have up to $O(n)$ ancestors. As it turns out, we can actually construct a different, related tree of height $O(\log{n})$ and solve the problem in the same way with the new tree. But first, we need to talk about parallel universes centroids.

# Centroids

Timestamp: 20:57

A centroid is defined as a vertex with the following property: when removed, all of the resulting subtrees have a size of at most half that of the original tree (that is, $\lfloor \frac{n}{2} \rfloor$). There is always at least one in any tree (there may even be multiple — consider a bamboo of length 4), which we'll prove right now.

Proof

# The "centroid tree"

Timestamp: 30:09

Now, why are centroids useful? Let's define a recursive process where we create a new tree — a "centroid tree" — from our input tree. Say we root the tree at a centroid of the tree. Then all the remaining subtrees have size at most half of the original tree. We'll run the recursive process to create new trees for each of those subtrees, then attach them as children to our root. See the video for a full illustration of how this works.

This new tree has two important properties:

• Timestamp: 35:42

The height is at most $O(\log{n})$. This is because, at each step, the size of every subtree created is at most half of the original tree. So we ask "how many times can we divide $n$ by $2$ before getting $\leq 1$", which is the definition of $\log_2{n}$. This also means the process of constructing the tree is $O(n\log{n})$. Common sense says this is true, because there are at worst $O(\log{n})$ depths and each depth takes at worst $O(n)$ to compute. But if you want formality, you can also use the master theorem with the worst-case recurrence $T(n) = 2 * T(\frac{n}{2}) + O(n)$.

• Timestamp: 37:11

Each subtree in the centroid tree forms a connected component in the original tree. This is because of our process of construction: for every vertex, when it's added to the centroid tree, it's part of some connected subtree that hasn't been removed yet. Once we remove that vertex, every vertex in that subtree will be added as a child of that vertex, and thus all of its children in the centroid tree will be part of that connected subtree.

Both of these will be necessary to solve the original problem.

# Returning to the problem

Timestamp: 39:52

Why is it important that subtrees of the centroid tree are contiguous in the original tree? It means that in the original tree, for a vertex $v$, every other vertex not part of $v$'s subtree (in the centroid tree) is either an ancestor of $v$ in the centroid tree or "blocked off" from $v$ by at least one such ancestor.

Think about that claim for a second. It also comes from the way we construct the centroid tree — the subtree of $v$ is always "surrounded" by its centroid-tree ancestors, as the removal of those ancestors "cuts" the subtree of $v$ into what it is. It's impossible for a vertex adjacent to a vertex from $v$'s subtree to not be an ancestor of $v$ in the centroid tree, because if that was true, that vertex would just be in $v$'s subtree because we construct the centroid tree that way.

Now let's modify $best[v]$ to be the distance to the closest red vertex in $v$'s subtree in the centroid tree, rather than in the original tree. With that, we end up with the exact same two cases as we had before:

• The shortest red vertex to $v$ is in $v$'s subtree in the centroid tree. In that case, we just take $best[v]$.

• The shortest red vertex to $v$ is "blocked off" by some centroid-tree ancestor $u$ of $v$, and therefore it passes through $u$. Then the distance is $best[u] + dist(u, v)$, and we take the minimum of this over all $u$.

And updates are also the same, we just set $best[u] := min(best[u], dist(u, v))$ for all centroid-tree ancestors $u$ of $v$ (including $v$).

But now, there's a crucial difference: the height of the centroid tree is always $O(\log{n})$. That means each update and each query only affects $O(\log{n})$ vertices, and also only takes $O(\log^2{n})$ time to compute if $dist$ is $O(\log{n})$. That means we've done it — that's our solution, with a complexity of $O(n\log{n} + q\log^2{n})$, the first part for constructing the tree and the second for answering queries. Again, you could improve this to $O((n + q)\log{n})$ if you want with faster LCA.

# Implementation

Timestamp: 53:22

Despite the difficulty of the concept, the implementation is rather simple. I'll break it into sections.

Creating the tree

We first find the centroid $c$ of the input tree. Then we mark it as the root, recursively solve for the subtrees created by removing $c$ (in the implementation, marking vis[c] as true is equivalent to removing $c$), and set the parents of the resultant subtrees to $c$. This goes on recursively until we create the whole tree, like a modified DFS that traverses centroids.

Code

Finding centroids

We've basically gone over this in the proof that a centroid always exists, but I'll re-hash it. There are two parts: finding subtree sizes and using them to find a centroid. In both functions, we avoid visited nodes to stay within our subtree, as those are centroids that have already been solved for.

Code (finding subtree sizes)

To find a centroid (this is the same as the algorithm mentioned in the proof), let's root the tree at an arbitrary vertex $r$. If $r$ is a centroid, we're done, otherwise, one subtree has a size of more than $\lfloor \frac{n}{2} \rfloor$. Let's then go into that subtree's root, and call it $s$. Now let's treat $s$ the same way: either it's a centroid, or we move into the subtree that has a size greater than $\lfloor \frac{n}{2} \rfloor$. As we proved above, this will always terminate.

Code (finding a centroid)

# Problem-specific implementation

There are a few parts specific to this problem: the $dist$ function, applying updates, and applying queries. Since centroid decomposition is just a technique rather than an algorithm, this will be different for each problem. I'll briefly touch all of them.

Computing $dist$

Timestamp: 56:42

We'll use LCA to do this. If you've seen my HLD tutorial, you'll know that any path $u$ -> $v$ on a tree can be broken into two parts: the path from $u$ to $lca(u, v)$ and the path from $lca(u, v)$ to $v$. So now we want $dist(u, lca(u, v)) + dist(lca(u, v), v)$. This is simpler because the LCA is always an ancestor of $u$ and $v$, so the total distance can be computed using depths as $(depth[u] - depth[lca(u, v)]) + (depth[v] - depth[lca(u, v)])$. This is simplifiable to $depth[u] + depth[v] - 2 \cdot depth[lca(u, v)]$.

If you don't know LCA but still want to try the problem, you can copy my LCA template in the full solution below.

(Partial) code

This is exactly how it was described: just set $best[u]$ to $min(best[u], dist(u, v))$ for all centroid-tree ancestors of $v$.

Code

This is also exactly how it was described, find the minimum of $best[u] + dist(u, v)$ over all centroid-tree ancestors of $v$. Assume $best[u] = \infty$ initially.

Code

# Full implementation

You can find my full implementation here, and my solution for Xenia and Tree here. My solution also has an LCA template that you can copy to try the problem if you don't know the concept.

Steps for using it: * Read in n, and call init(n). * Read in all edges, and call edge(u, v) for an edge between u and v. * Call init_centroid (the root v can be arbitrary, make sure p is -1). * That's it — now the tree is represented by the parents for each vertex.

# More practice

Centroid decomposition can be used in many ways. This application was a "bottom-up" approach, starting from a query/update vertex and going up the centroid tree, updating or querying the closest vertex in each subtree. In this case, the only special part about the centroid tree is that its height is $O(\log{n})$.

Other problems can require centroid decomposition in a "top-down" manner — that is, divide-and-conquer. For example, you could solve for all paths going through the centroid, then recursively solve for the resultant subtrees after removing the centroid. This method is upper-bounded by $O(\log{n}) \cdot f(n)$, where $f(n)$ denotes the time to solve for all paths through a specific vertex since there are $O(\log{n})$ "levels".

I'm not big-brain enough to create a whole mashup contest for this subject, but I can do the next best thing: yoink other people's problems. So here's a list of what I (and others) have found so far:

Feel free to recommend more practice problems or just leave feedback in general. I will likely add more in the future.

(small edit: fixed a missing dollar sign and a formatting issue caused by using [cut])

• +264

 » 15 months ago, # |   +10 Colin thank you for your useful tutorial ! You can add this problems https://codeforces.com/blog/entry/52492 .
•  » » 15 months ago, # ^ |   +1 Thank you! I've added that blog (and a few more problems).
 » 15 months ago, # |   0 Thank you so much for helping this community. You guys are making it so easy to learn CP these days.
 » 15 months ago, # |   +20 Wow. You drew everything with a mouse!
•  » » 15 months ago, # ^ |   +3 I plan to get a writing tablet eventually (to spare my hand), but for now the mouse-writing is probably good enough
 » 15 months ago, # |   +10 I learnt centroid decomposition while solving Yet Another Tree Problem.
•  » » 15 months ago, # ^ |   +8 Added, thank you! That works as a nice example of both the divide-and-conquer and the "bottom-up" uses of centroid decomposition.
 » 15 months ago, # |   +10 best[u]:=max(best[u],dist(u,v))Shouldn't it be min?
•  » » 15 months ago, # ^ |   +3 Whoops, sorry. Fixed, thanks!
•  » » » 15 months ago, # ^ |   0 No Problem...Can you include another problem discussion in this post.....which has more to do with paths....the simplest one would be all path sums....It feels to me a bit different than this.....although it might only be me.....!!
 » 15 months ago, # |   +3 @galen_colin orz
 » 15 months ago, # |   +10
•  » » 15 months ago, # ^ |   0 Added, thanks!also geeksforgeeks totally stole this problem
 » 15 months ago, # | ← Rev. 5 →   +18 You can also use the centroid decomposition to compute distances between any two nodes in $O(\log n)$, so you don't need LCA to solve Xenia and Tree. Just compute the distances from every node to its ancestors in the centroid tree. Each node has at most $O(\log n)$ ancestors, so you will need additional $O(n\log n)$ memory, but there is basically no runtime overhead: while constructing the centroid decomposition, you can compute distances to the previously found centroid (from its descendants) in the DFS you use to find subtree sizes. Now, to compute distances between any two nodes you just need to find their LCA in the centroid tree. You can see my submission 90467624 for details.
 » 14 months ago, # |   +1 I appreciate your blog! I finally learned centroids! But you forgot to check !vis[x] in find_size function.
•  » » 14 months ago, # ^ |   +1 he does the check at the start of the function
•  » » » 14 months ago, # ^ |   +1 Oh, I didn't see that. Thanks!
 » 8 months ago, # |   +3 In Xenia and tree instead of using LCA I used vector> for each centroid to store distance of all it's nodes in the corresponding subtree (of centroid tree). What I think is that it would take O(nlogn) memory. But after implementing it, it is giving MLE. I really don't know what I am doing wrong. Can anyone please help me. My code
•  » » 5 months ago, # ^ |   +3 Using both vector and unordered_map is extremely memory-exhausting. Try replacing it with arrays(e.g, my submission) and map(log(log(n))).