Hello everyone,
finding the diameter is one of the most frequent ways to solve problems about trees. In this tutorial we will see how to find a diameter and some of its properties, and we will use them to solve some problems of increasing difficulty.
The first part of the tutorial is quite basic, so feel free to skip it and jump to the problems if you already know the concepts.
Target: rating $$$[1400, 2100]$$$ on CF
Prerequisites: basic graph theory, greedy
The diameter
Given an unweighted tree, let's define $$$\text{dist}(a, b) =$$$ the number of edges in the simple path $$$a \rightarrow b$$$.
A diameter of the tree $$$a \rightarrow b$$$ is the longest path, i.e., the one that maximizes $$$\text{dist}(a, b)$$$ over all pairs of nodes. If there are multiple diameters, let's pick any of them.
The same definition is valid for a weighted tree with nonnegative weights (with $$$\text{dist}(a, b) =$$$ the sum of the weights of the edges in the simple path $$$a \rightarrow b$$$).
Finding a diameter
Given a tree with $$$n$$$ nodes are multiple ways to find a diameter. Here is one of the simplest ways:
Run a DFS from any node $$$p$$$. Let $$$a$$$ be a node whose distance from node $$$p$$$ is maximized. Run another DFS from node $$$a$$$. Let $$$b$$$ be a node whose distance from node $$$a$$$ is maximized. $$$a \rightarrow b$$$ is a diameter.
Tree = edges of a diameter + forest
Before proving the previous algorithm, let's analyze the structure of the tree (we will mention the diameter, but we will not use the fact that $$$a \rightarrow b$$$ is actually a diameter before proving it).
We started a DFS from node $$$p = 16$$$, and we got that node $$$a = 1$$$ is the farthest from $$$p$$$, and node $$$b = 7$$$ is the farthest from $$$a$$$.
Let's represent the diameter on a line. If you remove the edges of the diameter, you get a forest (i.e., several trees). Let's root each tree at the node in the diameter. What's the height (i.e., the maximum distance from the root to any node) of each component?
Let $$$q$$$ be the root of the component of $$$p$$$. Let's consider any component whose root $$$d$$$ is between $$$a$$$ (included) and $$$q$$$ (excluded), and one of its nodes $$$c$$$.
We get
$$$\text{dist}(p, a) \geq \text{dist}(p, c) \implies \text{dist}(p, a) - \text{dist}(p, d) \geq \text{dist}(p, c) - \text{dist}(p, d) \implies \text{dist}(a, d) \geq \text{dist}(c, d)$$$.
In other words, the height of each component with root in the left half of the diameter (i.e., $$$\text{dist}(a, d) < \text{dist}(d, b)$$$) is at most the distance of the root of the component from the left end of the diameter.
You can prove the same statement for the right half of the diameter (i.e., $$$\text{dist}(a, d) \geq \text{dist}(d, b)$$$), using that $$$b$$$ is the farthest node from $$$a$$$.
Farthest node for each node
For each node $$$i$$$, let's find a node $$$j$$$ such that $$$\text{dist}(i, j)$$$ is maximum.
Claim: $$$j = a$$$ or $$$j = b$$$ always works.
Proof:
- If $$$j = j_1$$$ works ($$$j_1$$$ is not in the same component of $$$i$$$; let's assume without loss of generality that $$$j_1$$$ is closer to $$$a$$$ than to $$$b$$$), $$$\text{dist}(i, j_1) = \text{dist}(i, r) + \text{dist}(r, j_1) \leq \text{dist}(i, r) + \text{dist}(r, a) = \text{dist}(i, a)$$$. Then, $$$j = a$$$ also works.
- If $$$j = j_2$$$ works ($$$j_2$$$ is in the same component of $$$i$$$), $$$\text{dist}(i, j_1) \leq \text{dist}(i, r) + \text{dist}(r, j_1) \leq \text{dist}(i, r) + \text{dist}(r, a) = \text{dist}(i, a)$$$. Then, $$$j = a$$$ also works.
Proof that $$$a \rightarrow b$$$ is a diameter
Now we can finish the proof.
Suppose that $$$u \rightarrow v$$$ is a diameter. We have either $$$\text{dist}(u, a) \geq \text{dist}(u, v)$$$ or $$$\text{dist}(u, b) \geq \text{dist}(u, v)$$$ (see "Farthest node for each node").
Let's assume without loss of generality that $$$\text{dist}(u, b) \geq \text{dist}(u, v)$$$. We get $$$\text{dist}(a, b) \geq \text{dist}(u, b) \geq \text{dist}(u, v)$$$, so $$$a \rightarrow b$$$ is a diameter.
Observations
The algorithm also works in a weighted tree with positive edges (we've never used that the weights are $$$1$$$).
However, it doesn't work on general graphs (discussion).
How to use the diameter
Most of the times, spamming "the farthest node from each node is one end of the diameter" and "the height of each component is smaller than the distance to the closest end of the diameter" is enough to reduce the problem to something simpler.
Find a diameter $$$a \rightarrow b$$$ (from now, $$$a \rightarrow b$$$ will always be a diameter, unless otherwise stated). Now, you may need to consider any path of the tree. There are two cases: the path intersects (blue) or doesn't intersect (green) the diameter.
Then, you may wonder how to make the path longer / "more optimal" / etc. according to the statement. For example, you may need to use $$$\text{dist}(7, 5) \geq \text{dist}(5, 19)$$$ to show that $$$8 \rightarrow 7$$$ is "more optimal" than $$$8 \rightarrow 19$$$.
Hint 1First, put the shops on any path of (unweighted) length $$$k$$$. Can you choose a better path?
Hint 2The optimal path lies completely on the (weighted) diameter.
Hint 3Try all possible paths in $$$O(n)$$$.
SolutionThe optimal path lies completely on the (weighted) diameter. Sketch of proof: suppose that the intersection of the chosen path with the diameter is a path $$$u \rightarrow v$$$ ($$$a, u, v, b$$$ in this order). Then, the maximum distance between a node and the nearest shop is one of the following: $$$\text{dist}(a, u)$$$, $$$\text{dist}(v, b)$$$, $$$\text{dist}(c, d)$$$ ($$$d$$$ on the diameter, $$$d$$$ strictly between $$$u$$$ and $$$v$$$, $$$c$$$ in the component of $$$d$$$). Then, $$$u \rightarrow v$$$ itself is optimal.
Now let's try all paths $$$u \rightarrow v$$$ of maximum (unweighted) length (there are $$$O(n)$$$ of them). Notice that the required distance is also the maximum of $$$\text{dist}(a, u)$$$, $$$\text{dist}(v, b)$$$, $$$\text{dist}(c, d)$$$ ($$$d$$$ on the diameter, but not necessarily between $$$u$$$ and $$$v$$$). If you precalculate the distance of each node from the root of its component and from $$$a$$$, $$$b$$$, we can get the maximum distance between a node and the nearest shop in $$$O(1)$$$.
Implementation by nor (C++): 151009669
Hint 1You have to find the maximum total weight of two non-intersecting paths.
Hint 2Once again, choose any two paths and try to "improve" them.
Hint 3There are two cases to consider:
- a path is the diameter and the other path lies completely in a component;
- a path is a prefix of the diameter + a vertical path in the corresponding component, the other path is a suffix of the diameter + a vertical path in the corresponding component.
Can you solve them in $$$O(n)$$$?
SolutionYou have to find the maximum total weight of two non-intersecting paths. Notice that the facts stated previously hold even if weights are on nodes instead of edges.
There are two cases to consider:
- a path is the diameter and the other path lies completely in a component;
- a path is a prefix of the diameter + a vertical path in the corresponding component, the other path is a suffix of the diameter + a vertical path in the corresponding component.
Sketch of proof: suppose that the intersection of a chosen path $$$i \rightarrow j$$$ with the diameter is a path $$$u \rightarrow v$$$ ($$$i$$$ in the component of $$$u$$$; $$$a, u, v, b$$$ in this order). The paths $$$a \rightarrow j$$$, $$$i \rightarrow b$$$, $$$a \rightarrow b$$$ are longer than $$$i \rightarrow j$$$, so it's optimal to take one them instead of $$$i \rightarrow j$$$ if possible.
It's possible to solve both cases in $$$O(n)$$$:
- find the diameter of each component, pick the maximum;
- precalculate the height of each component, the distance of each node from $$$a$$$, $$$b$$$, the prefix maximums $$$\text{pm}(i)$$$ of $$$\text{dist}(a, i) + \text{height}(i)$$$, the suffix maximums $$$\text{sm}(i)$$$ of $$$\text{dist}(i, b) + \text{height}(i)$$$, and find the maximum $$$\text{pm}(i) + \text{sm}(i + 1)$$$.
Hint 1How to check fast if a path contains an even number of ramen cups?
Hint 2Let $$$\text{cups}(i, j)$$$ be the parity of the number of cups in the path $$$i \rightarrow j$$$. $$$\text{cups}(i, j)$$$ is the XOR ($$$\oplus$$$) of the number of cups of the edges. $$$\text{cups}(i, j) = \text{cups}(1, i) \oplus \text{cups}(1, j)$$$. You have to find the maximum $$$\text{dist}(i, j)$$$ ($$$\text{cups}(1, i) = \text{cups}(1, j)$$$). Can you reduce the number of candidate longest paths $$$i \rightarrow j$$$?
Hint 3The optimal path is either $$$a \rightarrow i$$$ or $$$i \rightarrow b$$$. You have to find the maximum $$$\text{dist}(a, i)$$$ among all $$$i$$$ such that $$$\text{cups}(1, i) = \text{cups}(1, a)$$$. What happens when you update an edge?
Hint 4When you update an edge, you toggle $$$\text{cups}(1, i)$$$ in a subtree. If you order nodes in DFS order, each subtree is represented by a subarray. How to answer queries?
Hint 5Store $$$-1^{\text{cups}(1, i)} \cdot \text{dist}(a, i)$$$ in a segment tree (same with $$$b$$$, in another segment tree). You have to support range multiply by $$$-1$$$, range max and range min.
SolutionLet $$$\text{cups}(i, j)$$$ be the parity of the number of cups in the path $$$i \rightarrow j$$$. $$$\text{cups}(i, j)$$$ is the XOR ($$$\oplus$$$) of the number of cups of the edges. Let $$$l$$$ be $$$\text{lca}(i, j)$$$. Then, $$$\text{cups}(i, j) = \text{cups}(i, l) \oplus \text{cups}(l, j) = \text{cups}(i, l) \oplus \text{cups}(l, 1) \oplus \text{cups}(1, l) \oplus \text{cups}(l, j) = \text{cups}(1, i) \oplus \text{cups}(1, j)$$$.
You have to find the maximum $$$\text{dist}(i, j)$$$ ($$$\text{cups}(1, i) = \text{cups}(1, j)$$$).
Claim: the optimal path is one of the following: $$$a \rightarrow i$$$, $$$i \rightarrow b$$$.
Proof: suppose that $$$i \rightarrow j$$$ is optimal ($$$\text{cups}(1, i) = \text{cups}(1, j)$$$; $$$a$$$, root of $$$i$$$, root of $$$j$$$, $$$b$$$ in this order). $$$a \rightarrow j$$$, $$$i \rightarrow b$$$, $$$a \rightarrow b$$$ are at least as long as $$$i \rightarrow j$$$. If you can't take neither $$$a \rightarrow j$$$ nor $$$i \rightarrow b$$$, you get $$$\text{cups}(1, a) = \text{cups}(1, j) \oplus 1 = \text{cups}(1, i) \oplus 1 = \text{cups}(1, b)$$$, so you can take $$$a \rightarrow b$$$.
After each query, you have to find the maximum $$$\text{dist}(a, i)$$$ among all $$$i$$$ such that $$$\text{cups}(1, i) = \text{cups}(1, a)$$$ (you can handle $$$\text{dist}(i, b)$$$ similarly). For each query, you have to flip all values of $$$\text{cups}(1, i)$$$ in a subtree.
In order to support subtree updates, let's use a segment tree where each leaf represents a node of the tree, and these nodes are in DFS order. Each leaf stores $$$\text{dist}(a, i)$$$ if $$$\text{cups}(1, i) = 0$$$, $$$-\text{dist}(a, i)$$$ otherwise. Flipping a subtree corresponds to multiplying by $$$-1$$$ on a range of the segment tree. Finding the maximum $$$\text{dist}(a, i)$$$ with $$$\text{cups}(1, a) = \text{cups}(1, i)$$$ corresponds to finding either the minimum or the maximum value of the segment tree.
Other problems
CSES — Tree Distances I (to check your implementation) (suggested by nor) 102694B - Dynamic Diameter
abc221_f
CEOI 2019 — Dynamic Diameter (hard) (nor)
Conclusions
We've seen that finding a diameter can also solve seemingly unrelated problems, and it's a good candidate idea if the problem involves a tree and maximum lengths/distances.
Of course, suggestions/corrections are welcome. In particular, please share in the comments other problems where you have to use the diameter.
I hope you enjoyed the blog!