Intro
Hello, Codeforces Community!
The main concept of this blog is the Edge Divide and Conquer method, but there are other techniques that I mentioned and used to solve the example problem below.
Target audience: CF Rating $$$\geq 2000$$$.
Note: Please, feel free to suggest more problems that can be solved using any of the techniques mentioned in the blog in the comments section. You can see an easier version of my modified problem on Luogu, which can also be solved with the standard centroid decomposition method.
Prerequisites:
- Basic tree algorithms: LCA, DFS, etc.
- Divide and Conquer (Centroid Decomposition) on trees. If you do not know the algorithm, you can learn it here: Hybrid Tutorial
Abridged problem statement:
You are given two rooted trees($$$\text{T1}$$$ and $$$\text{T2}$$$). Each of them is rooted at vertex $$$1$$$ and has $$$n$$$ vertices and $$$n - 1$$$ weighted edges. The strange distance between two vertices is defined as in the following equation: $$$val(x, y) = \text{depth1}[x] + \text{depth1}[y] - \text{depth1}[\text{lca1}(x, y)] - \text{depth2}[\text{lca2}(x, y)]$$$. You have to print two integers: the maximum value $$$\text{mx}$$$ of strange distance among all possible pairs $$$1 \leq x<y \leq n$$$ and the number of such pairs that $$$\text{val}(x, y) = mx$$$
Constraints: $$$n \leq 3 \times 10^5$$$, edges can have negative weights.
Solution description:
Step 1: The equation includes four variables, two of which are dependent on $$$\text{lca1(x,y)}$$$ and $$$\text{lca2(x,y)}$$$. The depths of $$$\text{LCAs}$$$ in both trees are not easy to deal with, so by multiplying and dividing every term from the first tree by 2, we can observe the following relationship: $$$\text{depth1}[x] + \text{depth1}[y] - \text{depth1}[\text{lca1}(x, y)] = \frac{1}{2} \cdot (\text{distance1}(x, y) + \text{depth1}[x] + \text{depth1}[y])$$$, thus, from now on, $$$\text{val}(x, y) = \frac{1}{2} \cdot (\text{distance1}(x, y) + \text{depth1}[x] + \text{depth1}[y]) - \text{depth2}[\text{lca2}(x, y)]$$$
Step 2: Failing to solve the problem using the standard centroid decomposition method:
Centroid decomposition is a common approach for problems where paths between all pairs of nodes should be considered. Thus, We could first try to solve this problem with a standard centroid decomposition method, but soon, we will notice that in this way, we have some issues that can be overcome only after using another nice and less-known technique. Using the standard centroid decomposition method, we choose the centroid in the first tree, consider all the paths that go through this node, remove it, and solve the problem for the remaining subtrees recursively. Removing the centroid divides our tree into subtrees with at most half of the current size; thus, until the end of the process, each node will be considered at most $$$\log_{2}n$$$ times. However, when we consider the centroid, the tree may be divided into many subtrees(for simplicity, let's color different subtrees with different colors), and splitting a tree into many colors is an issue in this problem.
$$$\text{val}(x, y) = \frac{1}{2} \cdot (\text{distance1}(x, y) + \text{depth1}[x] + \text{depth1}[y]) - \text{depth2}[\text{lca2}(x, y)]$$$
Specifically, for each centroid vertex $$$\text{C}$$$, we need to iterate over all $$$lca2(x,y)$$$ in the second tree in order to fix this equation's fourth addend. We have two restrictions:
1. At this step, we only consider the vertices $$$x$$$ and $$$y$$$ whose connecting path goes through the vertex $$$C$$$. Thus, $$$x$$$ and $$$y$$$ must have different colors.
2. When we iterate over all $$$\text{lca2(x,y)}$$$ in the second tree, we need to distinguish the colors of vertices. If the problem asked us to find just the maximum $$$\text{val(x, y)}$$$, then for each vertex in the second tree, we could keep only the two best vertices with different colors that have maximum $$$\text{Distance}(C, x) + \text{depth1}[x] + \text{depth2}[x]$$$, but our problem is a bit harder, and you can't simply find the #number of pairs with maximum $$$\text{val}(x, y)$$$ if there are many colors because, for each vertex in the second tree, we need to maintain some information about each color. This will lead to $$$O(n \cdot \text{number of colors})$$$ time complexity, which can be $$$O(n^2)$$$. So what we would like to do is to reduce the number of colors to 2!
Step 3: We can overcome the issue of having many colors by a technique called 'Edge centroid decomposition' or 'Edge divide & conquer.' Instead of dividing the current tree into many 'small enough' parts when deleting a centroid vertex, we want to divide it into exactly two subtrees by deleting an edge, where both parts have roughly equal numbers of vertices. But there is a question: Does a 'good' centroid edge always exist? The answer is no.
Let’s consider the 'Star' tree: If we delete any edge from the tree, we will get two subtrees of sizes $$$\text{1}$$$ and $$$\text{(n–1)}$$$. Then, two subtrees with sizes $$$\text{1}$$$ and $$$\text{(n-2)}$$$ and so on. Thus, in total, we will consider vertices $$$\sim O(n^2)$$$ times, which is too much.
Binarizing the tree:
After figuring out this issue, we are trying to somehow convert the first tree, $$$\text{T1}$$$, into an equivalent tree in a way that there always exists an edge that, when removed, divides our tree into two small enough parts.
For a binary tree, we can always guarantee the existence of such an edge(the one that goes from the centroid vertex to its biggest subtree) that, when removing it, the bigger subtree has at most $$$\frac{2}{3} \cdot \text{CurSize}$$$ vertices. So if we achieve transforming $$$\text{T1}$$$ to the binary tree while preserving the distance between any two original vertices, we will consider each vertex at most $$$\log_{1.5} n$$$ times.
Step 4: We can transform any tree $$$\text{T1}$$$ into a binary tree without changing its structure in this way: (you can also read the editorial of 757G)
To convert the given tree T into an equivalent binary tree (degree of each node is $$$\leq3$$$), we will add at most $$$O(n)$$$ dummy nodes.
The dummy nodes are added in such a way that the structure of the tree and the distances between any two initial vertices are preserved.
To do this, consider a node $$$\text{X}$$$ with degree $$$D > 3$$$. Let $$$C_{1}, C_{2}, C_{3}, \ldots, C_{d}$$$ be its adjacent nodes. Add $$$\text{D}$$$ complementary new dummy nodes, and change the edges as in the image below. (Note that the sum of degrees of vertices whose degree $$$D>3$$$ reduces on each addition of the dummy node. Thus, we need at most $$$\sim O(n)$$$ dummy nodes).
Step 5: When we find the Centroid Edge, $$$\text{E}$$$, our current tree is divided into two parts(white and red), and at this stage, we consider paths between such vertices $$$x$$$ and $$$y$$$ that their connecting path goes through this centroid edge(the vertices $$$x$$$ and $$$y$$$ have different colors). The distance between $$$x$$$ and $$$y$$$, $$$distance1(x,y)$$$, can be split into two independent routes. We can denote $$$\text{Whitevalue}(x) = distance1(x, a) + depth1[x]$$$ for each white node and $$$\text{Redvalue}(y) = distance1(y, b) + depth1[b]$$$ for each red node. Then, when we iterate through all possible $$$\text{LCAs}$$$ in the second tree, and for every node, we need to store the value and number of best white nodes(nodes with maximum $$$\text{Whitevalue}(x)$$$) in its subtree and the value and the number of best red nodes in its subtree. These values can be calculated in $$$O(n)$$$ time by very simple dynamic programming in the second tree. However, when we consider a centroid edge E for the current subtree, no matter if, at this stage, our subtree is little or we are in the initial stage, we consume $$$O(n)$$$ operations for DP in the second tree. Consuming $$$O(n)$$$ time for each centroid edge's subtree will lead to $$$O(n^2)$$$ time complexity.
We don’t actually need to iterate through all vertices in the second tree and consider all of them as $$$\text{LCAs}$$$. To achieve $$$O(\text{CurrentSubtreeSize})$$$ time complexity for the centroid edge’s subtree, we can compress the second tree (i.e., Build Virtual / Auxiliary tree), and considering only the vertices of the Auxiliary tree will be enough.
Virtual/Auxiliary tree:
For a dive into the details of this algorithm, you can read this1 this2.
When we consider a centroid edge of a particular subtree, instead of iterating over all $$$\text{LCAs}$$$ in the second tree, we only matter the 'Key' nodes -- nodes that are in the current subtree and their pairwise $$$\text{LCAs}$$$. The virtual tree is a tree that includes only key nodes but still maintains the original tree's structure and ancestor-descendant relationships. Specifically, we only matter vertices in the current subtree and their pairwise lowest common ancestors. We can find all these pairwise lowest common ancestors by maintaining the sorted list of the vertices according to their DFS traversal order and finding LCA in $$$O(1)$$$ for each pair of adjacent vertices. Thus, we can efficiently build a virtual tree for each centroid’s component.
The overall time complexity is $$$O(n \log_{1.5} n)$$$ because we can use the $$$O(1)$$$ LCA method and Merge Sort to maintain the sorted list of vertices according to their DFS order without adding an additional log factor. This complexity is sufficiently efficient!
orz
How does the sample input work? Why does the first tree only have 5 edges while the second tree have 6 edges?
Thanks for pointing that out. I've edited the blog.