cuom1999's blog

By cuom1999, history, 4 years ago, In English

Statement:

Given a tree of around $$$10^5$$$ nodes, $$$10^5$$$ queries, and an integer $$$k$$$. Each query has $$$k$$$ vertices $$$a_1, a_2, ..., a_k$$$. For each query, find a node $$$x$$$ such that $$$dist(x, a_1) + dist(x, a_2) + ... + dist(x, a_k)$$$ is maximum, then print that maximum value. Here $$$dist(u, v)$$$ is the number of edges on the simple path from $$$x$$$ to $$$y$$$.

I have been thinking about this problem for a while but not been able to solve it, even for small $$$k$$$ like $$$2$$$ or $$$3$$$. Any idea on how to solve this problem? I appreciate all ideas and any sources of similar problems.

  • Vote: I like it
  • +48
  • Vote: I do not like it

| Write comment?
»
4 years ago, # |
  Vote: I like it 0 Vote: I do not like it

Auto comment: topic has been updated by cuom1999 (previous revision, new revision, compare).

»
4 years ago, # |
Rev. 3   Vote: I like it +15 Vote: I do not like it

I'm assuming there is bound on $$$Qk$$$ too, otherwise it doesn't make sense.

Let us root at $$$1$$$.

You want a virtual tree(Tree composed of the nodes and their lcas). We can show that the tree consists of $$$O(k)$$$ vertices at most. Notice that the answer must be leaf.

So we can use dp to find furthest leaf from each vertex to keep track of that. Now the answer depends on which node's subtree it is in.

Let's say there is some leaf at $$$d$$$ distance from some node $$$u$$$ in $$$a$$$. The answer is $$$dk + \sum dist(u,a_i)$$$. Now we can compute the left part in $$$O(n)$$$ for all nodes using precomputation.

Now for the right part, we can use rerooting on a dp based on number of nodes in the subtree that are in our query. Using that, we can notice when we traverse some edges, the sum increases by the number of nodes not in the subtree, and decreases by the number of nodes in the subtree. So using this we can maintain the distances from all nodes.

There is also one edge case, where $$$1$$$ itself is leaf and answer. Just take care of that too.

Finally, we get $$$O(n + Qk\log k)$$$

Edit : Incorrect solution. I still believe it can be solved by virtual trees however.

https://www.youtube.com/watch?v=czySm7bUHgY&t=1849s Here is tutorial where I learnt from. He doesn't post anymore :(

  • »
    »
    4 years ago, # ^ |
      Vote: I like it +8 Vote: I do not like it

    I think the left part is really hard. Choosing a leaf for every node $$$u$$$ should not be the same in different queries. For example, if $$$k=2$$$ and we call $$$b$$$ as $$$lca(a_1, a_2)$$$, the leaf chosen for $$$b$$$ should belong to a subtree of $$$b$$$ that not contains $$$a_1$$$ and $$$a_2$$$. Otherwise, the formula is not correct.

    • »
      »
      »
      4 years ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      Yes you are correct. Terribly sorry for missing something so simple.

      However, You can maintain a set on each node with the maximum depth of all of its subtrees, and erase an element every time you add an edge to it. To find which subtree it lies in, we can use binary lifting to find the node just below the top vertex in the path. since the adding edges part is $$$O(k)$$$, my time complexity is now $$$O(n + Qk\log n)$$$. I would think there exists a better solution however.

      • »
        »
        »
        »
        4 years ago, # ^ |
          Vote: I like it +8 Vote: I do not like it

        Consider the case $$$k=2$$$ like my below comment. I think it would not be that simple, and I think the adding part is not $$$O(k)$$$.

        • »
          »
          »
          »
          »
          4 years ago, # ^ |
          Rev. 2   Vote: I like it +5 Vote: I do not like it

          You are correct. I missed a few cases leading me to the incorrect solution. Sorry :( I guess I will have to be more strict with myself before posting a "solution".

          By any chance, do you have someplace I can submit? I want to try fixing my solution.

          • »
            »
            »
            »
            »
            »
            4 years ago, # ^ |
              Vote: I like it 0 Vote: I do not like it

            I came up with the problem myself (not really but I was inspired by the minimum problem). Maybe you can try writing a brute force and stress test it.

  • »
    »
    4 years ago, # ^ |
    Rev. 2   Vote: I like it +13 Vote: I do not like it

    "Let's say there is some leaf at $$$d$$$ distance from some node $$$u$$$ in $$$a$$$. The answer is $$$dk + \sum dist(u, a_i)$$$."

    This is only true if $$$u$$$ is between the leaf in question and every other node in $$$a$$$, but there may not be any such $$$u$$$ in $$$a$$$, for example if the leaf in question is in a subtree branching off of a path between two nodes in $$$a$$$, in a situation that looks something like this:

    a_1 ----- + ----- a_2
              |
              |
              |
              |
              |
              + ----- leaf
    
    • »
      »
      »
      4 years ago, # ^ |
        Vote: I like it +5 Vote: I do not like it

      Yes, I'm terribly sorry for that mistake. I for some reason assumed the leaf will branch from a leaf node of the virtual tree, which is not the case.

  • »
    »
    4 years ago, # ^ |
      Vote: I like it +42 Vote: I do not like it

    oh wow that's me

    • »
      »
      »
      4 years ago, # ^ |
        Vote: I like it +10 Vote: I do not like it

      I looked at that video just now, that's something amazing I learned. Why did you stop making more videos? Did you just plan to make that one thing alone? Just curious

      • »
        »
        »
        »
        4 years ago, # ^ |
          Vote: I like it +24 Vote: I do not like it

        Well tbf I wanted to make a youtube video as a meme and then while thinking about what to make it on, I decided to choose some topic that is well known, but there are almost no tutorials about it (and virtual tree was one example). After that I guess I was simply too lazy to record other things, as that one actually took quite a lot of time.

        Recently I was actually planning to make 2 videos. One about LiChao trees and another one about Fully Dynamic CHT (with removals in $$$\mathcal{O}(\log^2 n)$$$), but unfortunately I never had the time to record them (I even wrote some notes with pictures and etc. for them).

        So yeah, seeing that there are some people who seem to have enjoyed the first one, I might upload something new soon.

»
4 years ago, # |
  Vote: I like it 0 Vote: I do not like it

For $$$k=2$$$, we have $$$dist(x, a_1) + dist(x, a_2) = dist(a_1, a_2) + 2 \cdot dist(x, P)$$$, where $$$P$$$ is the path between $$$a_1$$$ and $$$a_2$$$. If we root at the center of the graph, and the path $$$P$$$ does not go through the center, we can easily find the maximal $$$x$$$ in $$$O(\log n)$$$ by choosing some diameter endpoint which is not in the same subtree of the root as $$$P$$$. Otherwise, we can find the deepest nodes in subtrees branching off from this path $$$P$$$ in $$$O((\log n)^2)$$$ using Heavy-Light decomposition.

For larger values of $$$k$$$, something similar should be possible, but I think a convexity trick will be required as well.

  • »
    »
    4 years ago, # ^ |
    Rev. 2   Vote: I like it 0 Vote: I do not like it

    How do you find the deepest nodes in subtrees branching off from path $$$P$$$? For example, if the path is like this

    drawing

    You have to consider the deepest node in the subtree of $$$3$$$ not containing $$$5$$$. It seems not trivial with HLD.

    • »
      »
      »
      4 years ago, # ^ |
      Rev. 2   Vote: I like it +8 Vote: I do not like it

      HLD breaks up the path from each $$$a_i$$$ to the root into $$$O(\log n)$$$ light edges and $$$O(\log n)$$$ sub-paths of heavy edges. Precompute for each node its deepest descendant which is not in its heavy subtree, and the distance to that descendant. Then, for the sub-paths of heavy edges we can perform a simple range-max query, while the light edges, the root, and $$$a_i$$$ itself can be handled separately. (Actually, it occurs to me now that this gives $$$O(\log n)$$$ per query provided an $$$O(1)$$$ RMQ structure is precomputed.)

      EDIT: The light edges, root, and $$$a_i$$$ can be processed in $$$O(1)$$$ each by by building an RMQ on the list of subtree heights and deepest descendants for each node, if that wasn't clear. (If construction time is a concern, only the root needs a full RMQ, at least when $$$k=2$$$. For other nodes, prefix and suffix maxes are enough.)

»
4 years ago, # |
Rev. 3   Vote: I like it +29 Vote: I do not like it

I'm not sure of the exact details of the solution (including if it works, lol), but by compiling the ideas from clyring and Everule, I would try the following:

First, let's take each query and use the virtual tree $$$VT$$$ to decompose it into $$$O(K)$$$ paths $$$P$$$ of type $$$(node, ancestor)$$$. There are two cases:

  1. The solution joins one of the paths $$$P$$$ from some child $$$c \not\in P$$$ of some node $$$v \in P$$$. The vertex candidate that you choose here is the one that maximizes some linear function $$$f(leaf) = depth(v) \cdot X + depth(leaf) \cdot k$$$ (plus some constant term that depends only on $$$P$$$), where $$$X$$$ is the difference between the nodes above path $$$P$$$ and the nodes below path $$$P$$$. You can do that by keeping the best function for each of the $$$2k$$$ possible evaluations of $$$X$$$ and merging upwards (also updating the queries with some range data structure)

  2. The solution joins the virtual tree $$$VT$$$ from above. This case is easier, just consider the farthest node from above $$$root(VT)$$$ using re-rooting technique and simple DP.

Total complexity of the solution should be something like $$$O((N + Q)Klog(N))$$$

  • »
    »
    4 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    This is quite close what I had in mind with my 'convexity trick' remark. There are only $$$k+1$$$ possible evaluations of $$$X$$$, but for large values of $$$k$$$ the best functions must be stored and calculated with some care on heavy paths to avoid MLE.

    • »
      »
      »
      4 years ago, # ^ |
      Rev. 2   Vote: I like it 0 Vote: I do not like it

      You are right, I updated my complexity to reflect that :).

      Maybe that can be optimized as well; however, the solution already seems complex enough for a competitive programming scenario as it is.

      EDIT: because the updates have descending slopes, you can use segment tree with convex hull trick in each node to do the updates in $$$O(log(n))$$$ amortized time and do the queries in $$$O(log(n) log(k))$$$

  • »
    »
    4 years ago, # ^ |
    Rev. 3   Vote: I like it +8 Vote: I do not like it

    I guess it makes sense to add that for $$$k = 2$$$, there are only two paths, and the solution is a lot easier to understand. Here the cases are:

    1. Below $$$u$$$ (or below $$$v$$$) (easy)
    2. On path $$$[u..lca(u, v)]$$$ (or on path $$$[v..lca(u, v)]$$$) (hard)
    3. Above $$$lca(u, v)$$$ (easy)

    The interesting candidates for each of the three cases can be computed separately. For case $$$2$$$, you do dfs. Take each edge $$$(parent, node)$$$ in leaf-to-root order and you update the candidate of each vertex in $$$subtree(node)$$$ (a compact interval) with the deepest leaf in $$$subtree(parent) \setminus subtree(node)$$$. When reaching the upper vertex of a "query path" (i.e. the lca), you just look at the value inside the lower vertices. The update-range-with-max and query-element can be easily done with a segment tree. (note that here $$$X = 0$$$, so you don't need to store one candidate for each possible difference).

    Most surely, the $$$log$$$ factor of the above solution can be optimized.