### Epilogue's blog

By Epilogue, history, 2 months ago,

Hi!. I'm solving a problem. Given a tree with N nodes with colors and q queries. For each query given two integers (u, v), answer if the path(u, v) on the tree is dominant. A path is called dominant when the most frequently occuring element appears strictly more than pathlength / 2.

Are there any offline, online solutions for this? If you have any ideas please comment bellow.

• +10

 » 2 months ago, # |   -28 Did you try to google before asking this? If you did, you would atleast know that it can be solved easily offline with Mo's.
•  » » 2 months ago, # ^ | ← Rev. 2 →   0 I know Mo for the subtree one but this one is path query. Can you using MO with it or Did you even read the problem properly?
•  » » » 2 months ago, # ^ |   -16 Did you even read the problem properly? Yes, I read your problem properly. You however, still didn't read my comment properly. A simple google search yields this.
•  » » » » 2 months ago, # ^ |   0 Oh I saw that one before but the constraint is like 5*10^5 so.Anyways tks for reminding me that.
 » 2 months ago, # | ← Rev. 6 →   +4 Arbitrarily root the tree. Let $cnt(i, j)$ denote the frequency of color $j$ on the path from root to node $i$. Now the number of occurrences of a color $x$ in $path(u, v)$ is $cnt(u, x) + cnt(v, x) - 2 * cnt(lca(u, v), x) + (color[lca(u, v)] == x)$. A path is dominant if the number of occurrences of the median color in the path is strictly greater than $\frac{pathlength}{2}$. $cnt(i, j)$ can be calculated online in $\mathcal{O}(\log N)$ with the help of persistent segment tree and the median color in a path can be calculated by binary search with range sum queries on the same tree. Overall time complexity will be $\mathcal{O}(N \log N + Q \log^2 N)$ and memory complexity $\mathcal{O}(N \log N)$.The problem can also be solved in $\mathcal{O}((N + Q) \log N)$ by an approach similar to walking on a segment tree once you have the persistent segment trees containing information on the frequency of colors on the path to root for each node. It can also be solved by a binary lifting approach using the fact that the majority of a sequence must be the majority of one of the sequences when it is partitioned into two.
•  » » 2 months ago, # ^ |   0 Wow that's nice. I have thought about Persistent Segment Tree but I didn't think about finding the median on the path. Thanks.
•  » » 2 months ago, # ^ |   0 Actually you can get rid of the binary search and just search the segtree going to the child node with the maximum sum on each step, so it'll become $O(NlogN + QlogN)$.
•  » » » 2 months ago, # ^ | ← Rev. 3 →   +3 You cannot do that. The segment tree of node $i$ contains the information of the frequencies of colors on the path from root to node $i$. To get the frequency of colors on $path(u, v)$ you need to combine the information of three segment trees. On a step in binary search you need to check the sum of frequencies of colors less than mid. The median will be the largest element for which such sum is less than $\frac{pathlength}{2}$.
•  » » » » 2 months ago, # ^ |   +3 You can keep the 3 segtree roots with you and go to their right or left children on each step of the search.
•  » » » » » 2 months ago, # ^ |   0 Yeah that works.
 » 2 months ago, # | ← Rev. 5 →   +2 Two approaches I can think of. Approach 1First, think about how you can count the occurrences of a certain value in a path. (Hint: it's not much harder than finding the distance) Second, come up with ways to sample a random vertex from any simple path of a tree. (Hint: binary lifting.) Afterwards, randomize. Should be solvable in $O(N \log N+cQ\log N)$, $c$ being the number of trials you will sample values from the path. As the appropriate value for AC would be $c = O(\log N)$, this is roughly $O(N \log N + Q \log ^2 N)$ without too many complex algorithms. Approach 2Modify the Boyer-Moore Majority Vote algorithm to work on a Segment Tree. Afterwards it will be straightforward to write a $O(N \log N + Q \log ^2 N)$ solution based on Heavy-Light Decomposition.
•  » » 2 months ago, # ^ |   0 You don’t need HLD, you can just apply Approach 2 as a ‘binary lifting’-like dp and check the candidate in $O(\log n)$, making it $O(q \log n)$
•  » » » 2 months ago, # ^ |   0 But how would you adapt the Boyer-Moore Majority Vote on the binary-lifting like method? Can we just try two possible candidates for the majority using the fact (which I have not proven, but does seem trivial) that if an element is a majority of $A + B$ it must either be a majority of $A$ or a majority of $B$?
•  » » » » 2 months ago, # ^ |   0 The fact that you mentioned is indeed true, it’s not too hard to prove. However, you can just aggregate the results on the two independent paths with the same combining function.
•  » » » » 2 months ago, # ^ |   +5 Proof for "if an element is a majority of $A + B$ it must either be a majority of $A$ or a majority of $B$" : $\newline$ Suppose both $A$ and $B$ have length of $2n + 1$. Then let $X$ and $Y$ be the number of majority for $A$ and $B$ respectively each with count of $n + 1$. For the majority of $A + B$ to be some element other than $X$ or $Y$, such element will need to have a count greater than that of $X + Y$. The maximum count we can get for any element other than $X$ or $Y$ is $|A + B| - cnt(X) - cnt(Y) = 4n + 2 - 2(n + 1) = 2n$. Since $2n < 2(n + 1)$ then it is impossible for any number other than $X$ or $Y$ to be the majority of $A + B$. $\newline$ Explanation for "how would you adapt the Boyer-Moore Majority Vote on the binary-lifting like method" : $\newline$ let $lft(x, y)$ be the resulting node after lifting $x$ by $y$, and let $dp_{i,j}$ be a pair $(cnt, val)$ that stores the answer after applying Boyer-Moore on all nodes on the path from $lft(i, 2^j)$ to $i$. updating the $dp$ is as follows : $\newline$ $dp_{i,0} = (1, val_i) \newline$ $(dp_{i,j} \neq dp_{lft(i, 2^j),j}) \implies dp_{i,j+1} = max(dp_{i,j}, dp_{lft(i, 2^j),j}) \newline$ $otherwise \implies dp_{i, j+1} = (dp_{i,j}.first + dp_{lft(i, 2^j),j}.first, dp_{i,j}.second) \newline$ This $dp$ can be preprocessed in a similar manner to $lft$. Now lets consider each query in the form : "is $path(x, y)$ dominant?", you can get Boyer-Moore for $path(x, lca(x, y))$ in using binary lifting on $dp$ in a similar manner to getting $lca(x, y)$. The answer is simply to check if the value of Boyer-Moore on $path(x, y)$ is the majority or not.
 » 2 months ago, # |   0 I solved it using Persistent Segment Tree, I think will try some others approach too.
 » 2 months ago, # |   +3 Maybe you can randomly choose some vertexs and count the occurance of it. If you randomly choose $x$ times, the error rate will be $\dfrac{1}{2^x}$, so $x$ be about $O(\log V)$ is enough.
•  » » 2 months ago, # ^ |   0 thats so cooolll
 » 2 months ago, # | ← Rev. 3 →   0 First find a candidate for the most common color by calculating a pair (color, score) with a HLD path query. To combine pairs its easy. $(c1,s1)+(c2,s2)$ is just $(c1,s1+s2)$ if $c1=c2$ and $(c1,s1-s2)$ if $c1!=c2$ and $s1>s2$. Otherwise its $(c2, s2-s1)$(The cool property of this operation is that, if there is a dominant color then the result of the operation has that color.Then use another path query to compute real amount of appearances of that color along path (also HLD).This way you can do online queries with updates.It's probably overkill but maybe it gives some ideas
 » 2 months ago, # | ← Rev. 2 →   0 Another ideaIf the path is short (L < 2sqrt(N)) we can just check it in linear time using a hashmap or something.If the path is long (L > 2sqrt(N)) the color must have at least L/2 appearances on the path, so it must have at least sqrt(N) appearances on the tree. This means that there will not be many colors that have this property and we can try and check all of them using something like binary lifting idk.This works for online queries but no update s
•  » » 2 months ago, # ^ |   -8 I don't see that updates would be mentioned in post, so I think your solution can be further simplified:We can have at most sqrt colours with freq >= sqrt.For each we can build array d[c][u] = how many nodes have colour 'c' on path from u to root. Then calculating frequency of c on path from u to v is trivial in O(1), path_len >= 2sqrt can be solved in O(sqrt) per query. The other case with small length can be checked in O(length) as you said.*(Note, it works only if ML allows NsqrtN memory)
 » 2 months ago, # |   0 Epilogue problem link plz..