### shivangtiwari's blog

By shivangtiwari, history, 6 weeks ago,

# Introduction

Hello Codeforces,
I came across this problem a while back — https://cses.fi/problemset/task/1139/. Here the task is — "Given a colored, rooted tree, determine the number of distinct colors in each subtree of the tree". It is possible to solve this problem by flattening the tree and then using a binary indexed tree(now that subtrees have converted into subarrays). I will discuss an alternate technique to solve this, which can help in cases where it is useful to have the entire subtree as a set in a DFS call.

# $O(n^2)$ solution

If $n$ were small, we could have stored for each node, the set of elements in its subtree. A DFS can be used to evaluate this, which will take $O(n^2log n)$ time and $O(n^2)$ space.

vector<int> color(n);
vector<set<int>> subtree(n);

void dfs(int i,int parent){
subtree[i].insert(color[i]);
for(int node : graph[i]){
if(node != parent){
dfs(node,i);
subtree[i].insert(subtree[node].begin(),subtree[node].end());
}
}
}


The number of distinct colors in the subtree of the $i^{\text{th}}$ node is just the size of the set corresponding to it.

# An Improvement

The problem with this technique is repetition. Each element occurs in $depth[i]$ different sets which is inefficient. For an improvement, we can consider the same idea used in heavy-light decomposition and union-find optimizations, which is "small-to-large merging".

We will use a DFS as before. To construct the set of all elements in the node's subtree, we will pick the child with the largest subtree set. Instead of making a new set for the current node, we will use the same set as the child with the largest subtree and insert all other children's sets in it. To avoid consuming extra space and time, we will use pointers to accomplish this task.

vector<int> color,distinct;
vector<set<int>*> subtree;

void dfs(int i,int parent = -1){
int largest = -1;
vector<int> children;
for(int node : graph[i]){
if(node != parent){
dfs(node,i);
children.push_back(node);
if(largest == -1 || subtree[largest]->size() < subtree[node]->size()){
largest = node;
}
}
}
if(largest == -1){
subtree[i] = new set<int>; // new set for leaf node
}
else{
subtree[i] = subtree[largest]; // largest sized child
}

for(int child : children){
if(child == largest)continue;
subtree[i]->insert(subtree[child]->begin(),subtree[child]->end());
}
subtree[i]->insert(color[i]);
distinct[i] = subtree[i]->size();
}


# Time complexity

Every time we copy an element over, the set it is now in will be at least 2 times larger than the set it was previously in. Hence the time complexity is $O(n log^2 n)$ (or $O(n log n)$ if we avoid using sets).

Thanks to -is-this-fft- for this

# Conclusion

Here is my solution for the CSES task — https://cses.fi/paste/e75f5efc2e1cd2fc3c8a66/

This technique can be used in various other tree problems. Examples of such problems could be to find the most frequent element in each node's subtree or to find the pair with minimum difference in each node's subtree.

• +70

 » 6 weeks ago, # | ← Rev. 3 →   +3 Can you share your Fenwick Tree approach?BTW, the swap function makes it extremely easy to implement small-to-large. Submission
•  » » 6 weeks ago, # ^ | ← Rev. 3 →   +3 Consider the problem to be querying for all subtrees. Flatten the tree. Now subtree queries can be treated as subarray queries. Sort all queries about their ending point. Now iterate from left to right in the array. In the binary indexed tree, set $i$ to 1 if $i$ is the current rightmost occurence of $a[i]$ and 0 otherwise. The answer for a query ending at $i$ is just the rangesum of it in the binary indexed tree.This works because we are supposed to count exactly one occurence of each element in a range query. We have counted the rightmost occurence of each element and also sorted the queries about their right points.Submission using fenwick tree — https://cses.fi/paste/e5bcaa36945e6b663c8bb1/Your submission is not visible, you will have to go to "share this code to others" option
•  » » » 6 weeks ago, # ^ |   0 Fixed.
 » 6 weeks ago, # |   +13 I guess that this is similar to small to large merging on tree . Arpa has a blog about this where he refers to it as Sack on tree.
•  » » 6 weeks ago, # ^ |   +36 This is precisely small-to-large merging.Personally, I really dislike the Arpa blog because it kinda doesn't explain anything and instead just consists of like 15 different code samples that obscure the idea more than anything.
•  » » » 6 weeks ago, # ^ |   0 Agreed. But I remember that someone wrote a blog that explained the technique. It must be preset in blog comments.
 » 6 weeks ago, # | ← Rev. 2 →   +29 First of all, a nice additional problem on this topic: The Cost of Speed Limits from ICPC 2020.As for the blog, the code could be simplified greatly. Firstly, you don't need to compare children by the sizes of the answer, the sizes of their subtrees also works because of the exact same reasons. So, we can precompute largest[i] as the children with the largest subtree. Additionally, you don't need to use raw pointers and similar stuff, std::move works like a charm. And, since you don't store sets explicitly anyway, there is no need in std::move as well, instead, we can rely on RVO. So, the code would look like thisset dfs(int i, int parent = -1) { auto ans = largest[i] == -1 ? set{} : dfs(largest[i], i); for (auto it : graph[i]) if (it != parent && it != largest[i]) { auto tmp = dfs(it, i); ans.insert(tmp.begin(), tmp.end()); } ans.insert(color[i]); distinct[i] = ans.size(): return ans; } UPD. I submitted the code to the system, it works.
 » 6 weeks ago, # |   +17 Great short Blog. I created a video on the same technique sometime back : https://www.youtube.com/watch?v=92unAh2APJ0Adding your blog to the description :D
 » 6 weeks ago, # |   0 Same code without pointers using swap to avoid TLE: https://cses.fi/paste/01ea1fa1bcc55dd43c9d4f/