By shadow9236, history, 23 months ago,

# Introduction

Hello Codeforces,
I came across this problem a while back — https://cses.fi/problemset/task/1139/. Here the task is — "Given a colored, rooted tree, determine the number of distinct colors in each subtree of the tree". It is possible to solve this problem by flattening the tree and then using a binary indexed tree(now that subtrees have converted into subarrays). I will discuss an alternate technique to solve this, which can help in cases where it is useful to have the entire subtree as a set in a DFS call.

# $O(n^2)$ solution

If $n$ were small, we could have stored for each node, the set of elements in its subtree. A DFS can be used to evaluate this, which will take $O(n^2log n)$ time and $O(n^2)$ space.

vector<int> color(n);
vector<set<int>> subtree(n);

void dfs(int i,int parent){
subtree[i].insert(color[i]);
for(int node : graph[i]){
if(node != parent){
dfs(node,i);
subtree[i].insert(subtree[node].begin(),subtree[node].end());
}
}
}


The number of distinct colors in the subtree of the $i^{\text{th}}$ node is just the size of the set corresponding to it.

# An Improvement

The problem with this technique is repetition. Each element occurs in $depth[i]$ different sets which is inefficient. For an improvement, we can consider the same idea used in heavy-light decomposition and union-find optimizations, which is "small-to-large merging".

We will use a DFS as before. To construct the set of all elements in the node's subtree, we will pick the child with the largest subtree set. Instead of making a new set for the current node, we will use the same set as the child with the largest subtree and insert all other children's sets in it. To avoid consuming extra space and time, we will use pointers to accomplish this task.

vector<int> color,distinct;
vector<set<int>*> subtree;

void dfs(int i,int parent = -1){
int largest = -1;
vector<int> children;
for(int node : graph[i]){
if(node != parent){
dfs(node,i);
children.push_back(node);
if(largest == -1 || subtree[largest]->size() < subtree[node]->size()){
largest = node;
}
}
}
if(largest == -1){
subtree[i] = new set<int>; // new set for leaf node
}
else{
subtree[i] = subtree[largest]; // largest sized child
}

for(int child : children){
if(child == largest)continue;
subtree[i]->insert(subtree[child]->begin(),subtree[child]->end());
}
subtree[i]->insert(color[i]);
distinct[i] = subtree[i]->size();
}


# Time complexity

Every time we copy an element over, the set it is now in will be at least 2 times larger than the set it was previously in. Hence the time complexity is $O(n log^2 n)$ (or $O(n log n)$ if we avoid using sets).

Thanks to -is-this-fft- for this

# Conclusion

Here is my solution for the CSES task — https://cses.fi/paste/e75f5efc2e1cd2fc3c8a66/

This technique can be used in various other tree problems. Examples of such problems could be to find the most frequent element in each node's subtree or to find the pair with minimum difference in each node's subtree.

• +78

| Write comment?
 » 23 months ago, # | ← Rev. 3 →   +3 Can you share your Fenwick Tree approach?BTW, the swap function makes it extremely easy to implement small-to-large. Submission
•  » » 23 months ago, # ^ | ← Rev. 3 →   +3 Consider the problem to be querying for all subtrees. Flatten the tree. Now subtree queries can be treated as subarray queries. Sort all queries about their ending point. Now iterate from left to right in the array. In the binary indexed tree, set $i$ to 1 if $i$ is the current rightmost occurence of $a[i]$ and 0 otherwise. The answer for a query ending at $i$ is just the rangesum of it in the binary indexed tree.This works because we are supposed to count exactly one occurence of each element in a range query. We have counted the rightmost occurence of each element and also sorted the queries about their right points.Submission using fenwick tree — https://cses.fi/paste/e5bcaa36945e6b663c8bb1/Your submission is not visible, you will have to go to "share this code to others" option
•  » » » 23 months ago, # ^ |   0 Fixed.
 » 23 months ago, # |   +13 I guess that this is similar to small to large merging on tree . Arpa has a blog about this where he refers to it as Sack on tree.
•  » » 23 months ago, # ^ |   +36 This is precisely small-to-large merging.Personally, I really dislike the Arpa blog because it kinda doesn't explain anything and instead just consists of like 15 different code samples that obscure the idea more than anything.
•  » » » 23 months ago, # ^ |   0 Agreed. But I remember that someone wrote a blog that explained the technique. It must be preset in blog comments.
 » 23 months ago, # | ← Rev. 2 →   +29 First of all, a nice additional problem on this topic: The Cost of Speed Limits from ICPC 2020.As for the blog, the code could be simplified greatly. Firstly, you don't need to compare children by the sizes of the answer, the sizes of their subtrees also works because of the exact same reasons. So, we can precompute largest[i] as the children with the largest subtree. Additionally, you don't need to use raw pointers and similar stuff, std::move works like a charm. And, since you don't store sets explicitly anyway, there is no need in std::move as well, instead, we can rely on RVO. So, the code would look like thisset dfs(int i, int parent = -1) { auto ans = largest[i] == -1 ? set{} : dfs(largest[i], i); for (auto it : graph[i]) if (it != parent && it != largest[i]) { auto tmp = dfs(it, i); ans.insert(tmp.begin(), tmp.end()); } ans.insert(color[i]); distinct[i] = ans.size(): return ans; } UPD. I submitted the code to the system, it works.
 » 23 months ago, # |   +17 Great short Blog. I created a video on the same technique sometime back : https://www.youtube.com/watch?v=92unAh2APJ0Adding your blog to the description :D
 » 10 months ago, # | ← Rev. 3 →   0 "Every time we copy an element over, the set it is now in will be at least 2 times larger than the set it was previously in"This is not true right? Say set_1 = {1,2} and set_2 = {1,2,3}, size of the merged set is < 4. I think merging based on size of the subtree is better
•  » » 8 months ago, # ^ |   0
 » 10 months ago, # |   +9 600-E Lomsat gelral also uses this approach.Submission.
 » 8 months ago, # |   0 Can someone explain why this implementation not causing MLE? since space complexity can go O(n^2)
•  » » 2 months ago, # ^ |   0 The space complexity is O(nlogn). It is almost the same as time complexity because we only use memory when we are merging. So check out the above proof for the merging time complexity.
 » 2 months ago, # |   0 Another good problem that requires this technique. Here
•  » » 4 weeks ago, # ^ |   0 I landed on this blog while solving the same problem :D
•  » » » 4 weeks ago, # ^ |   +1 same here.lol
 » 4 weeks ago, # |   0 For upcoming readers, if you cannot comprehend how the time complexity is O(NlogN^2) then this video might help