#	User	Rating
1	ecnerwala	3649
2	Benq	3581
3	orzdevinwang	3570
4	Geothermal	3569
4	cnnfls_csy	3569
6	tourist	3565
7	maroonrk	3531
8	Radewoosh	3521
9	Um_nik	3482
10	jiangly	3468

#	User	Contrib.
1	maomao90	174
2	awoo	164
3	adamant	162
4	TheScrasse	159
5	nor	158
6	maroonrk	156
7	-is-this-fft-	151
8	SecondThread	147
9	orz	146
10	pajenegod	145

Introduction

Hello Codeforces,
I came across this problem a while back — https://cses.fi/problemset/task/1139/. Here the task is — "Given a colored, rooted tree, determine the number of distinct colors in each subtree of the tree". It is possible to solve this problem by flattening the tree and then using a binary indexed tree(now that subtrees have converted into subarrays). I will discuss an alternate technique to solve this, which can help in cases where it is useful to have the entire subtree as a set in a DFS call.

$$$O(n^2)$$$ solution

If $$$n$$$ were small, we could have stored for each node, the set of elements in its subtree. A DFS can be used to evaluate this, which will take $$$O(n^2log n)$$$ time and $$$O(n^2)$$$ space.

vector<int> color(n);
vector<set<int>> subtree(n);

void dfs(int i,int parent){
	subtree[i].insert(color[i]);
	for(int node : graph[i]){
		if(node != parent){
			dfs(node,i);
			subtree[i].insert(subtree[node].begin(),subtree[node].end());
		}
	}
}

The number of distinct colors in the subtree of the $$$i^{\text{th}}$$$ node is just the size of the set corresponding to it.

An Improvement

The problem with this technique is repetition. Each element occurs in $$$depth[i]$$$ different sets which is inefficient. For an improvement, we can consider the same idea used in heavy-light decomposition and union-find optimizations, which is "small-to-large merging".

We will use a DFS as before. To construct the set of all elements in the node's subtree, we will pick the child with the largest subtree set. Instead of making a new set for the current node, we will use the same set as the child with the largest subtree and insert all other children's sets in it. To avoid consuming extra space and time, we will use pointers to accomplish this task.

vector<int> color,distinct;
vector<set<int>*> subtree;
 
void dfs(int i,int parent = -1){
	int largest = -1;
	vector<int> children;
	for(int node : graph[i]){
		if(node != parent){
			dfs(node,i);
			children.push_back(node);
			if(largest == -1 || subtree[largest]->size() < subtree[node]->size()){
				largest = node;
			}
		}
	}
	if(largest == -1){
		subtree[i] = new set<int>; // new set for leaf node
	}
	else{
		subtree[i] = subtree[largest]; // largest sized child
	}
	
	for(int child : children){
		if(child == largest)continue;
		subtree[i]->insert(subtree[child]->begin(),subtree[child]->end());
	}
	subtree[i]->insert(color[i]);
	distinct[i] = subtree[i]->size();
}

Time complexity

Every time we copy an element over, the set it is now in will be at least 2 times larger than the set it was previously in. Hence the time complexity is $$$O(n log^2 n)$$$ (or $$$O(n log n)$$$ if we avoid using sets).

Thanks to -is-this-fft- for this

Conclusion

Here is my solution for the CSES task — https://cses.fi/paste/e75f5efc2e1cd2fc3c8a66/

This technique can be used in various other tree problems. Examples of such problems could be to find the most frequent element in each node's subtree or to find the pair with minimum difference in each node's subtree.

Comments (17)

Show archived | Write comment?

anujdhillxn

23 months ago, # |

← Rev. 3 →

Can you share your Fenwick Tree approach?

BTW, the swap function makes it extremely easy to implement small-to-large. Submission

→ Reply

shadow9236

23 months ago, # ^ |

Consider the problem to be querying for all subtrees. Flatten the tree. Now subtree queries can be treated as subarray queries. Sort all queries about their ending point. Now iterate from left to right in the array. In the binary indexed tree, set $$$i$$$ to 1 if $$$i$$$ is the current rightmost occurence of $$$a[i]$$$ and 0 otherwise. The answer for a query ending at $$$i$$$ is just the rangesum of it in the binary indexed tree.

This works because we are supposed to count exactly one occurence of each element in a range query. We have counted the rightmost occurence of each element and also sorted the queries about their right points.

Submission using fenwick tree — https://cses.fi/paste/e5bcaa36945e6b663c8bb1/

Your submission is not visible, you will have to go to "share this code to others" option

Fixed.

18o3

+13

I guess that this is similar to small to large merging on tree . Arpa has a blog about this where he refers to it as Sack on tree.

-is-this-fft-

+36

This is precisely small-to-large merging.

Personally, I really dislike the Arpa blog because it kinda doesn't explain anything and instead just consists of like 15 different code samples that obscure the idea more than anything.

Arpa

Agreed. But I remember that someone wrote a blog that explained the technique. It must be preset in blog comments.

nikgaevoy

← Rev. 2 →

+29

First of all, a nice additional problem on this topic: The Cost of Speed Limits from ICPC 2020.

As for the blog, the code could be simplified greatly. Firstly, you don't need to compare children by the sizes of the answer, the sizes of their subtrees also works because of the exact same reasons. So, we can precompute largest[i] as the children with the largest subtree. Additionally, you don't need to use raw pointers and similar stuff, std::move works like a charm. And, since you don't store sets explicitly anyway, there is no need in std::move as well, instead, we can rely on RVO.

So, the code would look like this

set<int> dfs(int i, int parent = -1)
{
    auto ans = largest[i] == -1 ? set<int>{} : dfs(largest[i], i);

    for (auto it : graph[i])
        if (it != parent && it != largest[i])
        {
            auto tmp = dfs(it, i);

            ans.insert(tmp.begin(), tmp.end());
        }

    ans.insert(color[i]);
    distinct[i] = ans.size():

    return ans;
}

UPD. I submitted the code to the system, it works.

acraider

+17

Great short Blog.

I created a video on the same technique sometime back : https://www.youtube.com/watch?v=92unAh2APJ0

Adding your blog to the description :D

karthikeya09

10 months ago, # |

"Every time we copy an element over, the set it is now in will be at least 2 times larger than the set it was previously in"

This is not true right? Say set_1 = {1,2} and set_2 = {1,2,3}, size of the merged set is < 4. I think merging based on size of the subtree is better

tautology

9 months ago, # ^ |

usaco too has it wrong

pranav.02

600-E Lomsat gelral also uses this approach.Submission.

9 months ago, # |

Can someone explain why this implementation not causing MLE? since space complexity can go O(n^2)

alaa.A

2 months ago, # ^ |

The space complexity is O(nlogn). It is almost the same as time complexity because we only use memory when we are merging. So check out the above proof for the merging time complexity.