
ed1d1a8d's blog

By ed1d1a8d, 10 years ago, In English

Last weekend (March 1-2) the 2014 HackerRank 101 Hack took place. In particular, it featured a problem named Coloring Tree: you are given a tree where each vertex is assigned a color, and you must answer queries asking for the number of distinct colors in a particular subtree.

My approach was to compute the number of distinct colors for every subtree using a DFS traversal of the tree. For every vertex, my algorithm computed a set (std::set) of all the distinct colors within its subtree, using the following steps:

  1. Compute the sets for each child of the current vertex.
  2. Merge the children by inserting the elements of every other child's set into the set of the largest child. (Note: do not copy the set of the largest child; simply insert the remaining elements, so the number of insertions is the number of elements in the subtree minus the number of elements in the largest child's set.)
  3. Record the size of the merged set.

I implemented this with an array of sets, keeping a pointer to the "active set" of each vertex, thus avoiding copying the largest child.
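For concreteness, here is a minimal sketch of that small-to-large merge (illustrative only, not the exact contest code linked below; the array bound and names are placeholders):

```cpp
#include <bits/stdc++.h>
using namespace std;

const int MAXN = 100005;   // placeholder bound
vector<int> adj[MAXN];     // tree adjacency lists
int color[MAXN];           // color of each vertex
int distinctCnt[MAXN];     // answer: distinct colors in each subtree
set<int>* active[MAXN];    // pointer to the "active set" of each vertex

void dfs(int u, int parent) {
    active[u] = new set<int>{color[u]};
    for (int v : adj[u]) {
        if (v == parent) continue;
        dfs(v, u);
        // Keep the larger of the two sets and insert the smaller one
        // into it, so the largest child's set is never copied.
        if (active[v]->size() > active[u]->size())
            swap(active[u], active[v]);
        active[u]->insert(active[v]->begin(), active[v]->end());
        delete active[v];
    }
    distinctCnt[u] = (int)active[u]->size();
}
```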

Here is my code on Ideone: http://ideone.com/Lje1R5

Now, onto my actual question. The reason I used this technique is that I remembered a similar one from a past USACO problem. The USACO solution stated that the complexity of such a merging is $$$O(n \log^2 n)$$$, if I understood it correctly. However, it did not provide a proof. I've tried to deduce the complexity on my own by induction, but I am stuck analyzing the merge step, since it involves the size of the largest subtree. I thought of using something like Jensen's inequality to bound the sum over the subtrees, but wasn't able to get very far due to my inexperience with Jensen's. I am fairly sure the complexity is around $$$O(n \log^2 n)$$$ because of my program's runtime: Runtimes (unless the test data was extremely weak?). My question is: how can this complexity be proven?


» 10 years ago, # |

A move consists of removing an element from one set and inserting it into another set.

Let's prove that there are at most $$$O(N \log N)$$$ moves. Each move can be done in $$$O(\log N)$$$, so the total complexity is $$$O(N \log^2 N)$$$.

When you merge two sets, you move the elements from the smaller set (say its size is $$$K$$$) into the bigger one, so every element of the smaller set now belongs to a set of size at least $$$2K$$$.

In other words, every element that has been moved $$$T$$$ times belongs to a set of size at least $$$2^T$$$. Thus every element is moved at most $$$O(\log N)$$$ times, and since there are $$$N$$$ elements, there are at most $$$O(N \log N)$$$ move operations in total.
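As a quick empirical sanity check of this bound (an illustrative sketch with an arbitrary random tree; not part of the proof), one can count the moves during the merges and compare against $$$N \log_2 N$$$:

```cpp
#include <bits/stdc++.h>
using namespace std;

int main() {
    const int N = 100000;
    mt19937 rng(12345);
    vector<vector<int>> children(N);
    for (int v = 1; v < N; v++) {  // random recursive tree (depth O(log N) w.h.p.)
        int p = uniform_int_distribution<int>(0, v - 1)(rng);
        children[p].push_back(v);
    }

    long long moves = 0;
    vector<set<int>*> bag(N);
    function<void(int)> dfs = [&](int u) {
        bag[u] = new set<int>{u};  // give every vertex a distinct "color"
        for (int v : children[u]) {
            dfs(v);
            if (bag[v]->size() > bag[u]->size()) swap(bag[u], bag[v]);
            moves += (long long)bag[v]->size();  // each insertion is one move
            bag[u]->insert(bag[v]->begin(), bag[v]->end());
            delete bag[v];
        }
    };
    dfs(0);
    printf("moves = %lld, N*log2(N) ~ %.0f\n", moves, N * log2((double)N));
}
```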

  • » » 10 years ago, # ^ |

    Pretty neat. That technique seems pretty useful. Thanks for the explanation.

  • » » 10 years ago, # ^ |

    Very nice argument.

  • » » 10 years ago, # ^ |

    I just realized this argument even holds in DAGs. You just have to use persistent search trees there.

  • » » 10 years ago, # ^ |

    Hm, what if we remove duplicates in the sets, for example if we only consider distinct colors in the subtree or something similar? Then, when an element is moved, we can't guarantee that its new home is at least twice as large as its old home. I think we also need to consider that when this happens, the removed elements can no longer be moved anywhere, but then the argument seems much more complicated. Is there a simple solution?

    • » » » 10 years ago, # ^ |

      Just treat it the same as the old problem: imagine that we didn't remove the duplicate elements, so that the complexity stays the same. Querying the elements will still work in exactly the same way.

      • » » » » 10 years ago, # ^ |

        I'm sure the bound holds; I'm just not clear on why this argument proves it, because if we remove duplicates, it will change the order in which we merge subtrees.

        I see what you mean, though. Instead of removing duplicates, we would just keep track of how many duplicates we have, which would also let us count the number of distinct elements. That's not what we actually do, though. It seems clear that actually removing the duplicates can only do better than that, but the proof doesn't work that way.
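        A hedged sketch of that counting variant (names illustrative, not anyone's posted solution): keep a map from color to multiplicity and merge small-to-large by total occurrence counts, i.e. subtree sizes, so the "duplicates kept" analysis applies verbatim while the map size still gives the distinct count.

```cpp
#include <bits/stdc++.h>
using namespace std;

const int MAXN = 100005;   // placeholder bound
vector<int> adj[MAXN];
int color[MAXN], distinctCnt[MAXN];
map<int, int>* cnt[MAXN];  // color -> multiplicity in the subtree
int total[MAXN];           // total occurrences stored, with multiplicity

void dfs(int u, int parent) {
    cnt[u] = new map<int, int>{{color[u], 1}};
    total[u] = 1;
    for (int v : adj[u]) {
        if (v == parent) continue;
        dfs(v, u);
        // Compare by total occurrences (subtree sizes), matching the
        // "imagine we never removed duplicates" analysis.
        if (total[v] > total[u]) {
            swap(cnt[u], cnt[v]);
            swap(total[u], total[v]);
        }
        for (auto& [c, k] : *cnt[v]) (*cnt[u])[c] += k;  // duplicates counted, not dropped
        total[u] += total[v];
        delete cnt[v];
    }
    distinctCnt[u] = (int)cnt[u]->size();  // distinct keys = distinct colors
}
```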

        • » » » » » 10 years ago, # ^ |

          In particular, I noticed that according to kingofnumbers' logic, we could do the exact same thing on DAGs, just with an additional factor of n. But I don't think that is possible, because there, not removing duplicates would result in exponential growth, since nodes can be reachable via multiple paths.

  • » » 9 years ago, # ^ |

    Does the argument hold for a chain graph like 1->2->3->4->...->N, with each node having a distinct color?

  • » » 4 years ago, # ^ |

    The memory is still $$$O(n^2)$$$. As far as I can see, we have a $$$map<ll,ll>$$$ for each child. Even if we insert the elements of the smaller sets into the larger ones, we assign the larger one to the parent, so each node stores a map whose size equals its subtree size.

    • » » » 4 years ago, # ^ |

      Time complexity is always at least as large as memory usage. Since the time complexity here is $$$O(n \log^2 n)$$$, the memory is $$$O(n \log^2 n)$$$ as well.

      • » » » » 4 years ago, # ^ |

        I think I understand now. The memory only increases by $$$O(1)$$$ when an element moves from a smaller to a larger set (because it is copied), and since an element moves at most $$$\log N$$$ times, it contributes an $$$O(\log N)$$$ increase to the memory, so the memory is $$$O(N \log N)$$$ overall.

» 10 years ago, # |

The stack size on HackerRank is disappointing. My usual recursive DFS didn't pass on a bamboo-like graph, even though it works under MSVS with a pragma.
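One workaround, sketched below with an explicit stack (illustrative; it assumes nothing about the original recursive code): push a post-order event for each vertex so the per-vertex work, such as the small-to-large merge, runs once all children are done.

```cpp
#include <bits/stdc++.h>
using namespace std;

const int MAXN = 100005;  // placeholder bound
vector<int> adj[MAXN];

void dfsIterative(int root) {
    // (vertex, parent, entering?); entering == false means the post-order visit
    stack<tuple<int, int, bool>> st;
    st.push({root, -1, true});
    while (!st.empty()) {
        auto [u, p, entering] = st.top();
        st.pop();
        if (entering) {
            st.push({u, p, false});  // revisit u after its children finish
            for (int v : adj[u])
                if (v != p) st.push({v, u, true});
        } else {
            // post-order: all children of u are processed;
            // do the per-vertex work (e.g. the set merge) here
        }
    }
}
```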

» 10 years ago, # |

Other examples of this idea:

  • TreeDiff

  • USACO 2013 US Open, Gold, Problem 2: Yin and Yang