Recently, during Universal Cup Round 5, I got stuck on problem A, a tree problem, because I realized that my solution required way too much memory. After the contest, however, I realized that there was a way to save a lot of memory using HLD. So here I am with my idea...

# Structure of Tree DP

Most tree DP problems follow the structure below.

```
struct S {
    // DP state
};
S init(int u) {
    // initialise the base state of dp[u]
}
S merge(S left, S right) {
    // merge the accumulated state `left` with the child state `right`
}
S dp(int u, int p) {
    S res = init(u);
    for (int v : adj[u]) {
        if (v == p) continue;
        res = merge(res, dp(v, u));
    }
    return res;
}
int main() {
    dp(1, -1);
}
```

An example of a tree DP using this structure is maximum independent set (MIS).

**Code**

Suppose struct $$$S$$$ requires $$$|S|$$$ bytes and our tree has $$$N$$$ vertices. Then this naive implementation of tree DP requires $$$O(N\cdot |S|)$$$ memory, as the `res` of every ancestor is stored on the recursion stack as we recurse down to the leaves. This is fine for many problems since most of the time $$$|S| = O(1)$$$. However, in the above problem, $$$|S| = 25^2\cdot 24$$$ bytes and $$$N = 10^5$$$, which requires around $$$1.5$$$ gigabytes of memory, far above the memory limit of $$$256$$$ megabytes.

# Optimization

We make use of the idea behind HLD and visit the child with the largest subtree size first.

```
struct S {
    // DP state
};
S init(int u) {
    // initialise the base state of dp[u]
}
S merge(S left, S right) {
    // merge the accumulated state `left` with the child state `right`
}
int sub[MAXN];
void getSub(int u, int p) {
    sub[u] = 1;
    pair<int, int> heavy = {-1, -1};
    for (int i = 0; i < (int) adj[u].size(); i++) {
        int v = adj[u][i];
        if (v == p) continue;
        getSub(v, u);
        sub[u] += sub[v];
        heavy = max(heavy, make_pair(sub[v], i));
    }
    // make the child with the largest subtree size come first
    if (heavy.first != -1) {
        swap(adj[u][0], adj[u][heavy.second]);
    }
}
S dp(int u, int p) {
    // do not initialise yet
    S res;
    bool hasInit = false;
    for (int v : adj[u]) {
        if (v == p) continue;
        S tmp = dp(v, u);
        if (!hasInit) {
            res = init(u);
            hasInit = true;
        }
        res = merge(res, tmp);
    }
    if (!hasInit) {
        res = init(u); // leaf vertex
    }
    return res;
}
int main() {
    getSub(1, -1);
    dp(1, -1);
}
```

If we analyze the memory complexity properly, we will realize that it becomes $$$O(N + |S|\log N)$$$. The $$$O(N)$$$ comes from storing the subtree sizes, and the $$$O(|S|\log N)$$$ comes from the DP itself.

## Proof

The two main changes from our naive DP are that we initialise `res` only after visiting the first child, and that we visit the child with the largest subtree size first. Recalling the definitions used in HLD, we call an edge $$$p_u \rightarrow u$$$ a heavy edge if $$$u$$$ is the child with the largest subtree size among all children of $$$p_u$$$, and all other edges light edges. It is well known that the path from the root to any vertex traverses at most $$$O(\log N)$$$ light edges, since crossing a light edge at least halves the remaining subtree size.

Making use of this idea, if we visit heavy edges first, the `res` of the parent of a heavy edge is not yet initialised while we recurse down that edge, since we only initialise `res` after returning from the first child. Hence only the `res` of parents of light edges are held on the recursion stack at any time. Since any root-to-leaf path contains at most $$$O(\log N)$$$ light edges, the memory complexity is $$$O(|S|\log N)$$$.

My solution to the above problem

# Conclusion

I have not seen this idea anywhere before and only came up with it recently during Universal Cup. Please tell me if it is actually a well-known technique or there are some flaws in my analysis. Hope that this will be helpful to everyone!

This is one of the most beautiful techniques I've ever seen, hands down. I would have never thought you could do something like this.

Nice! But shouldn't we firstly go to the child, and only then initialise our base state?

Oops yes sorry my mistake

This technique is also mentioned in the editorial of 103119I - Nim Cheater, and it's the intended solution for this problem.

Also, I believe this should be the correct source code.

If it's a bamboo, you will call `init()` for all nodes before returning from the leaf; therefore, it's $$$O(N)$$$ memory instead of the $$$O(\log N)$$$ we are targeting.

Oh wow I see, thanks!

Is trivial == well-known?

Thanks a lot for sharing this with us. At first, it might look like it doesn't have much of a use case. However, it is actually helpful. For example, there was a tree DP problem in my second NOI (National OI). It was quite simple, but it required $$$O(nk)$$$ memory where $$$n \leq 10^5$$$, $$$k \leq 200$$$, and the memory limit was 32 MB, which is why the main problem was reducing memory usage. Even though other people managed to squeeze in wrong solutions because of bad test cases, I tried many different methods, but I couldn't get AC. If I knew this back then, I would've solved it easily and won a gold medal.

okaragul not giving a fuck and inventing this in like 30 seconds (In my defense I was the one lucky enough to teach him HLD in the first place, so I'm taking full credit)

Always wrote tree dp using hld, but never thought it actually optimized something

A maybe related blog post (as for solving problem A in ucup): https://codeforces.com/blog/entry/67001

In some sense, centroid decomposition can also help.

Why are we calling getSub() for the root only?

Oops, I forgot to call the function inside the for loop. It was meant to be a DFS, by the way.

I think it's more-or-less intended solution of ICPC WF 2020 problem B

Wasn't this trick used in IOI 2009 — Regions? Also, as Alperent said, in our national OI.

I think your `getSub` function doesn't really calculate subtree sizes :)

Bruh I can't code DFS 🤡

I tried to solve the problem using choose(25,2) DFSs, and instead of the memory problem you faced, I got stuck being unable to make the code fast enough to pass the time limit. One trick I came up with is to re-index the nodes according to the pre-order of the tree. I noticed that my code does many parent-child calculations, and if the indices are not randomly assigned, some time can be saved.

TLE code

AC code

in short, i added the following: