Optimizing Round 595 Problem F from $$$O(n^4)$$$ to $$$O(n)$$$

#	User	Rating
1	ecnerwala	3648
2	Benq	3580
3	orzdevinwang	3570
4	cnnfls_csy	3569
5	Geothermal	3568
6	tourist	3565
7	maroonrk	3530
8	Radewoosh	3520
9	Um_nik	3481
10	jiangly	3467

#	User	Contrib.
1	maomao90	174
2	adamant	164
2	awoo	164
4	TheScrasse	160
5	nor	159
6	maroonrk	156
7	-is-this-fft-	150
8	SecondThread	147
9	orz	146
10	pajenegod	145

Since there was a fair amount of interest, I'll explain here how I optimized my solution to 1249F - Maximum Weight Subset from $$$\mathcal{O}(n^4)$$$ to $$$\mathcal{O}(n)$$$. One cool feature of this problem is that it's possible to get from $$$\mathcal{O}(n^4)$$$ all the way down to $$$\mathcal{O}(n)$$$ with fundamentally the same solution idea. This is unlike many other problems, where if you want to improve the runtime further you have to completely rethink the solution from scratch (e.g., switch to greedy rather than DP). Thanks MikeMirzayanov for writing a nice problem.

$$$\mathcal{O}(n^4)$$$

I'll briefly go over the inefficient solutions first. My initial idea was to compute for each node a two-dimensional DP: dp[n][d] = the maximum weight to choose n nodes in a valid manner within our subtree such that the node closest to the root of the subtree has depth d. Then when combining two subtrees, we need to add the node counts together and take the minimum of the two depths to get our new DP state. Code here: 63153425. Pay attention to the attach function in particular.

This looks like $$$\mathcal{O}(n^5)$$$, but it's actually $$$\mathcal{O}(n^4)$$$ due to this observation. I wasn't sure at first if this would manage to pass the time limit, but it turns out the worst case is only around $$$100^4$$$ with good constant factor, rather than $$$200^4$$$.

$$$\mathcal{O}(n^3)$$$

To optimize the above solution, notice that the nested inner loops on x and y can be optimized to linear time by computing suffix minimums of the original DP arrays. We consider two cases for min(x, y), one where x = min(x, y) (in which case we need both y >= x and x + y >= K, meaning y >= max(K - x, x)) and another case where y = min(x, y) (with similar constraints on x). Code: 63200545. Note that the innermost loops perform the equivalent of the commented-out loops on x and y, with complexity improved by a factor of $$$n$$$.

A better $$$\mathcal{O}(n^3)$$$

How can we do better? Well, it would be helpful if we realized that the n for node count in our DP state is completely useless. Let's get rid of it. Now our solution looks like this instead: 63200548. A much cleaner $$$\mathcal{O}(n^3)$$$.

[Correction: this is actually $$$\mathcal{O}(n^2)$$$ due to the same observation as above.]

$$$\mathcal{O}(n^2)$$$

Here we apply the same suffix max optimization again. Code: 63200549. We've now reached the best solution described in the editorial.

$$$\mathcal{O}(n \log n)$$$

How can we improve further? Let's take a look at the quadratic solution again and analyze which parts are causing it to be quadratic. First the algorithm requires inserting into the beginning of a vector; this is easy to fix by swapping out std::vector for a std::deque and using push_front(). On top of that, the inner loops to combine two subtrees clearly cause the algorithm to be quadratic.

However, note that we can rewrite the attach function to run in time proportional to min(root.size(), child.size()) rather than root.size() + child.size(). In particular, the code to combine the two subtrees is completely symmetric with respect to root and child, so we can swap if necessary in order to assume root.size() >= child.size(). We can then combine the two subtrees in O(child.size()) by

saving the suffix maxes in the DP state so we don't have to recompute them and
realizing that we can limit the loops on both x and y to go up to child.size() only, since any indices larger than that will not get modified for our combined DP state.

Thus, after being careful to pass everything by reference to avoid redundant copying, our new algorithm is $$$\mathcal{O}(n \log n)$$$, since every DP state can only be merged from smaller into larger $$$\log n$$$ times each. Code: 63206899. For an easier implementation, note that we can just use the suffix maxes as the DP state itself; in other words, we redefine the DP state to be dp[d] = the maximum weight we can get from validly choosing nodes in our subtree such that all chosen nodes are *at least* depth d from the subtree root.

$$$\mathcal{O}(n)$$$

But our time bound on our algorithm can be improved -- it's actually $$$\mathcal{O}(n)$$$. How come? This is due to the fact that instead of having one DP state per node in our subtree, we have one DP state per distinct depth in our subtree. In particular, this means that merging a smaller state into a larger state results in an output of the same size as the larger state, rather than the sum of the two sizes (like in the usual smaller-to-larger case). So when we merge smaller into larger, you can think of it instead as all of the smaller state values getting 'eaten' by the larger state, in O(1) time per value. Since each value can only get eaten once, we can finally conclude that our algorithm is $$$\mathcal{O}(n)$$$.