Here's the usual shameless self-advertisement!

Talking with my colleagues a few months ago, I noticed that except for the editorial of Aliens (IOI 2016), there isn't any comprehensive resource for learning this DP optimization (well, maybe except for this one, but... yeah), so I decided to write a more extensive tutorial on it. Cheers! ^^

There is this article actually.

P.S. I wonder if there's going to be any formal work on the connection of this trick with Lagrange multipliers...

Well... yes, I know the blog post, but I think there is room for quite a few more details. As for the Lagrange multipliers, now that you mention it, that might just be the inspiration for the trick :))

This trick relies on a simple fact sometimes known as the "Lagrange sufficiency theorem":

Let $$$X$$$ be a set, $$$f : X \to \mathbb R$$$ a 'value function', $$$h : X \to \mathbb R$$$ a 'constraint function', and $$$b \in \mathbb R$$$ our specific constraint. Suppose $$$\lambda \in \mathbb R$$$ and $$$x \in X$$$ are such that $$$x$$$ maximises $$$L(-,\lambda)$$$ over $$$X$$$, where $$$L(y,\lambda) = f(y) - \lambda(h(y)-b)$$$. Moreover, suppose $$$h(x)=b$$$. Then $$$x$$$ maximises $$$f$$$ subject to the constraint $$$h(x)=b$$$.

(The proof is trivial: for any $$$y \in X$$$ with $$$h(y)=b$$$ we have $$$f(y) = L(y,\lambda) \le L(x,\lambda) = f(x)$$$, since $$$L = f$$$ whenever the constraint holds.)

In the dp problems, $$$X$$$ is something like "all partitions of the sequence $$$1..n$$$ into contiguous subsequences", $$$f$$$ is what it is, $$$h$$$ is "number of parts of the partition", and $$$b$$$ is $$$k$$$.

The general situation where the Lagrange sufficiency theorem is useful is where $$$X$$$ is 'nice', so that we can optimise $$$L(-, \lambda)$$$ on $$$X$$$ (using a one-dimensional dp maybe), but the set $$$ \{ x \in X \mid h(x)=b \} $$$ is less nice, so that we cannot optimise on it directly (or maybe only in quadratic time). In general you will find that, letting $$$x(\lambda)$$$ maximise $$$L(-,\lambda)$$$, $$$h(x(\lambda))$$$ is monotonically decreasing in $$$\lambda$$$ (let's not worry about ambiguity in choice of $$$x(\lambda)$$$). So you should be able to find a $$$\lambda$$$ with $$$h(x(\lambda)) = b$$$ using binary search; binary search is fast so this method can be good. It might be that no $$$\lambda$$$ with $$$h(x(\lambda))=b$$$ exists, in which case this method isn't helpful. Convexity comes into play to ensure that $$$\lambda$$$ exists, and this is where things become a bit technical.
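For concreteness, here is a minimal sketch of that binary search in Python, on a toy problem of my own choosing (not from any of the articles above): split an array of non-negative integers into exactly $$$k$$$ contiguous segments, minimising the sum of squared segment sums. Here the optimal cost $$$f(m)$$$ as a function of the number of segments $$$m$$$ is convex, and since we minimise rather than maximise, the multiplier $$$\lambda$$$ appears as a penalty paid per segment:

```python
def min_cost_k_segments(a, k):
    # Toy problem: split `a` (non-negative ints) into exactly k contiguous
    # segments, minimising the sum of (segment sum)^2, via the Aliens trick.
    n = len(a)
    pre = [0] * (n + 1)
    for i, x in enumerate(a):
        pre[i + 1] = pre[i] + x

    def solve(lam):
        # Unconstrained O(n^2) dp: pay `lam` extra for every segment opened.
        # dp[i] = (best penalised cost for a[:i], fewest segments achieving it);
        # lexicographic tuple comparison gives the "prefer fewer segments"
        # tie-break, which matters exactly at the breakpoints of the hull.
        INF = float('inf')
        dp = [(0, 0)] + [(INF, 0)] * n
        for i in range(1, n + 1):
            best = (INF, 0)
            for j in range(i):
                cand = (dp[j][0] + (pre[i] - pre[j]) ** 2 + lam,
                        dp[j][1] + 1)
                if cand < best:
                    best = cand
            dp[i] = best
        return dp[n]

    # The segment count of the optimum is non-increasing in lam, and the
    # slopes of the convex f are integers bounded by (sum a)^2, so an
    # integer binary search for the smallest lam with count <= k suffices.
    lo, hi = 0, pre[n] ** 2 + 1
    while lo < hi:
        mid = (lo + hi) // 2
        if solve(mid)[1] <= k:
            hi = mid
        else:
            lo = mid + 1
    cost, _ = solve(lo)
    # L(lam) = f(m) + lam * m at the optimum; convexity guarantees this
    # also equals f(k) + lam * k, so we can recover f(k).
    return cost - lo * k
```

Note that even when no $$$\lambda$$$ gives exactly $$$k$$$ segments (the optimum jumps over $$$k$$$), the final subtraction still recovers $$$f(k)$$$, because convexity puts $$$(k, f(k))$$$ on the flat part of the hull at that $$$\lambda$$$; that is precisely where the convexity assumption is used.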

The Wikipedia article only seems to deal with the case where $$$X$$$ is a manifold (i.e. a 'continuous' space), and $$$f, h$$$ are differentiable. I think the reason for this is that instead of having to worry about global extremisation and existence of $$$\lambda$$$, you just have to look for (constrained) stationary points, which is simpler (it's local). But if you want a sufficient condition for when your solution is the constrained maximum, then you need the "sufficiency theorem". (Of course, derivatives and stationarity aren't going to help you in discrete dp problems...)

Not sure if this is what you were asking for, but I hope it helps.

Oh. It seems to be a nice explanation. I think.

(I will hopefully be able to actually read and comprehend it at some moment of time)

My boy memento also wrote an article on this recently.

I like this article, it's good.

I think there are two typos. The summation in the definition of the second dp should be over $$$v_k$$$, and it should probably be $$$\max(x_t - y_{t + 1} + 1, 0)^2$$$ in the last dp.

You're right, fixed :)