Here's the usual shameless self-advertisement!

Talking with my colleagues a few months ago, I noticed that, except for the editorial of Aliens (IOI 2016), there isn't any comprehensive resource for learning this DP optimization (well, maybe except for this one, but... yeah), so I decided to write a more extensive tutorial on it. Cheers! ^^

There is this article actually.

P.S. I wonder if there's going to be any formal work on the connection of this trick with Lagrange multipliers...

Well... yes, I know the blog post, but I think there is room for quite a few more details. As for the Lagrange multipliers, now that you say it, that might just be the inspiration for the trick :))

This trick relies on a simple fact sometimes known as the "Lagrange sufficiency theorem":

Let $$$X$$$ be a set, $$$f : X \to \mathbb R$$$ a 'value function', $$$h : X \to \mathbb R$$$ a 'constraint function', and $$$b \in \mathbb R$$$ our specific constraint. Suppose $$$\lambda \in \mathbb R$$$ and $$$x \in X$$$ are such that $$$x$$$ maximises $$$L(x,\lambda) = f(x) - \lambda(h(x)-b)$$$. Moreover, suppose $$$h(x)=b$$$. Then $$$x$$$ maximises $$$f(x)$$$ subject to the constraint $$$h(x)=b$$$.

(The proof is trivial: just observe that $$$L = f$$$ when $$$h(x)=b$$$.)

In these dp problems, $$$X$$$ is something like "all partitions of the sequence $$$1..n$$$ into contiguous subsequences", $$$f$$$ is what it is, $$$h$$$ is "number of parts of the partition", and $$$b$$$ is $$$k$$$.

The general situation where the Lagrange sufficiency theorem is useful is where $$$X$$$ is 'nice', so that we can optimise $$$L(-, \lambda)$$$ on $$$X$$$ (using a one-dimensional dp maybe), but the set $$$\{ x \in X \mid h(x)=b \}$$$ is less nice, so that we cannot optimise on it directly (or maybe only in quadratic time). In general you will find that, letting $$$x(\lambda)$$$ maximise $$$L(-,\lambda)$$$, $$$h(x(\lambda))$$$ is monotonically decreasing in $$$\lambda$$$ (let's not worry about ambiguity in choice of $$$x(\lambda)$$$). So you should be able to find a $$$\lambda$$$ with $$$h(x(\lambda)) = b$$$ using binary search; binary search is fast so this method can be good. It might be that no $$$\lambda$$$ with $$$h(x(\lambda))=b$$$ exists, in which case this method isn't helpful. Convexity comes into play to ensure that $$$\lambda$$$ exists, and this is where things become a bit technical.
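For concreteness, here's a minimal Python sketch of this binary search on $$$\lambda$$$, using the classic instance "split an array into at most $$$k$$$ parts, minimizing the sum of squared part sums". The helper names and the quadratic inner dp are my own simplifications, not taken from any particular solution:

```python
def solve_lambda(a, lam):
    """Optimise L(-, lam) over ALL partitions: minimise
    (sum of squared part sums) + lam * (number of parts),
    with no hard constraint on the part count.
    Returns (optimal L value, number of parts used)."""
    n = len(a)
    pre = [0] * (n + 1)                      # prefix sums
    for i, v in enumerate(a):
        pre[i + 1] = pre[i] + v
    INF = float('inf')
    dp = [(INF, 0)] * (n + 1)                # dp[i] = (best L, parts) for prefix i
    dp[0] = (0.0, 0)
    for i in range(1, n + 1):
        for j in range(i):                   # last part is a[j..i-1]
            s = pre[i] - pre[j]
            cand = (dp[j][0] + s * s + lam, dp[j][1] + 1)
            if cand < dp[i]:
                dp[i] = cand
    return dp[n]

def aliens(a, k):
    """h(x(lam)) = number of parts used is decreasing in lam, so binary
    search for a lam whose unconstrained optimum uses <= k parts."""
    lo, hi = 0.0, float(sum(a)) ** 2 + 1     # at hi, one part is optimal
    for _ in range(60):
        mid = (lo + hi) / 2
        _, parts = solve_lambda(a, mid)
        if parts > k:
            lo = mid                         # penalty too small, too many parts
        else:
            hi = mid
    value, _ = solve_lambda(a, hi)
    return value - hi * k                    # undo the penalty: f(k) = L - lam*k
```

For example, `aliens([1, 2, 3, 4], 2)` should come out very close to $$$52$$$: the best 2-part split is $$$[1,2,3] \mid [4]$$$, giving $$$36 + 16$$$.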

The Wikipedia article only seems to deal with the case where $$$X$$$ is a manifold (i.e. a 'continuous' space), and $$$f, h$$$ are differentiable. I think the reason for this is that instead of having to worry about global extremisation and existence of $$$\lambda$$$, you just have to look for (constrained) stationary points, which is simpler (it's local). But if you want a sufficient condition for when your solution is the constrained maximum, then you need the "sufficiency theorem". (Of course, derivatives and stationarity aren't going to help you in discrete dp problems...)

Not sure if this is what you were asking for, but I hope it helps.

Oh. It seems to be a nice explanation. I think.

(I will hopefully be able to actually read and comprehend it at some point in time)

My boy memento also wrote an article on this recently.

I like this article, it's good.

Hi memento, I really want to know which tool or platform you used to build a github.io blog like that. I also wonder how you insert math formulas and graphs, and what text font you use; it looks so beautiful. Many thanks.

I think there are two typos. The summation in the definition of the second dp should be over $$$v_k$$$, and it should probably be $$$\max(x_t - y_{t + 1} + 1, 0)^2$$$ in the last dp.

You're right, fixed :)

There's some part I don't understand. After the binary search the answer is supposed to be $$$ dp[n] - cnt[n] * \lambda $$$ if I want to use at most $$$K$$$ resources.

However, in this problem 1279F - New Year and Handle Change the idea is to perform at most $$$k$$$ operations, and I received wrong answer (90721141) that way but got accepted (90488910) with $$$ dp[n] - k * \lambda $$$.

I know it's always better to perform k operations instead of some number smaller than k, but anyway, the idea is that I was supposed to get the optimal value with any number of operations less than or equal to k, right?

Is there something I am missing?

dp[n] - cnt[n] * lambda might be tied for many different possible values of cnt[n] (in fact, if the aliens trick works for the problem, the set of good values of cnt[n] is a contiguous range of possible cnt[n]; try to prove this). In this case you should explicitly use k as cnt[n] yourself.
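A tiny self-contained illustration of this tie issue, with hand-picked convex values (the numbers are hypothetical, not from 1279F):

```python
# f[m] = best cost using exactly m parts/operations (hypothetical convex values):
f = {1: 100, 2: 52, 3: 34, 4: 30}
lam = 18                                  # at this lambda, m = 2 and m = 3 tie
L = {m: f[m] + lam * m for m in f}        # Lagrangian values the dp would see
assert L[2] == L[3] == min(L.values())    # both minimise the Lagrangian

k = 2
# Wrong recovery: depends on which tied cnt the dp happened to return.
assert L[3] - lam * 3 == 34 != f[k]
# Right recovery: always subtract lam * k, regardless of the dp's cnt.
assert L[3] - lam * k == f[k] == 52
```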

How can we prove whether the dp is convex/concave? I mean, is there any special property of the cost function that lets us use binary search?

There's a concrete property that is the sufficient and necessary condition for the aliens trick to work:

For maximum cost: let f(i) be the optimal answer taking i things/elements/operations. The increase you receive from i-1 to i must be at least as large as the one from i to i+1. In other words:

f(i) - f(i-1) >= f(i+1) - f(i)

or

f(i) >= (f(i+1) + f(i-1)) / 2

for minimum, >= turns into <=

Now, as for how to prove it: every problem is different, and you'll probably need to approach each one differently. But that second inequality is useful because a standard way to prove it is to take optimal answers for i-1 and i+1, then somehow exchange things/elements/operations between them so that both end up being valid solutions with i things/elements/operations. If you can do that, then you've found a valid solution with i things/elements/operations that's at least as good as the mean of the answers for i-1 and i+1. To see how this works in practice, look for that recent educational F that was the aliens trick; someone explained a proof for that problem in the editorial's comment section.

If you asked about how this implies we can use binary search: you can view it as points (i, f(i)). Every such point should lie on the convex hull. Now, solving the dp with a certain lambda is just a maximum dot product query with the vector (-lambda, 1).
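Here's a quick brute-force sketch that checks the condition above numerically for the classic "split an array into parts, cost = sum of squared part sums" problem (this one is a minimisation, so the inequality flips); `exact_best` is just a name I made up for a reference dp:

```python
def exact_best(a, k):
    """Reference O(n^2 * k) dp: minimum sum of squared part sums
    over partitions of a into exactly k contiguous parts."""
    n = len(a)
    pre = [0] * (n + 1)                      # prefix sums
    for i, v in enumerate(a):
        pre[i + 1] = pre[i] + v
    INF = float('inf')
    dp = [[INF] * (n + 1) for _ in range(k + 1)]
    dp[0][0] = 0
    for m in range(1, k + 1):
        for i in range(1, n + 1):
            for j in range(i):               # last part is a[j..i-1]
                s = pre[i] - pre[j]
                dp[m][i] = min(dp[m][i], dp[m - 1][j] + s * s)
    return dp[k][n]

a = [1, 2, 3, 4]
f = [exact_best(a, m) for m in range(1, len(a) + 1)]
# Minimisation, so the condition is f(i) - f(i-1) <= f(i+1) - f(i),
# i.e. the savings from each extra part shrink:
assert all(f[i] - f[i - 1] <= f[i + 1] - f[i] for i in range(1, len(f) - 1))
```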

I was solving a problem in which we have to split an array into k segments, where the cost of a subarray is the square of the sum of its elements. I think of the formation of a segment boundary as a cut; initially there is no cut (k=1). Now I observe that the contribution of an individual cut decreases as we increase k (that is, as we add more cuts). Now let's say the answer for i-1 cuts is x. For i cuts, select any i-1 of them; their contribution to the answer should be at least x, and say the remaining one decreases the answer by delta1, so ans(i-1) = x and ans(i) = x - delta1. If I apply the same reasoning to i+1 cuts, ans(i+1) = x - delta2 - delta3. Now, to prove that our function is concave/convex, we just have to prove 2*delta1 >= delta2 + delta3, and I think this is quite intuitive. Please correct me if I am wrong.

It's something like that, but you're oversimplifying it imo, because it's not guaranteed that the other i-1 cuts will remain in the same positions.

Yeah, I was also thinking the same.

I think there is a general theme: you get concavity from bounded integer linear programs whose relaxations do not introduce new basic solutions -- I'll explain what this means. This category of problems includes flow, and things which reduce to flow, and also partitions into contiguous subsequences (maybe this also reduces to flow, but I don't need to know this to prove concavity!).

Consider a general integer linear program, i.e. something of this form: find a vector of $$$n$$$ integers $$$x = (x_1, \ldots, x_n)$$$ which maximizes some linear functional $$$f(x) = c_1x_1 + \ldots +c_nx_n$$$ subject to some linear cost constraint $$$Ax \le b$$$ (note that $$$Ax$$$ and $$$b$$$ are vectors -- this is a list of inequalities). Say $$$g(b)$$$ is the maximum $$$f$$$, as a function of $$$b$$$.

Now consider the same problem, except we drop the constraint that the $$$x_i$$$ be integral. This is the "relaxation". Say $$$g_r(b)$$$ is the answer in this case. I claim that $$$g_r$$$ is obviously concave. Indeed, given $$$b_1, b_2$$$, and $$$0 \le t \le 1$$$, take $$$x^1, x^2$$$ such that $$$Ax^j \le b_j$$$, $$$f(x^j) = g_r(b_j)$$$, and let $$$x = tx^1 + (1-t) x^2$$$. Then $$$Ax \le tb_1 + (1-t)b_2$$$, and so $$$t g_r(b_1) + (1-t) g_r(b_2) = t f(x^1) + (1-t) f(x^2) = f(x) \le g_r(t b_1 + (1-t) b_2)$$$, as needed.

This doesn't mean that $$$g$$$ is concave, as in general we could have $$$g < g_r$$$. But sometimes $$$g(b) = g_r(b)$$$. In particular this happens when any $$$x$$$ such that $$$Ax \le b$$$ is a convex combination of some integer solutions $$$x^1, \ldots, x^m$$$ with $$$Ax^j \le b$$$, because in this case $$$f(x) \le \max_j f(x^j) \le g(b)$$$.

For any bounded linear program (bounded means that $$$x$$$ with $$$Ax \le b$$$ cannot be arbitrarily large), all solutions are convex combinations of basic solutions (which are vertices of the polytope). You get basic solutions by deciding that some of the inequalities in $$$Ax \le b$$$ should be equalities, and solving the corresponding system of equations.

So here's the general recipe to prove convexity: phrase your problem as an integer linear program, check that it is bounded, and check that all basic solutions are integral.

Let me describe how I believe this works for your problem of partitioning a sequence of length $$$n$$$ into at most $$$k$$$ parts, such that the sum-of-squares-of-sums is minimized. (I guess the array should be nonnegative, since you don't get convexity for $$$1, -1, 1, -1$$$.) Say we have $$$n + {n \choose 2}$$$ unknowns $$$x_1, \ldots, x_n$$$, $$$d_{ij}$$$ ($$$1 \le i < j \le n$$$), where $$$x_i$$$ represents "which part is the $$$i$$$th term in", and $$$d_{ij}$$$ "should" be $$$1$$$ if $$$i$$$, $$$j$$$ are in the same part, and $$$0$$$ otherwise. So our constraints are $$$1 \le x_1 \le x_2 \le \ldots \le x_n \le k$$$, $$$0 \le d_{ij} \le 1$$$, $$$1 - (x_j - x_i) \le d_{ij}$$$. Note that the sum-of-squares-of-sums is a linear function in the $$$d_{ij}$$$. (This formulation lets us take $$$d_{ij} = 1$$$ even if $$$x_i < x_j$$$, but that's never optimal, since we assume all elements of the array have the same sign.) So the integer version of this problem gives the answer you want. You can check that (for integer $$$k$$$) all basic solutions are integral. So the minimum sum-of-squares-of-sums is convex as a function of $$$k$$$, by the general theory above.
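One small sanity check on that formulation: the linearity claim comes from expanding $$$(\sum_{i \in P} a_i)^2$$$ over all parts $$$P$$$, which gives $$$\sum_i a_i^2 + 2\sum_{i<j} a_i a_j d_{ij}$$$. A quick check in Python (the array and partition here are arbitrary examples of mine):

```python
# Check the linearity claim: with d_ij = 1 iff positions i, j share a part,
# sum-of-squares-of-sums equals  sum(a_i^2) + 2 * sum_{i<j} a_i a_j * d_ij.
a = [3, 1, 4, 1, 5]
parts = [[3, 1], [4], [1, 5]]          # some partition into contiguous parts
labels = [0, 0, 1, 2, 2]               # part index of each position

direct = sum(sum(p) ** 2 for p in parts)
linear = sum(x * x for x in a) + 2 * sum(
    a[i] * a[j]
    for i in range(len(a)) for j in range(i + 1, len(a))
    if labels[i] == labels[j]          # this indicator plays the role of d_ij
)
assert direct == linear
```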

Wow, that is awesome. I think I now understand this topic a little better.

In China, it is named "WQS Binary Search", because this trick was invented by Wang Qinshi (王钦石, WQS). Here's the original paper.