"The trick from aliens"

Rating changes for last rounds are temporarily rolled back. They will be returned soon. ×

→ Pay attention

Before contest
Codeforces Round 941 (Div. 1)
4 days
Register now »

*has extra registration

Before contest
Codeforces Round 941 (Div. 2)
4 days
Register now »

*has extra registration

→ Streams

CodeChef Starters 131 Solution Discussion

By aryanc403

Before stream 18:10:16

View all →

→ Top rated

#	User	Rating
1	jiangly	3640
2	Benq	3593
3	tourist	3572
4	orzdevinwang	3561
5	cnnfls_csy	3539
6	ecnerwala	3534
7	Radewoosh	3532
8	gyh20	3447
9	Rebelz	3409
10	Geothermal	3408

Countries | Cities | Organizations

View all →

→ Top contributors

#	User	Contrib.
1	maomao90	174
2	awoo	164
3	adamant	163
4	TheScrasse	159
5	nor	158
6	maroonrk	156
7	-is-this-fft-	151
8	SecondThread	147
9	orz	146
10	pajenegod	145

View all →

→ Find user

→ Recent actions

Detailed →

Anai's blog

"The trick from aliens"

By Anai, history, 5 years ago, In English

Here's the usual shameless self-advertisement!

Talking with my colleagues a few months ago, I noticed that except for the editorial of Aliens (IOI 2016), there isn't any comprehensive resource in learning the DP optimization (well, maybe except for this one, but... yeah), so I decided to write a more extensive tutorial on it. Cheers! ^^

#dp, #aliens

+158

Anai
5 years ago
30

Comments (30)

Write comment?

adamant

5 years ago, # |

+27

There is this article actually.

P.S. I wonder if there's going to be any formal work on the connection of this trick with Lagrange multipliers...

→ Reply

Anai

5 years ago, # ^ |

← Rev. 3 →

+17

Well... yes, I know the blog post, but I think there is room for quite a few more details. As for the Lagrange multipliers, now that you say it, that might just be the inspiration for the trick :))

→ Reply

_h_

5 years ago, # ^ |

+13

This trick relies on a simple fact sometimes known as the "Lagrange sufficiency theorem":

Let $$$X$$$ be a set, $$$f : X \to \mathbb R$$$ a 'value function', $$$h : X \to \mathbb R$$$ a 'constraint function', and $$$b \in \mathbb R$$$ our specific constraint. Suppose $$$\lambda \in \mathbb R$$$ and $$$x \in X$$$ are such that $$$x$$$ maximises $$$L(x,\lambda) = f(x) - \lambda(h(x)-b)$$$. Moreover, suppose $$$h(x)=b$$$. Then $$$x$$$ maximises $$$f(x)$$$ subject to the constraint $$$h(x)=b$$$.

(The proof is trivial: just observe that $$$L = f$$$ when $$$h(x)=b$$$.)

In the dp problems, $$$X$$$ is something like "all partitions of the sequence 1..n into contiguous subsequences", $$$f$$$ is what it is, $$$h$$$ is "number of parts of the partition", and $$$b$$$ is k.

The general situation where the Lagrange sufficiency theorem is useful is where $$$X$$$ is 'nice', so that we can optimise $$$L(-, \lambda)$$$ on $$$X$$$ (using a one-dimensional dp maybe), but the set $$$ \langle x \in X \mid h(x)=b \rangle $$$ is less nice, so that we cannot optimise on it directly (or maybe only in quadratic time). In general you will find that, letting $$$x(\lambda)$$$ maximise $$$L(-,\lambda)$$$, $$$h(x(\lambda))$$$ is monotonically decreasing in $$$\lambda$$$ (let's not worry about ambiguity in choice of $$$x(\lambda)$$$). So you should be able to find a $$$\lambda$$$ with $$$h(x(\lambda)) = b$$$ using binary search; binary search is fast so this method can be good. It might be that no $$$\lambda$$$ with $$$h(x(\lambda))=b$$$ exists, in which case this method isn't helpful. Convexity comes into play to ensure that $$$\lambda$$$ exists, and this is where things become a bit technical.

The Wikipedia article only seems to deal with the case where $$$X$$$ is a manifold (i.e. a 'continuous' space), and $$$f, h$$$ are differentiable. I think the reason for this is that instead of having to worry about global extremisation and existence of $$$\lambda$$$, you just have to look for (constrained) stationary points, which is simpler (it's local). But if you want a sufficient condition for when your solution is the constrained maximum, then you need the "sufficiency theorem". (Of course, derivatives and stationarity aren't going to help you in discrete dp problems...)

Not sure if this is what you were asking for, but I hope it helps.

→ Reply

adamant

5 years ago, # ^ |

Oh. It seems to be nice explanation. I think.

(I will hopefully be able to actually read and comprehend it at some moment of time)

→ Reply

Bruteforceman

5 years ago, # |

+49

My boy memento also wrote an article on this recently.

→ Reply

BrayanD

5 years ago, # ^ |

+10

I like this article, is good.

→ Reply

Hd7

4 years ago, # ^ |

Hi memento, I really want to know which tool or platform you used to build a github.io blog like that. I also wonder how you insert math formula and graph, the font of text, I see it so beautiful. Many thanks.

→ Reply

memento

3 years ago, # ^ |

For blog, HTML :((. For math formula, mathjax. For graphs, geogebra (I later found out that you can do that more easily and beautifully using python matplotlib).

→ Reply

Maksim1744

3 years ago, # ^ |

+89

I know I'm kinda late, but I decided to read your article and just couldn't skip this random fact about Antarctica

Due to the fact that $$$K$$$ is quite big, simulating $$$K−N$$$ iterations (maybe even with a priority queue) would take tremendous amount of time — roughly $$$10^9+7$$$ tons of ice will melt in Anterctica in that time.

I did my research and the latest data I could find is from 2009-2017 from here. It is said that over the specified period the total mass loss of Antarctic Ice Sheet was $$$\approx 252 \pm 26\, Gt/y$$$. I will take the average $$$252\cdot10^9$$$ per year.

Now, the code. It is pretty simple (it passes small tests on usaco, so it is probably right). Since you didn't specify the specs of a computer to run a program, let's use the popular and public ideone. So, here is my code.

Obviously, the code is too slow, so all I could do is estimate running time. And as you can see, it said that $$$\approx 1.138 \cdot 10^9$$$ tons of ice will melt while this program is executing. It means that you was wrong only by $$$\approx 14\%$$$, which is breathtaking!

Now I'm actually very curious: did you say $$$10^9 + 7$$$ randomly or at least googled some numbers?

→ Reply

memento

3 years ago, # ^ |

+13

The effects of climate changing always irritated me, so I wanted to put something related to that, whatever the figure might turn out to be. And so I googled. Believe it or not, it was breathtaking for me too!

Btw, thanks for checking out the article :D

→ Reply

Nson

5 years ago, # |

I think there are two typos. The summation in the definition of the second dp should be over $$$v_k$$$ and it should probably be $$$max(x_t - y_{t + 1} + 1, 0)^2$$$ in the last dp.

→ Reply

Anai

5 years ago, # ^ |

You're right, fixed :)

→ Reply

PoIar_

4 years ago, # |

← Rev. 2 →

+14

There's some part I don't understand. After the binary search the answer is supposed to be $$$ dp[n] - cnt[n] * \lambda $$$ if I want to use at most $$$K$$$ resources.

However in this problem 1279F - New Year and Handle Change the idea is to perform at most $$$k$$$ operations and I received wrong answer (90721141) with that way but received accepted (90488910) with $$$ dp[n] - k * \lambda $$$.

I know it's always better to perform k operations instead of some number smaller than k, but anyways the idea is that I was supposed to get the optimal value with any number of operations equal or less than k at the end right?

Is there something I am missing?

→ Reply

tfg

4 years ago, # ^ |

+21

dp[n] — cnt[n] * lamba might be tied for many different possible cnt[n] (in fact, if aliens trick works for the problem the set of good values of cnt[n] is a contiguous range of possible cnt[n], try to prove this), in this case you should use k as cnt[n] by yourself explicitely.

→ Reply

rivalq

4 years ago, # ^ |

How we can prove ,if dp is convex/concave ?. I mean is there any special property for cost function so that we can use binary search.

→ Reply

tfg

4 years ago, # ^ |

← Rev. 2 →

+18

There's a concrete property that is the sufficient and necessary condition for aliens to work:

For maximum cost: Let f(i) be the optimal answer taking i things/elements/operations. The increase you receive from i-1 to i must be at least as good as from i to i+1. In other words:

f(i) — f(i-1) >= f(i+1) — f(i)

f(i) >= (f(i+1) + f(i-1)) / 2

for minimum, >= turns into <=

Now, as for how to prove, every problem is a different problem. You really need to approach every problem differently probably. But that second inequality is useful because a certain way to prove it is assume optimal answers for i-1 and i+1, then somehow exchange things/elements/operations in a way that i-1 and i+1 both end up with valid i things/elements/operations. If you can do that then you've found a valid set of i things/elements/operations that's at least as good as the mean of i-1 and i+1. To see how this works better look for that recent educational F that was aliens trick, someone explained in the editorial's comment section a proof for that problem.

Spoiler

→ Reply

rivalq

4 years ago, # ^ |

I was solving a problem in which we have to split array into k segments , where cost of an subarray is square of sum of the elements of that subarray. I assume formation of an segment as an cut, initially there is not cut (k=1), now i observe that contibution of an individual cut decreases if we increases k(that is addition of more cuts). Now lets say answer for i-1 cuts is x. for i cuts , select any i-1 cuts there contibution to answer should be greater than or equal to x , for remaining one lets say it decreases the answer by delta1. ans(i-1)=x ans(i) =x-delta1. if i apply same thing to i+1 cuts, ans(i+1) = x-delta2-delt3. now to prove if our function is concave/convex we just have to proof 2*delta1>=delta2+delta3, and i think this is very much intuitive. please correct me if i am wrong.

→ Reply

tfg

4 years ago, # ^ |

+13

It's something like that but you're oversimplifying it imo because it's not guaranteed that the other i-1 cuts will remain in the same position.

→ Reply

rivalq

4 years ago, # ^ |

yeah i was also thinking the same.

→ Reply

_h_

4 years ago, # ^ |

+23

I think there is a general theme: you get concavity from bounded integer linear programs whose relaxations do not introduce new basic solutions -- I'll explain what this means. This category of problems includes flow, and things which reduce to flow, and also partitions into contiguous subsequences (maybe this also reduces to flow, but I don't need to know this to prove concavity!).

Consider a general integer linear program, i.e. something of this form: find a vector of $$$n$$$ integers $$$x = (x_1, \ldots, x_n)$$$ which maximizes some linear functional $$$f(x) = c_1x_1 + \ldots +c_nx_n$$$ subject to some linear cost constraint $$$Ax \le b$$$ (note that $$$Ax$$$ and $$$b$$$ are vectors -- this is a list of inequalities). Say $$$g(b)$$$ is the maximum $$$f$$$, as a function of $$$b$$$.

Now consider the same problem, except we drop constraint that $$$x_i$$$ be integral. This is the "relaxation". Say $$$g_r(b)$$$ is the answer in this case. I claim that $$$g_r$$$ is obviously concave. Indeed, given $$$b_1, b_2$$$, and $$$0 \le t \le 1$$$, take $$$x^1, x^2$$$ such that $$$Ax^j \le b_j$$$, $$$f(x^j) = g(b_j)$$$, and let $$$x = tx^1 + (1-t) x^2$$$. Then $$$Ax \le tb_1 + (1-t)b_2$$$ and so $$$f(x) = t f(x^1) + (1-t) f(x^2) = t g(b_1) + (1-t) g(b_2) \le g( t b_1 + (1-t) b_2)$$$, as needed.

This doesn't mean that $$$g$$$ is concave, as in general we could have $$$g < g_r$$$. But sometimes $$$g(b) = g_r(b)$$$. In particular this happens when any $$$x$$$ such that $$$Ax \le b$$$ is an affine combination of some integer solutions $$$x^1, \ldots, x^m$$$, $$$Ax^j \le b$$$, because in this case $$$f(x) \le \max_j f(x^j) \le g(r)$$$.

For any bounded linear program (bounded means that $$$x$$$ such that $$$Ax = b$$$ cannot be arbitrarily large), all solutions are affine combinations of basic solutions (which are vertices of the polytope). You get basic solutions by deciding that some of the inequalities in $$$Ax$$$ should be equalities, and solving the corresponding system of equations.

So here's the general recipe to prove convexity: phrase your problem as an integer linear program, check that it is bounded, and check that all basic solutions are integral.

Let me describe how I believe this works for your problem of partitioning a sequence of length $$$n$$$ into at most $$$k$$$ parts, such that sum-of-squares-of-sums is minimized. (I guess the array should be nonnegative, since you don't get convexity for $$$1, -1, 1, -1$$$.) Say we have $$$n + {n \choose 2}$$$ unknowns $$$x_1, \ldots, x_n$$$, $$$d_{ij}$$$ ($$$1 \le i < j \le n$$$), where $$$x_i$$$ represents "which part is the $$$i$$$th term in", and $$$d_{ij}$$$ "should" be 1 if $$$i$$$, $$$j$$$ are in the same part, and $$$0$$$ otherwise. So our constraints are $$$1 \le x_1 \le x_2 \le \ldots \le x_n \le k$$$, $$$0 \le d_{ij} \le 1$$$, $$$1 - (x_j - x_i) \le d_{ij}$$$. Note that sum-of-squares-of-sums is a linear function in $$$d_{ij}$$$. (This formulation lets us take $$$d_{ij} = 1$$$ even if $$$x_i < x_j$$$, but that's never optimal, since we assume all elements of array have same sign.) So the integer version of this problem gives the answer you want. You can check that (for integer $$$k$$$) all basic solutions are integral. So the minimum sum-of-squares-of-sums is convex as a function of $$$k$$$, by the general theory above.

→ Reply

rivalq

4 years ago, # ^ |

+10

Wow that is awesome, I think now i understands this topic little much better.

→ Reply

Nots0fast

2 years ago, # ^ |

How do you prove that all basic solutions are integral ?

→ Reply

sys.

4 years ago, # |

In China, it is named " WQS Binary Search ". Because this trick was invented by Wang QinShi (王钦石, WQS). Here's the original paper.

→ Reply

AnandOza

3 years ago, # |

+39

Anai, your website is down. Do you have a copy of the article? I remember enjoying it!

→ Reply

Anai

3 years ago, # ^ |

I am really sorry for this, I'm having some issues at univ and I haven't paid the server because I really wanna code the website again a better format and post some other mathsy things there. I'm 99% sure it will be done by the end of December.

→ Reply

AnandOza

3 years ago, # ^ |

No problem, thanks for writing the article to begin with!

If it's easy to upload a PDF version or post it as a CF blog, that would be great, but no worries if you can't.

→ Reply

Anai

3 years ago, # |

You can find a copy of the website here, the old URL will take a while to point to this IP.

→ Reply

ScarletS

2 years ago, # ^ |

It doesn't seem to be up anymore :(

→ Reply

Anai

19 months ago, # ^ |

← Rev. 2 →

It is now (mostly). Use the https://serbanology.herokuapp.com/ for the moment, just got it back up and it will take a while for the domain to propagate.

→ Reply

ScarletS

19 months ago, # ^ |

Thank you, I hope it was worth waiting 6 months :)

→ Reply