The problem D in the latest Open Cup involves function *f*(*n*) which is defined as the minimum sum of sequence *b*_{1}, ..., *b*_{k} such that any sequence *a*_{1}, ..., *a*_{l} with sum less than or equal to *n* can be dominated by some subsequence of *b*. One sequence dominates the other, if they have the same length any every element of the first sequence greater than or equal to corresponding element of the second sequence. It turns out that *f*(*n*) = *n* + *f*(*k*) + *f*(*n* − 1 − *k*) where *k* = [(*n* − 1) / 2]. But why? If this equation still keeps you up at night, you can finally sleep well now. I have found a wonderful proof of this statement which fits the bounds of this site.

Two sides of the proof

Let us construct the *b* sequence first.

For brevity, we will say that sequence *b* covers sequence *a*, if some subsequence of *b* dominates *a*.

**Theorem 1.** Let *B*_{1} be a sequence that covers all sequences with sum less than or equal to *k*. Let *B*_{2} be a sequence that covers all sequences with sum less than or equal to *l*. Let *n* = *k* + *l* + 1. Then sequence *B*_{1}, *n*, *B*_{2} covers all sequences with sum less than or equal to *n*.

*Proof*. Let *a*_{1}, ..., *a*_{s} be a sequence with sum less than or equal to *n*. Let *t* be the largest index such that *a*_{1} + ... + *a*_{t} ≤ *k*. Then the first *t* elements of sequence *a* can be dominated by some subsequence of *B*_{1}. If *t* = *s* we don't need to go any further. Otherwise, the element *a*_{t + 1} can be dominated by *n* and the rest of the sequence can be dominated by a subsequence of *B*_{2} since its sum is less than of equal to *n* − (*a*_{1} + ... + *a*_{t + 1}) ≤ *n* − *k* − 1 = *l* where we used that the sum of first *t* + 1 elements of *a* is strictly greater than *k* by definition of *t*.

From Theorem 1 follows that *f*(*n*) ≤ min_{k}(*n* + *f*(*k*) + *f*(*n* − 1 − *k*)).

Simple enough, but how to prove the lower bound?

First of all, any covering sequence must cover sequence with single element *a*_{1} = *n* and so must have element greater than or equal to *n*.

**Theorem 2.** Let *B* = *B*_{1}, *m*, *B*_{2} be an arbitrary split of a sequence that covers all sequences with sum less than or equal to *n*. Let *k* be the largest number such that all sequences with sum less than or equal to *k* are covered by *B*_{1}. Let *l* be the largest number such that all sequences with sum less than of equal to *l* are covered by *B*_{2}. Then *k* + *l* ≥ *n* − 1.

*Proof*. Assume *k* + *l* ≤ *n* − 2. By definition of *k* there is sequence *A*_{1} with sum *k* + 1 which is not covered by *B*_{1}. By definition of *l* there is sequence *A*_{2} with sum *l* + 1 which is not covered by *B*_{2}. Let us consider sequence *A*_{1}, *A*_{2}. It has sum *k* + 1 + *l* + 1 ≤ *n*. Since *A*_{1} is not covered by *B*_{1}, the latest element of *A*_{1} must be covered by *m* or element to the right of it. By the same logic, the first element of *A*_{2} must be covered by *m* or element to the left of it. But the latest element of *A*_{1} must be covered by element of *B* to the left of element covering the first element of *A*_{2}. We arrive at the contradiction and so *k* + *l* ≥ *n* − 1.

If we split the sequence *B* at element greater than or equal to *n*, from Theorem 2 follows *f*(*n*) ≥ *n* + *f*(*k*) + *f*(*l*) where *k* + *l* ≥ *n* − 1. Since *f*(*n*) is non-decreasing we can set *l* = *n* − 1 − *k* and get *f*(*n*) ≥ min_{k}(*n* + *f*(*k*) + *f*(*n* − 1 − *k*)).

Combining results from both theorems, we have *f*(*n*) = min_{k}(*n* + *f*(*k*) + *f*(*n* − 1 − *k*)). Q.E.D.

# Piece of convex cake

*But, sir, you have got the wrong formula.* The only way they can be the same, if minimum over *k* is always happens to be in the middle. Like in the case of convex functions. Well, let us prove *f*(*n*) is convex then. Stay back, I am going to use induction.

**Theorem 3.**

- For all
*n*≥ 1.*f*(*n*) =*n*+*f*(*k*) +*f*(*n*− 1 −*k*) where*k*= [(*n*− 1) / 2]. - For all
*n*≥ 2.*f*(*n*) −*f*(*n*− 1) ≥*f*(*n*− 1) −*f*(*n*− 2).

*Proof*. We have *f*(0) = 0, *f*(1) = 1, *f*(2) = 3 which prove the theorem for *n* ≤ 2. We have the basis of induction, now let us prove the induction step. Assume the theorem is proved for all *n* ≤ *m*. Let us prove it for *n* = *m* + 1.

The first statement is the consequence of the second for *n* ≤ *m*. If *k* + 1 ≤ *m* − (*k* + 1), we have *f*(*k*) + *f*(*m* − *k*) = *f*(*k*) + *f*(*m* − (*k* + 1)) + (*f*(*m* − *k*) − *f*(*m* − (*k* + 1))) ≥ *f*(*k*) + *f*(*m* − (*k* + 1)) + (*f*(*k* + 1) − *f*(*k*)) = *f*(*k* + 1) + *f*(*m* − (*k* + 1)). So if we increase *k* starting from 1, the sum *f*(*k*) + *f*(*m* − *k*) decreases or stays the same until *k* = [*m*/2].

In order to prove the second statement, we need to consider two cases.

First case, *m* = 2*t*, *t* ≥ 1. The first difference is equal to *f*(*m* + 1) − *f*(*m*) = (*m* + 1 + 2*f*(*t*)) − (*m* + *f*(*t*) + *f*(*t* − 1)) = 1 + *f*(*t*) − *f*(*t* − 1). The second difference is equal to *f*(*m*) − *f*(*m* − 1) = (*m* + *f*(*t*) + *f*(*t* − 1)) − (*m* − 1 + 2*f*(*t* − 1)) = 1 + *f*(*t*) − *f*(*t* − 1). The differences coincide which proves the second statement.

Second case. *m* = 2*t* + 1, *t* ≥ 1. The first difference is equal to *f*(*m* + 1) − *f*(*m*) = (*m* + 1 + *f*(*t* + 1) + *f*(*t*)) − (*m* + 2*f*(*t*)) = 1 + *f*(*t* + 1) − *f*(*t*). The second difference is equal to *f*(*m*) − *f*(*m* − 1) = (*m* + 2*f*(*t*)) − (*m* − 1 + *f*(*t*) + *f*(*t* − 1)) = 1 + *f*(*t*) − *f*(*t* − 1). The first difference is greater than or equal to the second due to induction assumption for *t* + 1 ≤ *m* which we can apply since *t* + 1 ≥ 2.

This completes the proof of the theorem and allows us to remove min from the formula.

Nice! Good to know it exists :D. I actually even read it ;p. By the way I am even more interested in proof of B that was harder to believe.

I still don't have a proof, this is some random thoughts.

Consider a standard s-t maxflow problem. Write it as a LP problem and take its dual. It turns out that the dual is the following:

Assign labels to the vertices.

label(s) = 1,label(t) = 0, and for other vertices the labels are some real values. The cost of this labeling is defined as the sum of (label(s_{e}) -label(t_{e})) *capacity_{e}over all edgese(heres_{e},t_{e}are the endpoints ofe). What is the minimum possible cost of the labeling?Then, we can prove that the cost becomes the smallest when the labels are 0/1 valued, and it leads to the proof of maxflow-mincut theorem.

Now let's return to the original problem. The dual is quite similar:

Assign two labels to the vertices, for vertex

vdefineW(v) andO(v).W(W_{s}) =O(O_{s}) = 1,W(W_{t}) =O(O_{t}) = 0 must be satisfied. The cost is the sum ofmax(|O(e_{1}) -O(e_{2})|, |W(e_{1}) -W(e_{2})|) *capacity_{e}over all edgese(heree_{1},e_{2}are the endpoints ofe).Now, here's the unproved part: believe that the cost becomes the smallest when the labels are 0/1 valued. If we assume this, it can be restated as:

Choose a subset of edges. This subset is called a

cutif it cutsW_{s}andW_{t}, and at the same time it cutsO_{s}andO_{t}. What is the cut of minimum cost?Then it's easy to see that the solution with two flows is correct.

as stated here: http://codeforces.com/blog/entry/50685?#comment-347181 a probable proof should be found in that paper. But the main theorem (related to this problem) is taken from here Multi-Commodity Network Flows where the proof is an algorithmic one :(

So did you completed the 'more mathematical' proof state above?

One friend of mine find in OEIS (I think) that this sequence coincide with worst case comparison needed to sort an array of n integers (using an algorithm based on comparisons, of course ;)

Is it any way to extend your ideas to prove this?

I don't think so, as the problem of optimal comparison sorting is yet unsolved even for small

n. The smallestnfor which the answer is unknown is 16.mareksom is writing master thesis whose main goal is to determine whether optimal number of comparisons for 16 elements is 45 or 46 :D. Under guidance of Marcin Peczarski that is mentioned many times in references :P.

What a coincidence: I was thinking about solving the very same problem in

mymaster thesis :) Though I haven't yet selected a topic. And Marcin's paper was in fact the one which inspired me and instigated to think over some refinements for their approach.Does Marek already have an answer?

No, I still don't know the answer. And probably it will take me at least a year to find it. I am concentrated on parallelization of existing algorithms to make them work efficiently on a supercomputer. It's really hard to come up with real improvements to the core algorithm, because Marcin Peczarski has already done a great job optimizing it.

It's cool that there is some other student interested in this topic : )

I didn't know that, I checked the sequence in OEIS and you were right. It states for maximal number of comparisons for sorting n elements by

binary insertion.By the way, it reminded me more peculiar coincidence.

When we went to a programming camp last week we found a shop that sells spicy noodles. Nine different spiciness were available there: 1, 3, 5, 8, 11, 15, 20, 25, 30.

OEIS 49706