errorgorn's blog

By errorgorn, history, 23 months ago, In English

Example problem: (I dont have the link as this problem only exists in my school's private judge)

We have an array of n positive intergers and we need to group them into k segments such that each element of the array is in a segment and that all elements in a segment are contigous. The cost of a segment is the sum of the numbers in the segment squared.

We need to minimize the total cost of all the segments.

Example 1, we have n=5 and k=3. Where the array is [1,2,3,4,5]. The optimal answer is by grouping (1+2+3) , (4) and (5) into different segments. The total cost is 6^2 + 4^2 + 5^2 =77.

Example 2, we have n=4, k=2. Where the array is [1,1000000,3,4]. The optimal answer is by grouping (1,1000000) and (3,4) into different segments. The total cost is 1000001^2+7^2=10000200050

Now I asked some people and they said this can be solved by doing the lagrange (or aliens) speedup trick. We shall define a cost C. Then do a convex hull trick where we try to minimize the value of all our segments but we also add C to our cost whenever we make a new segment. Now, if C is INF, then we will only have 1 segment, while if C is 0 we will have n segments. So people claim that we can binary search on C to find some C where there will be K segments existing.

This allowed me to AC the problem, but my solution was not always correct as the test cases were weak. I thought of this test case: n=4, k=3. Where is array is [1,1,1,1] Now, when C is 2, we get 4 segments. But when C is 3, we get 2 segments only. Therefore when I ran my code against this case my code printed 8, even though the answer was (1+1)^2+1^2+1^2=6.

So I think I need someway to iterpolate the lagrange speedup trick. Anyone can help? My code is here.

For my code, the input format is:

n k

a1 a2 a3 a3... an

Where ai is the ith element in the array.

 
 
 
 
  • Vote: I like it
  • +31
  • Vote: I do not like it

»
23 months ago, # |
  Vote: I like it +8 Vote: I do not like it

Auto comment: topic has been updated by errorgorn (previous revision, new revision, compare).

»
23 months ago, # |
  Vote: I like it +27 Vote: I do not like it

The way I like to think about this trick is as follows:

We first prove (or guess) that the function $$$f(k)$$$ mapping $$$k$$$ into the optimum cost using $$$k$$$ segment is convex. (In your case: $$$f(1)=16$$$, $$$f(2)=8$$$, $$$f(3)=6$$$, $$$f(4)=4$$$).

Imagine the graph of $$$f$$$. Then, your convex hull trick for a defined cost $$$C$$$ is equivalent to finding the tangent line to this graph with slope equal to $$$-C$$$. For example, for $$$C=\infty$$$, the line is vertical and it's obviously tangent to the graph at $$$(1, 16)$$$, and for $$$C=0$$$, the line is horizontal and is tangent to the graph at $$$(4, 4)$$$.

The problem arises when the function isn't strictly convex — in your case, the function is linear for $$$k \in [2, 4]$$$. For each $$$C$$$, your convex hull will return some $$$k$$$ for which the line with slope $$$-C$$$ is tangent at point $$$(k, f(k))$$$. Let's assume that the graph of $$$f$$$ looks like this:

...and we want to find $$$f(5)$$$. The function is linear on $$$[1,4]$$$ with slope $$$-3$$$, on $$$[4,7]$$$ with slope $$$-2$$$, and on $$$[7,10]$$$ with slope $$$-1$$$.

Let's say that your convex hull trick produces the largest $$$k$$$ such that $$$(k, f(k))$$$ is tangent to the graph. (So, in your CHT, it's equivalent to trying to maximize the number of intervals used.) Then, for $$$C=3$$$, it will produce $$$k=4$$$, and for $$$C=2$$$, it will produce $$$k=7$$$. We also know that $$$f(4)=11$$$ and $$$f(7)=5$$$. We deduce that the function is linear on $$$[4, 7]$$$, and therefore $$$f(5)$$$ can be easily computed from $$$f(4)$$$ and $$$f(7)$$$ using some linear interpolation.

I think it's the easiest way to deal with this problem. Perhaps someone knows a better way?

  • »
    »
    23 months ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Oh i see.

    Thanks man :)

  • »
    »
    23 months ago, # ^ |
      Vote: I like it +25 Vote: I do not like it

    Ok, that is easy, but is there some concise general way of restoring the optimal partition in such cases? I can think of keeping some intervals where answers doesn't change, but that is awful.

  • »
    »
    23 months ago, # ^ |
      Vote: I like it +15 Vote: I do not like it

    OK, what would you say for restoring answer like this: instead of binary searching on C we will binary search on (C, i), where second parameter i means that on first i positions we prefer breaking ties by taking more subsegments, on later ones by taking less subsegments (or the other way around, not sure). Would this always give us final answer with exactly k subsegments even if k is in the middle of some plateau? Tbh not sure, sounds risky, but maybe it is worth giving a thought :P

»
23 months ago, # |
  Vote: I like it +15 Vote: I do not like it

I'm pretty damn sure this was on Hackerrank too. I solved it by some improved DP because the tests were weak :^)

»
23 months ago, # |
Rev. 2   Vote: I like it +15 Vote: I do not like it

Hi! As far as I know, binary searching for the penalty when using aliens trick should be done on doubles. I don't know the exact proof but using doubles in your code seems to do the trick. Maybe someone can explain why? https://pastebin.com/KiEdptZf UPD: https://pastebin.com/83CL3CaZ

  • »
    »
    23 months ago, # ^ |
    Rev. 5   Vote: I like it +10 Vote: I do not like it

    Code used to have mistakes but noticed that some variable was not set to double on line 56. code

    I realized this is the same idea mnvbmar metioned earlier. since if you print the value of (a+b)*0.5 after binary searching you effectively get the largest c that can splits the array into m segments where m>=k. So we just use some linear interpolation and the answer comes out.