Maximum (sum × length) subarray problem

By obag, 10 years ago, In English

Hi there, Codeforces community! This question has been bugging me for a while now, so I thought I would share it with you, and see if we can work out the answer together.

The problem is the following: given an array of n integers (positive and negative), find a (contiguous) subarray for which the product $(\text{sum of elements}) \times (\text{number of elements})$ is maximum. I would like to find an algorithm with complexity better than O(n²).

For instance, if the array is

$\text{[math]}$

then the underlined subarray would be the one we are looking for (its "score" is 7×4 = 28).

Notice that if we changed "sum" into "min" the problem would be solvable in $\text{[math]}$ time. Similarly, if we changed "sum" into "gcd" the problem would still be solvable in $\text{[math]}$ time. (I can elaborate if you are interested). The problem is, with other associative operations, such as sum or product, I don't know how to go down from n² to something less.
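For reference, the quadratic baseline mentioned above can be sketched as follows. This is only a sketch: the function name is made up here, prefix sums make each subarray sum cost O(1), and only non-empty subarrays are considered (an assumption, since the post does not say whether the empty subarray counts).

```python
def max_sum_times_length_bruteforce(a):
    """Max over non-empty contiguous subarrays of sum * length, in O(n^2).

    Assumes a is a non-empty list of integers.
    """
    n = len(a)
    # prefix[i] holds a[0] + ... + a[i-1], so sum(a[l:r]) == prefix[r] - prefix[l].
    prefix = [0] * (n + 1)
    for i, x in enumerate(a):
        prefix[i + 1] = prefix[i] + x

    best = a[0]
    for l in range(n):
        for r in range(l + 1, n + 1):
            score = (prefix[r] - prefix[l]) * (r - l)
            if score > best:
                best = score
    return best
```

Any faster approach can be checked against this function on small random arrays.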

Any clue?

Comments (28)

AlexanderBolshakov

10 years ago, # |

+12

Notice that if we changed "sum" into "max" the problem would be solvable in $\text{[math]}$ time.

We'd just need to take either the whole array, or nothing. So, this is O(n).

obag

10 years ago, # ^ |

Ahah, you're right! I meant "min" (I edited the post, thanks)

kingofnumbers

10 years ago, # ^ |

LOL, it is still solvable in O(n) time.

For every i, find the nearest element smaller than the i-th element on the left and on the right.

More formally:

Let L[i] be the largest j such that j < i and A[j] < A[i],

and let R[i] be the smallest k such that i < k and A[i] > A[k].

Then the answer is max( ((R[i] - 1) - (L[i] + 1) + 1) * A[i] ) over all i.

The values of L and R can be found with a stack in O(n).

This is similar to the SPOJ problem HISTOGRA.
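The stack computation described above can be sketched as below. The function name and the 0-based indexing (with -1 and n as sentinels for "no smaller element") are choices made for this sketch. One extra assumption: for a negative A[i] the widest window is the worst one, so the sketch scores a negative element as a window of length 1.

```python
def max_min_times_length(a):
    """Max over non-empty subarrays of min(subarray) * len(subarray), in O(n).

    Assumes a is a non-empty list of integers.
    """
    n = len(a)
    left = [-1] * n   # left[i]: largest j < i with a[j] < a[i], or -1 if none
    right = [n] * n   # right[i]: smallest k > i with a[k] < a[i], or n if none

    stack = []  # indices whose values are strictly increasing
    for i in range(n):
        while stack and a[stack[-1]] >= a[i]:
            stack.pop()
        if stack:
            left[i] = stack[-1]
        stack.append(i)

    stack = []
    for i in range(n - 1, -1, -1):
        while stack and a[stack[-1]] >= a[i]:
            stack.pop()
        if stack:
            right[i] = stack[-1]
        stack.append(i)

    best = a[0]
    for i in range(n):
        width = right[i] - left[i] - 1  # widest window in which a[i] is the minimum
        score = a[i] * width if a[i] >= 0 else a[i]
        best = max(best, score)
    return best
```

Each index is pushed and popped at most once per pass, which is where the linear bound comes from.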

obag

10 years ago, # ^ |

Yes, that was my solution too. The only difference is that I was computing those L[i] and R[i] values using a binary search for each i.

But then you're right, it is possible to compute L and R inductively in O(n) time.

The point is: can we still solve the problem efficiently if we change that "min" into a "sum"?

niklasb

10 years ago, # ^ |

Not if negative numbers are allowed.

JuanMata

10 years ago, # ^ |

If there exists at least one non-negative integer, then choose the entire array as the subarray, and the answer will be maximized.
If not, then choose the least negative integer (a single element) as the subarray, and the magnitude of the answer (which will be negative) will be minimized.
So $O(n)$ it is!
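The case analysis above for max × length fits in a few lines. A minimal sketch, with a made-up function name and assuming a non-empty array:

```python
def max_max_times_length(a):
    """Max over non-empty subarrays of max(subarray) * len(subarray), in O(n).

    Assumes a is a non-empty list of integers.
    """
    m = max(a)
    if m >= 0:
        # A non-negative element exists: the whole array has maximum m and maximal length.
        return m * len(a)
    # Every element is negative: a longer window only makes the product more negative.
    return m
```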

niklasb

10 years ago, # ^ |

Hm not sure what I was thinking :D I guess I was thinking about sums. Of course you're right.

DanAlex

10 years ago, # |

For the example the answer should be 66.

obag

10 years ago, # ^ |

Yes, you're absolutely right. Sorry for that, I fixed the example :)

shuprog1

8 years ago, # ^ |

Could you please elaborate this?

"Similarly, if we changed "sum" into "gcd" the problem would still be solvable in time. (I can elaborate if you are interested)."

Thanks

adamant

8 years ago, # ^ |

+58

There can be only $O(\log A)$ different gcd values among the suffixes of an array, where $A$ bounds the elements (due to the complexity of Euclid's algorithm). So you can add elements one by one, keeping all possible gcds of the current array's suffixes.

By the way, does anyone know solution for the original problem now, 2 years later?

Batman

8 years ago, # ^ |

How do you implement the adding in O(n log n)?

adamant

8 years ago, # ^ |

For each gcd, keep the min and max length of the suffixes that have that gcd. Then, when you want to add an element, compute for each segment its gcd with the new element and update the borders. You should also add one new segment that contains only that element. Glue together segments with the same gcd as you go.
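A minimal sketch of this bookkeeping for the gcd × length variant follows. It assumes positive integers, and it stores only the leftmost start for each distinct gcd (the longest suffix is the only one that matters for maximizing the product), which is a simplification of the min/max borders described above. The function name is made up for this sketch.

```python
from math import gcd


def max_gcd_times_length(a):
    """Max over non-empty subarrays of gcd(subarray) * len(subarray).

    Assumes a is a non-empty list of positive integers.
    """
    best = 0
    # For the current right end r: pairs (g, start), sorted by start, where
    # start is the leftmost index with gcd(a[start..r]) == g.
    suffixes = []
    for r, x in enumerate(a):
        merged = []
        for g, start in suffixes + [(x, r)]:
            g2 = gcd(g, x)
            if merged and merged[-1][0] == g2:
                continue  # same gcd already recorded with an earlier start
            merged.append((g2, start))
        suffixes = merged
        for g, start in suffixes:
            best = max(best, g * (r - start + 1))
    return best
```

Since each distinct gcd in the list divides the next one, the list stays short, which is what keeps the total work near-linear.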

s3ct4l-r3x

8 years ago, # ^ |

I was thinking of trying something like the solution to GSS3 (link to the problem). The idea is to save the answer for a suffix and a prefix subarray of a particular segment, and also the subarray with maximum (sum × length) for that segment. Then keep merging them to get the answer for the whole array. I implemented a solution using segment trees (Link).

You can check it on some test cases. I don't know if it's completely correct. Just thought I'd share the idea :P . Maybe it can be implemented without segment trees because we don't need to save the answer for every node.

rnsiehemt

8 years ago, # ^ |

But what do you store for the prefix and suffix? The entire difficulty in the sum * length problem is that there isn't 1 optimal suffix/prefix.

s3ct4l-r3x

8 years ago, # ^ |

If the values compared in the merge() function are equal then any of them could produce a larger answer later on while building a tree. I didn't think of it before. It might be the problem in this approach.

rnsiehemt

8 years ago, # ^ |

But saving the suffix and prefix with maximum sum * length is not necessarily correct.

Consider the case where the left 'subtree' is -1 -1 -1 -1 and the right 'subtree' is 100 100 100 100.

The suffix of the left subtree with greatest sum * length is the empty suffix with sum * length = 0. But the optimal solution is to take the entire string since the sum in the right subtree outweighs the negative sum in the left subtree. (And you can't fix this by adding it as a special case, the optimal suffix in the left subtree could be any length.)

Shapo

8 years ago, # ^ |

For array

-2, 3, 7, -8, -3, 1, 1, 7, -2, -3

it produces answer 36, whereas correct answer is 56.

s3ct4l-r3x

8 years ago, # ^ |

So I think we got the problem here.. thanks gendelpiekel and Shapo for taking the time to read the code. This might be the problem in this approach.

rnsiehemt

8 years ago, # ^ |

+35

I have a solution to propose but I'm sure I made mistakes everywhere. Probably I even made a mistake immediately and this makes no sense.

Firstly, we can restate the problem. Consider the graph (as in x-axis y-axis graph, not nodes + edges graph) of the cumulative sums. We wish to find the greatest (positive) rectangular area between two points.

Secondly, consider two points which we are considering to use as the 'bottom left' corner of the rectangle, call them P = (x0, y0), Q = (x1, y1). Without loss of generality (wlog) let us say x1 > x0.

Observe firstly that if x1 >= x0 and y1 >= y0, we will always choose P over Q. So we can say y1 < y0. I.e. we only need to consider a set of points that become lower as you move right.
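The pruning in the last paragraph can be sketched as below: keep as 'bottom left' candidates only the prefix sums that are strictly lower than everything before them, and, by the mirrored argument, keep as 'top right' candidates only the prefix sums strictly higher than everything after them. This is a sketch with made-up names. The dominance argument needs a positive rectangle, so the all-non-positive case is handled separately, and the pair loop is still quadratic in the worst case, so this only shrinks the search space.

```python
def max_sum_times_length_pruned(a):
    """Max over non-empty subarrays of sum * length, using dominance pruning.

    Assumes a is a non-empty list of integers.
    """
    if max(a) <= 0:
        # No positive rectangle exists; a single largest element is optimal.
        return max(a)

    n = len(a)
    prefix = [0]
    for x in a:
        prefix.append(prefix[-1] + x)

    # Bottom-left candidates: strict record minima of the prefix sums, left to right.
    lefts, lowest = [], None
    for i in range(n + 1):
        if lowest is None or prefix[i] < lowest:
            lefts.append(i)
            lowest = prefix[i]

    # Top-right candidates: strict record maxima of the prefix sums, right to left.
    rights, highest = [], None
    for i in range(n, -1, -1):
        if highest is None or prefix[i] > highest:
            rights.append(i)
            highest = prefix[i]

    best = 0
    for l in lefts:
        for r in rights:
            if r > l:
                best = max(best, (prefix[r] - prefix[l]) * (r - l))
    return best
```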

Now, which points T = (x, y) are there such that we wish to choose P over Q? I claim they are the points above the line going through (x0, y1) and (x1, y0). Diagram (we should choose P if T is above the red line, otherwise we should choose Q):

In fact since we only want positive rectangles, the red line should not be a line but a RAY
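A short check of the claim about the line, as a sketch in the notation above, with T = (x, y) to the right of and above both candidates: choosing P beats choosing Q exactly when

$$
(x - x_0)(y - y_0) > (x - x_1)(y - y_1)
\iff (x_1 - x_0)\,y + (y_1 - y_0)\,x > x_1 y_1 - x_0 y_0 .
$$

Both $(x_0, y_1)$ and $(x_1, y_0)$ satisfy the right-hand relation with equality, and the coefficient of $y$ is positive because $x_1 > x_0$. So P wins strictly above the line through those two points and Q wins strictly below it.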

Now things get very hand-wavy and probably wrong or suboptimal.

Let's call the red line the 'optimality line' of P and Q.

Now we wish to solve the problem: given some potential 'bottom-left' points (call them 'candidates'), which is the best to use for some given query points? We can solve the problem for 2 candidates, but how do we solve it for more than 2?

Consider 3 candidates. Call them P, Q, R in order of increasing x (remember that this means they are also in order of decreasing y). The optimality lines between P and Q, P and R, and Q and R will all intersect at the same point. Example diagram (P = (0,4), Q = (2,3), R = (4,0)):

something something RAY something

Purple, green and orange areas are where R, Q and P are optimal respectively.

I believe (*wave hands*) that it should be a similar case for any 3 candidates (*wave hands more*). So for any 3 candidates, we should be able to calculate the x coordinate at which the middle candidate becomes redundant.
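One way to see why three such lines meet, sketched in the same notation: write $f_C(x, y) = (x - x_C)(y - y_C)$ for the area obtained from candidate $C$. The optimality line of two candidates is the set where their $f$ values are equal, and it really is a line because the $xy$ terms cancel. At a point where the lines of (P, Q) and (Q, R) cross,

$$
f_P = f_Q \quad\text{and}\quad f_Q = f_R \;\Longrightarrow\; f_P = f_R ,
$$

so the line of (P, R) passes through the same point. This assumes the first two lines do cross; if they are parallel there is no common point.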

So now, we can sweep from left to right to answer queries. As we move, we calculate when each point will become redundant and remove them as necessary. (*wave hands vigorously*)

To answer the queries themselves, we can binary search over the currently valid candidates (in order of the y values where they are optimal).

blah blah something something it seems O(n lg n) maybe. (edit actually maybe n lg^2 n, who knows) (more edit: actually probably still n lg n)

Okay I'm going to sleep now. I missed quite a few details and this is pretty poorly written because I had to change my solution as I realised some parts were wrong. Tomorrow I will probably wake up, read this and realise I'm totally insane >_>

edit: 1) actually maybe you need a set to store candidates since you (might?) have to remove from the middle (?) so it would be O(n lg^2 n), 2) there are lots of implicit details about keeping the candidates in order of decreasing y value, blah blah details something something

more edit: actually binary search is separate from set operations so probably still n lg n (okay I'm actually going to sleep now)

Shapo

8 years ago, # ^ |

you slightly missed areas of optimality: green area will be not in bottom, but in opposite direction

* Assume we have only bottom-left candidates (we can calculate in O(N) using stack).
* Assume that for every 3 consecutive candidates optimality lines aren't parallel (if not, we can completely remove middle candidate, because it won't be optimal anywhere). Then, if we can prove that for every 3 candidates optimality lines intersect in one point (Sympy suspects it is true), we can prove by induction that for every set of candidates their pairwise optimality lines intersect in one point. Therefore, the whole plane divided into sectors, and every sector corresponds to one candidate, which is optimal here.
* Moreover, we can compute this division in O(K), where K is the number of candidates, because every sector is defined by two optimality lines between one candidate and its neighbours (for the first and last candidates we pick the last and first one, respectively).
* After that, each query is processed in $\text{[math]}$ : you simply find the sector containing the query point using binary search.
~~So, we have solution in $\text{[math]}$ .~~

As for other part — nevermind, it's bullshit about induction

rnsiehemt

8 years ago, # ^ |

+10

Hmm... I'm pretty sure the green area is correct in my diagram. For example consider T = (4, 4), it is clear we should choose Q = (2, 3) as our bottom left point.

But you're right, there are cases where the 'green' area is above instead of below. Specifically: the green area is below when Q is above (the line) PR, the green area is above when Q is below PR. Also, as you pointed out, there are cases where the lines actually don't intersect, which is when Q is on PR.

So in fact there are cases where we should add candidates as we sweep from left to right, not just remove (and cases when we don't add or remove the candidate at all).

Also, I think that we might actually not be able to consider only the two adjacent points when calculating when to add or remove a point. Probably we can't without more observations and maybe not at all.

So actually, perhaps we can't do better than sqrt decomposition and the final solution is quite similar (or perhaps identical) to the half plane solution you described.

Shapo

8 years ago, # ^ |

Yes, you're absolutely right about green area. I found mistake in my thoughts.

mmaxio

8 years ago, # ^ |

+16

Isn't this idea like the convex hull trick in 3D? For each j we want to find i < j maximizing (j - i)(p_j - p_i) = jp_j - ip_j - p_i·j + ip_i; jp_j is constant, so we have a collection of planes z = -p_i·x - i·y + ip_i and need to find the topmost of them at (x, y) = (j, p_j) each time. It looks hard in the general case.

Shapo

8 years ago, # ^ |

+18

Disclaimer: It turned out slightly complicated :) I don't think it's the best solution ever, and I may be missing something. So, comments and questions are welcome!

Assume we have an array of n integers indexed 1 through n. Let A_i denote the i-th number in this array.
First of all, compute the array of prefix sums S, so that $S_i = \sum_{k=1}^{i} A_k$ (with $S_0 = 0$). Now we try to solve the following problem, which is equivalent to the original one:

$\max_{0 \le i < j \le n} (S_j - S_i)(j - i)$

Now let's hypothesize that we have a data structure that operates on points in 2-dimensional space and allows us to:

* Preprocess K points (x_i, y_i) in $\text{[math]}$ .
* For each of Q arbitrary points (a_j, b_j), compute $\max_i (a_j - x_i)(b_j - y_i)$ in amortized $\text{[math]}$ .

Given such a data structure, one can use SQRT-decomposition:

* Divide the array of partial sums S into blocks of $\text{[math]}$ consecutive elements each, treating the partial sums as points (S_i, i) in 2-dimensional space;
* For each block, build the data structure mentioned above, overall $\text{[math]}$ ;
* Calculate the answer for partial sums from the same block naively, overall $\text{[math]}$ ;
* To account for partial sums from different blocks, for each block treat the partial sums from previous blocks as queries to the data structure, overall $\text{[math]}$ .

In total we have $\text{[math]}$ complexity.

As for the data structure, I'd like to mention only the key ideas:

* For a fixed set of K points (x_i, y_i) we can divide the plane into at most K convex polygons (possibly unbounded); each part corresponds to the points that share a fixed argmax. This can be done in $\text{[math]}$ using an algorithm for intersecting halfplanes.
* To process one query we only need to figure out which polygon the query lies inside. This can be done using ideas from Wikipedia.

mnbvmar

8 years ago, # |

+26

Okay, I think I have an $\text{[math]}$ solution.

Let's binary search on the answer. Now we want to know whether there are indices L, R such that (a_{L+1} + ... + a_R)(R - L) ≥ M for a fixed M. Set S_k = a₁ + ... + a_k; then the inequality above becomes (S_R - S_L)(R - L) ≥ M. Solving for S_R, we get:

$S_R \ge S_L + \frac{M}{R - L}$

Let's define a set of functions f₁, ..., f_n so that $f_L(x) = S_L + \frac{M}{x - L}$ for x > L. Now we want to see if there are any L, R for which f_L(R) ≤ S_R, that is, when we set f(x) = min f_i(x), whether there is any R such that f(R) ≤ S_R.

How to compute f? Notice that each f_L is a hyperbola. We can easily see that:

* No two functions intersect in more than one point in their domain (solve the equation f_i = f_j, then see).
* We want to build a lower envelope of such hyperbolas (hyperbola parts). Funnily enough, all of them are exactly the same, though translated.
* Say $\text{[math]}$ , $\text{[math]}$ . If a < c, then f_i is at first larger than f_j, then intersects f_j and is smaller forever; thus, if it is a global minimum, it must come after f_j. Same if c > a. However, if a = c, compare b and d. If b < d, the first hyperbola is always before the second. (Probably some picture would be handy here...)
* We can make a total ordering based on the preceding rules. Then fire up a linear-time lower-envelope "convex-hull" algorithm in the same way we did for lines.

(It is a bit messy, but I gave you an idea). The running time is now $\text{[math]}$ , however we can sort everything only in the beginning of the runtime. This brings down the running time to what I stated.
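The outer binary search can be sketched as below. To keep the sketch short and checkable, the feasibility test is a plain O(n²) scan over pairs (L, R) rather than the lower envelope of hyperbolas, so it shows only the structure of the search, not the claimed running time. The function names and the search bound n² · max|a_i| are assumptions made for this sketch.

```python
def feasible(prefix, m):
    """True if some L < R has (S_R - S_L) * (R - L) >= m.

    Because R - L > 0, this is the same condition as f_L(R) <= S_R.
    Naive O(n^2) stand-in for the lower-envelope check.
    """
    n = len(prefix) - 1
    return any(
        (prefix[r] - prefix[l]) * (r - l) >= m
        for l in range(n)
        for r in range(l + 1, n + 1)
    )


def max_sum_times_length_by_search(a):
    """Largest M for which feasible(M) holds; assumes a is a non-empty list of integers."""
    n = len(a)
    prefix = [0]
    for x in a:
        prefix.append(prefix[-1] + x)

    # Every score lies in [-bound, bound], so feasible(-bound) is always true.
    bound = n * n * max(abs(x) for x in a)
    lo, hi = -bound, bound
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so that lo = mid always makes progress
        if feasible(prefix, mid):
            lo = mid
        else:
            hi = mid - 1
    return lo
```

Feasibility is monotone in M (true up to the optimum, false above it), which is what makes the binary search valid.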

mmaxio

8 years ago, # ^ |

nvm

mnbvmar

8 years ago, # ^ |

There were some inequalities reversed, it should be okay now.
