zetaminusone's blog

By zetaminusone, history, 5 weeks ago, In English

Let there be a sequence of integers $$$x_1, x_2, ... x_n$$$. Then, the harmonic mean is defined as

$$$\frac{n}{\sum_{i=1}^n \frac{1}{x_i}}$$$

Given an array of positive and negative integers, what is the most efficient way to find the subarray with the maximum harmonic mean? Thanks!

 
 
 
 
  • Vote: I like it
  • +13
  • Vote: I do not like it

»
5 weeks ago, # |
  Vote: I like it +1 Vote: I do not like it

What is the source of this problem? Are there any other constraints?

  • »
    »
    5 weeks ago, # ^ |
      Vote: I like it +3 Vote: I do not like it

    This question came as a random thought as I was trying to create more questions for my school's judge. It is purely a matter of interest ^^

    • »
      »
      »
      5 weeks ago, # ^ |
        Vote: I like it +4 Vote: I do not like it

      Wouldn't it be just the maximum number in array?

    • »
      »
      »
      5 weeks ago, # ^ |
        Vote: I like it +9 Vote: I do not like it

      OK. That explains my weird feeling reading the problem. With no further constraints, there can be division by zero in calculating the harmonic mean of some subarrays due to negative inputs, for example with 1 -1. Even if input guarantees that this cannot happen or it is clarified that this doesn't count, the correct output can be large when the sum in the denominator only approximately cancels out. The correct output for 8 -9 -9 10 is 1440. If there's a particularly efficient way to search for such near-cancellations, it's at least not obvious to me.

      If instead all input is positive, then (unless you want the problem to be 'print the maximum of the input' in disguise) some constraint on the size of the subarrays to be considered seems appropriate.

      • »
        »
        »
        »
        5 weeks ago, # ^ |
          Vote: I like it +3 Vote: I do not like it

        Thank you! I honestly had not realised at first and you pointed out many insightful observations. That resolves the problem :)

»
5 weeks ago, # |
Rev. 2   Vote: I like it +9 Vote: I do not like it

There is a stupid systematic way in $$$O(N \log^3 N)$$$. Assume without loss of generality that there is at least one positive element.

Let $$$a_i = \frac 1{x_i}$$$ and $$$p_i$$$ be $$$\sum_{j = 1}^i a_j$$$ (prefix sum array of $$$a_i$$$).

We wish to pick two endpoints $$$l$$$ and $$$r$$$ such that $$$\frac{r - l + 1}{\sum_{i=l}^r \frac 1{x_i}}$$$ is maximized.

$$$ \begin{align} &\phantom{=.}\frac{r - l + 1}{\sum_{i=l}^r \frac 1{x_i}} \\ &= \frac{r - (l - 1)}{\sum_{i=(l - 1) + 1}^r \frac 1{x_i}} \\ \end{align} $$$

Let $$$l_0 = l - 1$$$,

$$$ \begin{align} &= \frac{r - l_0}{\sum_{i=l_0 + 1}^r \frac 1{x_i}} \\ &= \frac{r - l_0}{p_r - p_{l_0}} \\ \end{align} $$$

We wish to maximize this value. We note for later that $$$p_r - p_{l_0}$$$ must be positive. We binary search on $$$k$$$ to find how many $$$0 \leq l_0 \leq r \leq N$$$ satisfy

$$$ \begin{align} &= \frac{r - l_0}{p_r - p_{l_0}} \geq k \\ &= r - l_0 \geq p_rk - p_{l_0}k \\ &= r - p_rk \geq l - p_{l_0}k \\ \end{align} $$$

Let $$$b_i = i - p_ik$$$, then we want to check how many pairs $$$0 \leq l_0 \leq r \leq N$$$ satisfy $$$b_r \geq b_l$$$ AND $$$p_r > p_{l_0}$$$. This is a 3D partial order problem which can be solved using CDQ in $$$O(N \log^2 N)$$$. (Tutorial)

For each iteration of the binary search, if no pairs $$$l_0, r$$$ are found, then $$$k$$$ is too high, otherwise, it's at least that value of $$$k$$$.

Floating point accuracy is likely a big issue. The case where all elements are negative can be handled in a similar way.

  • »
    »
    5 weeks ago, # ^ |
    Rev. 2   Vote: I like it +1 Vote: I do not like it

    Binary searching over $$$k$$$ probably requires some care, especially when you seem to claim $$$O(\log n)$$$ iterations as opposed to something like $$$O(\log \frac{1}{\varepsilon})$$$ iterations. The former is certainly achievable, for example by modifying your CDQ queries to provide a random sample and using that $$$(l_0, r)$$$ pair's highest possible value of $$$k$$$ as your next pivot, but isn't entirely obvious.

    Floating point accuracy is a very big issue. Unless the bounds are very small, I would prefer to do all arithmetic for this problem in exact arithmetic rather than risk running headfirst toward the limits of what floating-point types the compiler might provide. Obviously this makes arithmetic operations rather expensive, but this seems to be a necessary price to pay. I was already able to craft a testcase that will run afoul of the precision available in a double with $$$n = 8$$$ and $$$| a_i | \leq 1000$$$:

    When a = [1, 1, -979, -846, -729, 761, 884, 885], $$$p_8 = 2 + \frac{1}{6656 823 096 297 660} > p_2 = 2$$$, but when computed in doubles in the obvious way on my machine, $$$p_8 = p_2$$$.

    Come to think of it, for a fixed $$$l_0$$$, the largest harmonic mean is achieved by minimizing the slope $$$\frac{p_r - p_{l_0}}{r - l_0}$$$ among points $$$r > l_0$$$ with $$$p_r > p_{l_0}$$$, which is a convex-hull query, and can be answered in $$$O((\log n)^2)$$$ arithmetic operations per query by maintaining a segment tree of dynamic (lower) convex hulls and adding points $$$(x, p_x)$$$ in decreasing order of $$$p_x$$$.