Loserinlife's blog

By Loserinlife, history, 8 months ago, In English

Define the distance between two points i, j max(|xi, xj|, |yi — yj|). Given n points (xi, yi). Find a point (X, Y) so that the sum of distances from this point to n points is minimized.

N <= 1e5

xi, yi <= 1e9

Thanks in advance.

  • Vote: I like it
  • +8
  • Vote: I do not like it

»
8 months ago, # |
  Vote: I like it 0 Vote: I do not like it

Auto comment: topic has been updated by Loserinlife (previous revision, new revision, compare).

»
8 months ago, # |
Rev. 3   Vote: I like it +3 Vote: I do not like it

UPD: I understood the problem incorrectly. Please check the later comments in this thread.

For each point $$$i$$$, let $$$a_i = x_i + y_i$$$ and $$$b_i = x_i - y_i$$$.

Let $$$d(i, j) = max(|x_i - x_j|, |y_i - y_j|)$$$.

Without loss of generality, assume $$$x_i \ge x_j$$$. Now $$$|x_i - x_j| = x_i - x_j$$$. And

$$$d(i, j) = max(|x_i - x_j|, |y_i - y_j|)$$$

$$$d(i, j) = max(x_i - x_j, |y_i - y_j|)$$$.

$$$\displaystyle d(i, j) = \frac{x_i - x_j + |y_i - y_j| + |x_i - x_j - |y_i - y_j||}{2}$$$ (According to this)

$$$2\cdot d(i, j) = x_i - x_j + |y_i - y_j| + |x_i - x_j - |y_i - y_j||$$$

Case 1: $$$y_i \ge y_j$$$

$$$2\cdot d(i, j) = x_i - x_j + y_i - y_j + |x_i - x_j - y_i + y_j|$$$

$$$2\cdot d(i, j) = x_i + y_i - x_j - y_j + |x_i - y_i - x_j + y_j|$$$

$$$2\cdot d(i, j) = a_i - a_j + |b_i - b_j|$$$

Because $$$x_i \ge x_j$$$ and $$$y_i \ge y_j$$$ are assumed,

$$$a_i \ge a_j$$$ must be true.

$$$2\cdot d(i, j) = |a_i - a_j| + |b_i - b_j|$$$

$$$\displaystyle d(i, j) = \frac{|a_i - a_j| + |b_i - b_j|}{2}$$$

Case 2: $$$y_i < y_j$$$

$$$2\cdot d(i, j) = x_i - x_j + (- y_i + y_j) + |x_i - x_j - (- y_i + y_j)|$$$

$$$2\cdot d(i, j) = x_i - x_j - y_i + y_j + |x_i - x_j + y_i - y_j|$$$

$$$2\cdot d(i, j) = x_i - y_i - x_j + y_j + |x_i + y_i - x_j - y_j|$$$

$$$2\cdot d(i, j) = b_i - b_j + |a_i - a_j|$$$

Because $$$x_i \ge x_j$$$ and $$$y_i < y_j \Leftrightarrow -y_i > y_j$$$ are assumed,

$$$x_i - y_i > x_j - y_j$$$

$$$b_i > b_j$$$ must be true.

$$$2\cdot d(i, j) = |b_i - b_j| + |a_i - a_j|$$$

$$$\displaystyle d(i, j) = \frac{|b_i - b_j| + |a_i - a_j|}{2}$$$

We can see that in both cases, $$$\displaystyle d(i, j) = \frac{|a_i - a_j| + |b_i - b_j|}{2}$$$.

This is equal to the manhattan distance of points $$$(a_i, b_i)$$$ and $$$(a_j, b_j)$$$ divided by two. For implementation, we can actually forget about the "divided by two" part, since that just scales everything by a constant and doesn't affect which distances are larger than others.

Now the problem is to find the minimum sum of manhattan distances for each point. Manhattan distances for each coordinate are independent of each other and thus they can be calculated independently.

Let's consider some fixed point $$$i$$$ and the sum of distance differences of the $$$a$$$-coordinates:

$$$\displaystyle\sum_{j = 1}^n |a_i - a_j|$$$

$$$ = \displaystyle\sum_{a_j \le a_j} (a_i - a_j) + \displaystyle\sum_{a_j > a_j} (a_j - a_i)$$$

$$$ = \displaystyle\sum_{a_j \le a_j} a_i - \displaystyle\sum_{a_j \le a_j} a_j + \displaystyle\sum_{a_j > a_j} a_j - \displaystyle\sum_{a_j > a_j} a_i$$$

$$$ = [\text{count of } j \text{: } a_j \le a_i] \cdot a_i - \displaystyle\sum_{a_j \le a_j} a_j + \displaystyle\sum_{a_j > a_j} a_j - [\text{count of } j \text{: } a_j > a_i] \cdot a_i$$$

$$$ = \displaystyle\sum_{a_j > a_j} a_j - \displaystyle\sum_{a_j \le a_j} a_j + [\text{count of } j \text{: } a_j \le a_i] \cdot a_i - [\text{count of } j \text{: } a_j > a_i] \cdot a_i$$$

The first and second terms are just prefix/suffix sums on the sorted array of all $$$a$$$-coordinates which can either be precalculated or calculated during iteration of the array. The last two terms can be calculated in $$$O(1)$$$ based the position of point $$$i$$$ in the sorted array of all $$$a$$$-coordinates. Thus, these values can be calculated for all points $$$i$$$ in $$$O(n \log n)$$$ (due to sorting the points). Now, we can do the same for all $$$b$$$-coordinates and calculate the sum of differences of those. The total manhattan distance, which is equal to twice the original distance, is now the sum of these two sums of differences. Now we can find the minimum total distance in $$$O(n)$$$.

  • »
    »
    8 months ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Woah, the transformation is really cool! Tysm.

    But for the original problem, point (X, Y) can be any point. So the point would be the median of array A and B respectively, right?

    • »
      »
      »
      8 months ago, # ^ |
      Rev. 2   Vote: I like it +3 Vote: I do not like it

      If we were only considering array $$$a$$$, the optimal one would be the median. Same for array $$$b$$$. But the distance function is the sum of the differences to all values in both arrays. The optimal point is not nescessarily the median of array $$$a$$$ or $$$b$$$, it's the one with the smallest sum of the sums of differences in array $$$a$$$ and $$$b$$$. I can give you an example in a moment.

      UPD: Example

      Consider three points given in input: $$$(x_1, y_1) = (1, 4)$$$, $$$(x_2, y_2) = (6, 8)$$$ and $$$(x_3, y_3) = (6, 0)$$$.

      These points get transformed into $$$(a_1, b_1) = (5, -3)$$$, $$$(a_2, b_2) = (14, -2)$$$ and $$$(a_3, b_3) = (6, 6)$$$.

      Sorting these points according to their $$$a$$$-coordinate will lead to the following:

      point index:         1   3   2
      a-coordinate:        5   6  14
      sum of a-distances: 10   9  17
      

      Sorting these points according to their $$$b$$$-coordinate will lead to the following:

      point index:         1   2   3
      b-coordinate:       -3  -2   6
      sum of b-distances: 10   9  17
      

      Sum of distances for both coordinates:

      point index:           1   2   3
      sum of a-distances:   10  17   9
      sum of b-distances:   10   9  17
      sum of all distances: 20  26  26
      

      We can see that while point $$$2$$$ was the median when sorting $$$a$$$-distances and point $$$3$$$ was the median when sorting $$$b$$$-distances, point $$$1$$$ still has the lowest total sum of distances. This shows that we need to calculate the sum of $$$a$$$-distances and the sum of $$$b$$$-distances for every point and take the one with the smallest sum.

      And if you want to confirm that this method works, we can check the distances using the metric mentioned in the problem statement.

      Distance between $$$1$$$ and $$$2$$$: $$$\max(|1 - 6|, |4 - 8|) = \max(5, 4) = 5$$$

      Distance between $$$1$$$ and $$$3$$$: $$$\max(|1 - 6|, |4 - 0|) = \max(5, 4) = 5$$$

      Distance between $$$2$$$ and $$$3$$$: $$$\max(|6 - 6|, |8 - 0|) = \max(0, 8) = 8$$$

      Thus, the distance from $$$1$$$ to everywhere is $$$5 + 5 = 10$$$

      The distance from $$$2$$$ to everywhere is $$$5 + 8 = 13$$$

      The distance from $$$3$$$ to everywhere is $$$5 + 8 = 13$$$

      All of these values are exactly $$$\frac{1}{2}$$$ of the manhattan distances on the transformed points as expected.

      • »
        »
        »
        »
        8 months ago, # ^ |
          Vote: I like it 0 Vote: I do not like it

        I think for your example, Point (6, -2) should be optimal.

        Sorry for not making it clear but the point chosen doesn't have to be in the N initial point.

        The original problem is here: https://oj.vnoi.info/problem/icpc21_beta_k

        • »
          »
          »
          »
          »
          8 months ago, # ^ |
          Rev. 2   Vote: I like it +3 Vote: I do not like it

          Oh, I see. And I think what you said previously — "So the point would be the median of array A and B respectively" — should be correct.

          • »
            »
            »
            »
            »
            »
            8 months ago, # ^ |
              Vote: I like it 0 Vote: I do not like it

            Well a weird thing is that is isn't :((.

            Can you take a quick look at this code.

            https://ideone.com/e0m9Bh

            I may have misunderstood the problem so I will try your approach when I get home.

            • »
              »
              »
              »
              »
              »
              »
              8 months ago, # ^ |
              Rev. 3   Vote: I like it 0 Vote: I do not like it

              Ah, I think I found something. It's not a bug in your code, it's a mistake in the method.

              Since the people always move from integer coodrinates to integer coordinates, it is implied that the vaccination station must have integer coordinates. Consider the case

              4
              0 0
              0 1
              1 0
              1 1
              

              Your code outputs $$$2$$$, but that is only possible with the point $$$(0.5, 0.5)$$$. This is not allowed and thus, the answer would be $$$3$$$.

              UPD: I guess the solution would be something like this (but I'm not at all sure if this is correct, this is just by intuition):

              Start by choosing the median point as the answer. Let the point be called $$$(x, y)$$$ (note that we are in the transformed coordinate system). This point can be directly translated back with integer coordinates iff $$$x \equiv y \pmod 2$$$. If this is true, we are done.

              If $$$x \neq y \pmod 2$$$, we can try points $$$(x+1, y)$$$, $$$(x-1, y)$$$, $$$(x, y+1)$$$ and $$$(x, y-1)$$$. The best one of these should(?) be the answer.

              • »
                »
                »
                »
                »
                »
                »
                »
                8 months ago, # ^ |
                Rev. 2   Vote: I like it 0 Vote: I do not like it

                https://ideone.com/8h3IJ1

                I just try neighboring points to the median and that AC! :) Thanks so much for the help.

                Oh wait, I just realize you already mentioned it. Oops :)

                • »
                  »
                  »
                  »
                  »
                  »
                  »
                  »
                  »
                  8 months ago, # ^ |
                    Vote: I like it 0 Vote: I do not like it

                  Nice. I just updated my previous comment since I got the same idea but it seems like you were faster.

              • »
                »
                »
                »
                »
                »
                »
                »
                8 months ago, # ^ |
                  Vote: I like it 0 Vote: I do not like it
                i think we can transform it to 
                  (x+y , x-y) by rotating 45 deg 
                so that dis is just 1/2(abs(x-x1) + abs(y-y1))
                
                
          • »
            »
            »
            »
            »
            »
            8 months ago, # ^ |
              Vote: I like it 0 Vote: I do not like it

            I get really confused about when to take the median and when to take the average when we have to minimize distance. Can you please clarify?

            • »
              »
              »
              »
              »
              »
              »
              8 months ago, # ^ |
              Rev. 2   Vote: I like it 0 Vote: I do not like it

              Suppose you have a set of points on a 1-dimensional line. They don't have to be at integer coordinates.

              If you need to choose a point that minimizes the maximum distance to any other point, the answer is the average of the smallest and largest points. This should be intuitively true and not too hard to prove.

              If you need to choose a point that minimizes the sum of distances to all other point, the answer is the median of all points. This is not nescessarily intuitive and it is harder to prove, but not too difficult.

              Note that both of these are always true if the distance function is the standard, i.e. $$$d(i, j) = |i - j|$$$. If the distance function is something else, these do not nescessarily apply.