I recently came across an interesting problem and I'm unable to figure out the exact solution. The problem goes like this:
There are n cities on a circular number line and m possible locations for parks on the same line. We have to choose a subset of parks that the following conditions are met.
- Every city has a park within 100m distance
- The number of cities with more than one park within 100m distance is minimized
By circular number line, I mean that the end of the number line connects back to the beginning.
What would be an optimized algorithm. My first instinct was to represent the parks as intervals covering the individual cities. Effectively, we have to find a subset of interval such that every point is covered but number of points with overlapping intervals is minimized.
I assumed a Dynamic Programming solution would be necessary, but I'm unable to figure out the exact recursive equation. Any help would be appreciated!.
Consider this more of an job interview problem and not a coding contest question.
Auto comment: topic has been updated by rajb_957 (previous revision, new revision, compare).
I tried to apply this question on a simple number line, not a circular one. Consider the following approach, where I extend upon your idea of taking the cities as uniformly distributed points and parks as intervals covering these points.
Let dp[i] denote the minimum number of overlaps taking a subset of intervals, such that all points <= i are covered and no points beyond i are covered. If such a case does not exist, dp[i] is set to infinity
In the beginning initialize all dp[i] to infinity except for dp[0] which is set to 0
Sort the intervals in order of starting or ending, it doesn't really matter here as we can show any interval with greater start time will have greater or equal end time.
For each interval with start a and end b: We can set dp[b] = min(dp[a], 1+dp[a+1], 2+dp[a+2], 3+dp[a+3]... dp[b])
Time complexity is O(nm)
Can someone please suggest a method to apply this concept for a circular number line?
Let's first consider the non-circular version of the problem:
You have $$$m$$$ ranges of length $$$l=2 \cdot 100$$$ and $$$n$$$ points. You need to cover all points using some of the ranges, such that you minimize the number of points covered by more than one range.
Notice that in the optimal solution, there should be no range $$$i$$$ such that it is completely included in the other ranges (you can remove it and the solution is never worse). Therefore, it turns out that in the optimal solution no point is covered by more than two ranges.
Then, it seems that we can relax the problem, where you have $$$m$$$ ranges and $$$n$$$ points, and you need to cover all points with some of the ranges, but for range $$$i$$$ you pay a cost equal to the number of points it covers (you have to minimize the sum of costs). Notice that this problem overestimates the "real" cost, however when considering the optimal solution these costs coincide (up to a difference of $$$n$$$).
Then, for the non-circular version you have the obvious $$$O(m)$$$ dp solution: let $$$dp(i)$$$ be minimum cost for a solution that covers the prefix of points and the last range is $$$i$$$. Then, this $$$dp$$$ requires minimum over a sliding window, for which there are techniques to do in linear time (see monotonic queue).
The above solution does not work for the original circular version, because when choosing the first range, some suffix of points will also be solved. However, we can do the following "trick":
Let's consider the $$$i$$$-th range. If we include it in the solution, then the other ranges must be disjoint from $$$i$$$ (otherwise, $$$i$$$ would be irrelevant, as discussed before). Then, what one can do is:
If all park locations and all cities have integer locations, you can reduce to at most $$$l=200$$$ such dp computations, making it $$$O((m + n) \cdot l)$$$.
Thank you for the detailed answer. I tried to understand your logic, please let me know if I am correct.
For the non circular problem, I took dp[i] as the minimum cost for the first i points and not ranges. And calculated this dp by iterating through ranges from left to right. I got an O(lm) solution but I am not entirely sure how to use a montonic queue to reduce it to O(m).
Secondly, you have said that if the ith range is included in the solution, it must be disjoint from other ranges. But it can overlap right? The only condition is that a point must not have more than two overlapping intervals in the optimal soln. Can you please explain this?
Ypur dp definition also works. To compute $$$dp[i]$$$ you need to interate over some range $$$j$$$ such that $$$y[j] - 100 \leq x[i] \leq y[j] + 100$$$ and choose the minimum $$$dp[last[j]] + cost[j]$$$ (where $$$last[j]$$$ is the point just before $$$l[j] - 100$$$). Consider $$$v[j] = dp[last[j]] + cost[j]$$$. Then, the problem asks you to compute the minimum in $$$v$$$ over ranges that go ‘towards the right’ in a sliding window fashion, therefore if you can implement a queue that supports querying the minimum, you can solve the problem faster than just iterating.
I'm a bit confused about the circular "trick" used here. From what I understand, you choose an arbitrary interval i, and remove it, causing a linear range from the intervals i+1, i+2... m 1 2 3... i-1. Now you are excluding the points covered by range i and solving the remaining points using the same method as above. But I don't understand how this will lead to the solution. What is the significance of the last dp value? And how did you arrive at the final time complexity? Can you please elaborate these steps?
The last dp value would mean the dp value that takes the last range (by definition). The last range here is $$$i$$$. This allows you to safely discard all points that range $$$i$$$ covers (not their impact to the range costs, though). Now, all remaining points can’t be covered via a possible wrap around, so you can safely consider the problem non-circular.
To arrive at the final complexity, take an arbitrary point (say, the first one). There are at most $$$l$$$ ranges that cover it. Solve the above instance for each of those ranges (surely, one of them must be in the solution, and thus you treat all cases).
Okay, I understood now, thanks for the explanation! (Great solution by the way.)
Also i tried to code your solution for the non circular version of the problem. It works can you suggest me a way by which i can backtrack the solution. Currently i am getting the least cost that will be taken.