By retrograd, history, 15 months ago, , I wanted to showcase a simpler algorithm for the closest pair of points in 2D problem, and maybe to discuss its performance / countercases.

The problem is:

Given N distinct points in Euclidean 2D space, compute the minimum (squared) distance between any two distinct points.

The usual approach here is a divide-and-conquer algorithm, which can be found virtually anywhere, including on Wikipedia. The complexity of this algorithm is O(nlog(n)), but it is rather tricky to achieve this complexity.

The alternative approach (based on the same algorithm), is to do sweep-line. We sort the points based on the x-coordinate and we keep a set of the points in the region x - d, x, sorted by y coordinate. Here d is the smallest distance so far (we do that with the two-pointers technique). Now, for each new point x, y, we query the whole range y - d, y + d in this set and possibly update our answer.

Due to the proof of the D&C algorithm, at each time the quieried range should be of size O(1) on average, so total complexity would be O(nlog(n)). Code is below:

long long ClosestPair(vector<pair<int, int>> pts) {
int n = pts.size();
sort(pts.begin(), pts.end());
set<pair<int, int>> s;

long long best_dist = 1e18;
int j = 0;
for (int i = 0; i < n; ++i) {
int d = ceil(sqrt(best_dist));
while (pts[i].first - pts[j].first >= best_dist) {
s.erase({pts[j].second, pts[j].first});
j += 1;
}

auto it1 = s.lower_bound({pts[i].second - d, pts[i].first});
auto it2 = s.upper_bound({pts[i].second + d, pts[i].first});

for (auto it = it1; it != it2; ++it) {
int dx = pts[i].first - it->second;
int dy = pts[i].second - it->first;
best_dist = min(best_dist, 1LL * dx * dx + 1LL * dy * dy);
}
s.insert({pts[i].second, pts[i].first});
}
return best_dist;
}


What do you think? Are there any special cases (e.g. many points at the same x coordinate) that would break the solution?  Comments (4)
 » same idea here
 » There is an explanation of the same method :) https://olympiad.cs.uct.ac.za/presentations/camp1_2009/linesweep.pdf
 » 5 months ago, # | ← Rev. 3 →   i think the worst case of this implementation is O(n2) for example the points : first point ==> (0, pow(2,n) — pow(2,1)) second point ==> (0, pow(2,n) — pow(2,2)) . . i-th point ==> (0, pow(2,n) — pow(2,i) ) . .Am i right ?
•  » » No. It should only test adjacent pairs in this case. Note that when he inserts the points in the set he swaps x/y.