retrograd's blog

By retrograd, history, 7 months ago, In English,

I wanted to showcase a simpler algorithm for the closest pair of points in 2D problem, and maybe to discuss its performance / countercases.

The problem is:

Given N distinct points in Euclidean 2D space, compute the minimum (squared) distance between any two distinct points.

The usual approach here is a divide-and-conquer algorithm, which can be found virtually anywhere, including on Wikipedia. The complexity of this algorithm is O(nlog(n)), but it is rather tricky to achieve this complexity.

The alternative approach (based on the same algorithm), is to do sweep-line. We sort the points based on the x-coordinate and we keep a set of the points in the region x - d, x, sorted by y coordinate. Here d is the smallest distance so far (we do that with the two-pointers technique). Now, for each new point x, y, we query the whole range y - d, y + d in this set and possibly update our answer.

Due to the proof of the D&C algorithm, at each time the quieried range should be of size O(1) on average, so total complexity would be O(nlog(n)). Code is below:

long long ClosestPair(vector<pair<int, int>> pts) {
    int n = pts.size();
    sort(pts.begin(), pts.end());
    set<pair<int, int>> s;

    long long best_dist = 1e18;
    int j = 0;
    for (int i = 0; i < n; ++i) {
        int d = ceil(sqrt(best_dist));
        while (pts[i].first - pts[j].first >= best_dist) {
            s.erase({pts[j].second, pts[j].first});
            j += 1;
        }

        auto it1 = s.lower_bound({pts[i].second - d, pts[i].first});
        auto it2 = s.upper_bound({pts[i].second + d, pts[i].first});
        
        for (auto it = it1; it != it2; ++it) {
            int dx = pts[i].first - it->second;
            int dy = pts[i].second - it->first;
            best_dist = min(best_dist, 1LL * dx * dx + 1LL * dy * dy);      
        } 
        s.insert({pts[i].second, pts[i].first}); 
    }
    return best_dist;
}

What do you think? Are there any special cases (e.g. many points at the same x coordinate) that would break the solution?

Read more »

 
 
 
 
  • Vote: I like it  
  • +43
  • Vote: I do not like it  

By retrograd, history, 14 months ago, In English,

This article will be presenting a rather classical problem that can be solved using deques, along with an extension that allows you to solve the problem in its more general multi-dimensional case. I have decided to write this article after this discussion on 2D range-minimum query.

The article will be mainly based on this following problem:

You are given an array of numbers A[] of size n and a number k ≤ n. Find the minimum value for each continuous subarray of size k.

We will be now focusing on the linear-time solution to this problem.

Solution:

Consider sweeping from left to right through the array. At every moment we keep a list of "candidates" for minimum values throughout the process. That means that at each moment, you have to add one element to the list and (potentially) remove one element from the list.

The key observation is that, during the sweep line process, we find two values A[i] and A[j] which have i < j and A[i] ≥ A[j], then we can safely discard A[i]. That is because, intuitively, A[j] will continue to "live" in our sweep line more than A[i], and we will never prefer A[i] instead of A[j].

We should now consider pruning all the "useless" values ("useless" as in the statement above). It is easy to see now that doing this will lead to a strictly increasing list of candidates (why?). In this case, the minimum will always be the first element (O(1) query).

In order to insert an element to the back of the pruned candidate list, we will do a stack-like approach of removing all elements that are greater than it, and to erase on element, we just pop the front of the list (if it is not already removed).

This is a well-known approach for finding minima over fixed-size continuous subarrays. I will now present an extensions that allows you to do the same trick in matrices and even multi-dimensional arrays.

The multi-dimensional extension

Problem (2D):

You are given an matrix of numbers A[][] of size n × m and two numbers k ≤ n, l ≤ m. Find the minimum value for each continuous submatrix of size k × l.

Solution:

Consider the matrix as a list of rows. For each row vector of A, use the 1D algorithm to compute the minimum value over all l-length subarrays, and store them in ColMin[][] (obviously, ColMin[][] is now a n × (m - l + 1)-sized matrix).

Now, consider the new matrix as a list of columns. For each column vector of ColMin, use the algorithm to compute the minimum value over all k-length subarrays, and store them in Ans[][] (of size (n - k + 1) × (m - l + 1)).

The Ans[][] is the solution to our problem.

The following picture shows the intutition behind how it works for computing Ans[1][1] for n = 5, m = 7, k = 3, l = 4

The pseudocode is as follows:

def solve_2d(M, k, l):
  column_minima = {} # empty list
  for each row in M.rows:
    # We suppose we have the algorithm that solves
    # the 1D problem
    min_row = solve_1d(row, l)
    column_minima.append_row(min_row)
  
  ans = {}
  for each col in column_minima.cols:
    min_col = solve_1d(col, k)
    ans.append_col(min_col)
  
  return ans

Note that the pseudocode is (deliberately) hiding some extra complexity of extracting rows / columns and adapting the 1D algorithm to the 2D problem, in order to make the understanding of the solution clearer.

The total complexity of the algorithm can be easily deduced to be O(n * m)

Multi-dimensional case analysis

The solution can be extended to an arbitrary order of dimensions. For a d-dimensional matrix of size s1, s2, ..., sd, the time-complexity of the problem is O(d * s1 * ... * sd), and the memory complexity is O(s1 * ... * sd). This is much better than other algorithms that do the same thing on non-fixed size submatrices (e.g. multi-dimensional RMQ has O(s1 * ... * sd * log(s1) * ... * log(sd)) time and memory complexity).

Finding the best k minima

The deque approach itself is limited in the sense that it allows you to find only the minimum value over the ranges. But what happens if you want to calculate more that one minimum? We will discuss an approach that I used during a national ACM-style contest where we were able to calculate the best 2 minima, and then argue that you can extend to an arbitrary number of minimum values.

In order to store the lowest 2 values, we will do the following:

Keep 2 deques, namely D1 and D2. Do a similar algorithm of "stack-like popping" on D1 when you add a new element, but instead of discarding elements from D1 when popping, transfer them down to D2 and "stack-like pop" it.

It is easy to see why the lowest 2 elements will always be in one of the two deques. Moreover, there are only 2 cases for the lowest two elements: they are either the first two elements of D1, or the first elements of D1 and D2 subsequently. Checking the case should be an easy thing to do.

The extension to an arbitrary number of minima is, however, not so great, in the sense that the complexity of this approach becomes O(n * k2) for a n-sized array, currently bottlenecked by the number of elements you have to consider in order to find the first k minima. [Maybe you can come up with a cleverer way of doing that?]

Useful links

This is the problem I referred to above: http://www.infoarena.ro/problema/smax. I recommend trying to think it through and implementing it, and translating the statement via Google Translate or equivalent.

Read more »

 
 
 
 
  • Vote: I like it  
  • +349
  • Vote: I do not like it  

By retrograd, history, 3 years ago, In English,

Hello! I am a fairly advanced programmer (although I'm pretty new), and I know the basic algorithms and data structures used for most problems. However, I don't know how to get better from this state onwards.

I should say that my strength is mainly DS problems. Greedy, D&C, DP I do pretty well (once I recognize a specific-type problem), constructive algorithms seem very hard to me (for example link). For DS, I know pretty much all there is to know about segment trees, BIT, sqrt-decomposition (I call it a DS, don't blame me :D), BSTs, hash, etc (the basic ones), although problems that involve advanced tricks with these (e.g. persistent segment trees, lazy propagation) seem very appealing.

I want to prepare mysef for this year's ACM-ICPC contest, as it is my first year in Uni. I have been learning algo intensively since December.

I would be grateful if you know any good lists of problems that are really crucial and/or teach useful new techniques, as I am stuck momentarily. I have read the results on Google for ACM algorithms and such, so a personal response would be a lot more appreciated :).

I will also try to keep a list updated with interesting problems and techniques, in case other people struggle with the same issue. Maybe this tread will become a learning portal for more people :D.

LIST:

  • (http://e-maxx.ru/algo/) -> there are a lot of algorithmic problems greatly explained, as well as tricks and great algorithms -- the website is in Russian, although it can be translated using Google Translate -- great resource

Thank you a lot!

Read more »

 
 
 
 
  • Vote: I like it  
  • +14
  • Vote: I do not like it