Блог пользователя ko_osaga

Автор ko_osaga, история, 8 лет назад, По-английски

The traditional implementation of union find tree (aka disjoint set) utilizes rank compression — this is the code with it.

int parent[1000], rank[1000];

int find(int x){
    return parent[x] = (parent[x] == x ? x : find(parent[x]));
}

void Union(int p, int q){
    p = find(p);
    q = find(q);
    if(rank[p] < rank[q]) parent[p] = q;
    else parent[q] = p;
    if(rank[p] == rank[q]) rank[p]++;
}

But in competitive programming, I found out that the rank compression was useless. Without rank compression, the code was shorter, and even the program ran faster. This is my implementation without rank compression.

int parent[1000];

void Union(int p, int q){ parent[find(p)] = find(q); }

and for some cases, I found implementing rank compression is quite hard (maybe impossible).

I will give you a sample problem : "Given a tree T, you should process Q queries — "paint the edge's value into x (>0) from path s to e".

I reversed the query, broke down the path s — e to s — LCA(s,e) — e, and used this algorithm to paint the tree. (root_val[s] : value of edge s — parent(s), dep[s] = depth of s. nxt[s] = parent(s))

int paint(int s, int e, int x){  
    if(dep[s] <= dep[e]) return s;  
    if(!root_val[s]) root_val[s] = x;  
    return nxt[s] = paint(nxt[s],e,x);  
}

All of those program ran very fast in practice, but I was curious about the worst-case runtime of such path compression without ranks, so I googled for a while but didn't find any answer. Here's my question:

  1. Does the path compression runs in (amortized) sublinear time per query?
  2. If it doesn't, is there any counterexamples for that?
  • Проголосовать: нравится
  • +14
  • Проголосовать: не нравится

»
8 лет назад, # |
Rev. 4   Проголосовать: нравится 0 Проголосовать: не нравится

Any Disjoint-set Structure requires amortize Ω(α(m, n)) for each operation, which means it cannot be ran at amortized linear time.

You can see more information about time complexity of disjoint-set structure here

  • »
    »
    8 лет назад, # ^ |
      Проголосовать: нравится 0 Проголосовать: не нравится

    Are you talking about the upper bound? I was asking about the time complexity of path compression in amortized sublinear time (per query). I will fix that part in the article.

    If you are talking about the upper bound, can I ask for the exact page of proof in that paper? I also read that paper shortly, but I couldn't find the part for disjoint set without rank compression.

»
8 лет назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится

Auto comment: topic has been updated by ko_osaga (previous revision, new revision, compare).

»
8 лет назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится
»
8 лет назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится

I can't remember the problem, but once I wrote it without ranks, and I got TLE. When I added ranks to it, it passed. That's why I don't think it's always a good idea.

»
8 лет назад, # |
Rev. 2   Проголосовать: нравится +11 Проголосовать: не нравится

Ignore me, I'm misreading things.

  • »
    »
    8 лет назад, # ^ |
      Проголосовать: нравится +19 Проголосовать: не нравится

    What do you mean by "got rid of path compression". It still seems to be in find, isn't it? And if you have path compression, you are guaranteed to be at most O(log N) on average per query.

    • »
      »
      »
      8 лет назад, # ^ |
        Проголосовать: нравится +19 Проголосовать: не нравится

      Thanks, that just reminds me that I should not post on internet before having a morning cup of coffee.

»
8 лет назад, # |
  Проголосовать: нравится +8 Проголосовать: не нравится

Interesting post. I also never use rank (then I have to implement the whole thing) as it seemed not to be very useful in practice. Instead of it, several years ago, I started joining parents at random, that is:

void set_union(int p, int q){
    p = find(p), q = find(q);
    if(rand() % 2) parent[p] = q;
    else parent[q] = p;
}

After reading your post I tried to search for some complexity proofs for this type of linking and found this gem Disjoint Set Union with Randomized Linking. In article authors review most common types of linking (disclaimer: still haven't read the whole thing):

  • by size/rank/randomly shuffled input
  • without heuristic/by index

Here n is number of items and m number of operations.

So it seems that randomized union is also a viable simplification :) (unless calculating random becomes your bottleneck :D)