Two versions of the offline square root decomposition for dynamic minimum spanning tree

Introduction

It's been around 18 months since dynamic minimum spanning tree first crossed my way, and yesterday I finally understood the offline square root decomposition for this problem. I think this solution is explained to some extent here, but I also think it can be explained better. That is the goal of this post: I'll explain how to solve two versions of the problem.

Acknowledgments

I'd like to thank thiagocarvp for explaining the solution to the first version to me and victoragnez for explaining the solution to the second version.

First version

Problem statement

First, you're given a connected graph with n vertices and m weighted edges. Then a sequence of q new edges is added to the graph, one at a time. For each of these q new edges, output the weight of a minimum spanning tree considering only this and the previous edges (including the m initial ones). For example, take V = {1, 2}, E = {({1, 2}, 5)} and the sequence (({1, 2}, 7), ({1, 2}, 3)), i.e., n = 2, m = 1 and q = 2. The answers are 5 and 3, respectively.
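To make the statement concrete, here is a minimal brute-force sketch that simply reruns Kruskal's algorithm from scratch after every insertion. It is far too slow for real limits and is only meant as a reference checker; the names `Edge`, `mstWeight` and `bruteForceInsertions` are hypothetical helpers, not part of the solution described in this post.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

struct Edge { int u, v, w; };  // vertices are 1-based, as in the statement

// Plain Kruskal: total weight of a minimum spanning forest on n vertices
// (the MST weight when the graph is connected).
long long mstWeight(int n, std::vector<Edge> edges) {
  assert(n >= 1);
  std::sort(edges.begin(), edges.end(),
            [](const Edge& a, const Edge& b) { return a.w < b.w; });
  std::vector<int> parent(n + 1);
  std::iota(parent.begin(), parent.end(), 0);
  auto find = [&](int x) {
    while (parent[x] != x) {
      parent[x] = parent[parent[x]];  // path halving
      x = parent[x];
    }
    return x;
  };
  long long total = 0;
  for (const Edge& e : edges) {
    int a = find(e.u), b = find(e.v);
    if (a != b) {
      parent[a] = b;
      total += e.w;
    }
  }
  return total;
}

// Brute force for the first version: rebuild the MST after every insertion.
std::vector<long long> bruteForceInsertions(int n, std::vector<Edge> edges,
                                            const std::vector<Edge>& queries) {
  std::vector<long long> answers;
  for (const Edge& q : queries) {
    edges.push_back(q);
    answers.push_back(mstWeight(n, edges));
  }
  return answers;
}
```

Calling `bruteForceInsertions` with the example above (n = 2, one initial edge of weight 5, then insertions of weights 7 and 3) yields the answers 5 and 3.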

Naive approach

Let's try to answer the queries online. First, build an MST for the initial graph. All we can do with new edges is try to improve the current MST, i.e., the MST can only become lighter, never heavier. It's not hard to see that the required procedure is the following greedy algorithm.

There are only two possibilities for a new edge ({u, v}, w):

1. An edge between u and v is already present in the MST. In this case, just update its weight taking the minimum between the new and the old weight.
2. There's no edge between u and v in the current MST. In this case, the new edge will create a cycle. Then just remove the heaviest edge from this cycle.

The first situation can be easily handled in $O(\log n)$ using maps. The second, however, takes more effort. A simple DFS could find and remove the heaviest edge of the cycle, but it would cost O(n) operations, resulting in a total running time of at least $O(qn)$ operations in the worst case. Alternatively, it's possible to augment a link cut tree to do all this work in $O(\log n)$ per new edge, resulting in a much better $O(q \log n)$ running time.

So the naive approach is either too slow (DFS), or too much code (link cut tree).
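For reference, the slow DFS variant of the naive approach could look roughly like the sketch below. It is an illustration under assumptions, not the code used later in this post: `NaiveMst` and its members are hypothetical names, the tree is stored as a flat list of n - 1 edges, and the adjacency lists are rebuilt on every insertion to keep the code short. Both cases of the greedy algorithm are covered by the same path search, because an existing tree edge between u and v is itself the path from u to v.

```cpp
#include <cassert>
#include <utility>
#include <vector>

struct Edge { int u, v, w; };  // vertices are 1-based

// Maintains an MST under edge insertions in O(n) time per insertion.
struct NaiveMst {
  int n;
  std::vector<Edge> tree;  // the n - 1 edges of the current MST
  long long weight = 0;

  NaiveMst(int vertexCount, std::vector<Edge> mstEdges)
      : n(vertexCount), tree(std::move(mstEdges)) {
    assert((int)tree.size() == n - 1);
    for (const Edge& e : tree) weight += e.w;
  }

  // Inserts e and returns the weight of the new MST.
  long long insert(const Edge& e) {
    if (e.u == e.v) return weight;  // a self-loop never helps
    // Adjacency lists holding indices into `tree`.
    std::vector<std::vector<int>> adj(n + 1);
    for (int i = 0; i < (int)tree.size(); i++) {
      adj[tree[i].u].push_back(i);
      adj[tree[i].v].push_back(i);
    }
    // Iterative DFS from e.u; via[y] is the tree edge used to reach y.
    std::vector<int> via(n + 1, -1);
    std::vector<bool> seen(n + 1, false);
    std::vector<int> stack = {e.u};
    seen[e.u] = true;
    while (!stack.empty()) {
      int x = stack.back();
      stack.pop_back();
      for (int i : adj[x]) {
        int y = tree[i].u ^ tree[i].v ^ x;  // the other endpoint
        if (!seen[y]) {
          seen[y] = true;
          via[y] = i;
          stack.push_back(y);
        }
      }
    }
    // Walk back from e.v to e.u and pick the heaviest edge on the path.
    int heaviest = -1;
    for (int x = e.v; x != e.u;) {
      int i = via[x];
      if (heaviest == -1 || tree[i].w > tree[heaviest].w) heaviest = i;
      x = tree[i].u ^ tree[i].v ^ x;
    }
    // Swap only if the new edge is strictly lighter.
    if (tree[heaviest].w > e.w) {
      weight += e.w - tree[heaviest].w;
      tree[heaviest] = e;
    }
    return weight;
  }
};
```

Each call does O(n) work, which is exactly the cost that makes this variant too slow when both n and q are large.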

Solution

The naive approach might be hard to apply, but it certainly helps us to make an important observation:

Two consecutive MSTs will differ in at most one edge.

In other words, the changes in the solution are very small from one query to the next. And we are going to take advantage of this property, like many popular offline algorithms do. In particular, we'll do something like the square root decomposition of Mo's algorithm. Usually, this property is achieved by sorting the queries in a special way, like Mo's algorithm itself requires. In our case, we have just noticed that this is not necessary. Hence, we'll process the queries in a very straightforward way (and I keep asking myself what took me so long to understand this beautiful algorithm!).

The observation is used as follows. We'll split the queries into $\sqrt{q}$ consecutive blocks of $\sqrt{q}$ consecutive queries. If we compute the edges that simultaneously belong to all the MSTs of one block, we'll be able to reduce the size of the graph for which we should compute minimum spanning trees. In other words, we're going to run Kruskal's algorithm q times, once per new edge, but it will run for much smaller graphs. Let's see the details.

First, imagine the MST T_i computed right after adding the edge e of the i-th query. Now, if e belongs to T_i, consider $T_i' = T_i \setminus \{e\}$. What does it look like? Sure, T_i' is a forest with two trees (components). And if we condense these two components, we'll get a much smaller graph with precisely two vertices and no edges. Right now, this much smaller graph does not seem to be useful, but let's see what happens if we consider this situation not for one new edge, but for a whole block of new edges.

Now, imagine the MST M_i computed right after adding all the edges of the i-th block B_i. The graph $M_i' = M_i \setminus B_i$ is a minimum spanning forest with at most $\sqrt{q} + 1$ components, because the removal of an edge increases the number of components by exactly one and we are considering the removal of at most $\sqrt{q}$ edges. Therefore, a condensation would produce a set S_i of at most $\sqrt{q} + 1$ vertices. Let's write X to denote the total sum of the weights of the condensed edges (the internal edges of the components).

Consider an MST for the set S_i that uses only the edges added before the i-th block. This MST will have at most $\sqrt{q}$ edges. If we use the edges of this MST to initialize and maintain a multiset M of edges, we can insert a new edge in M and run Kruskal's algorithm $\sqrt{q}$ times, once per query. Over all blocks, we'll run Kruskal's algorithm q times for graphs with at most $O(\sqrt{q})$ vertices and edges. For the j-th query, we should output X + Y_j, where Y_j is the total sum of the weights of the edges chosen by Kruskal's algorithm.

In a step-by-step description, the algorithm is as follows:

1. Store the m initial edges in a multiset edges.
2. Compute large, an array with the edges of an MST for the initial m edges (Kruskal's for m edges).
3. For each block [l, r]:
   1. Create an empty array initial and swap its contents with large.
   2. Insert the edges e[i], l ≤ i ≤ r, into the multiset edges.
   3. Recompute large for the new state of edges (Kruskal's for O(m + q) edges).
   4. Take the edges of large that do not belong to this block: they form the forest. Condense its components (using a DSU, for example) and let X be the total weight of these edges.
   5. Create a multiset M of edges and use initial and the condensed components to fill it with at most $\sqrt{q}$ edges.
   6. For each edge e[i], l ≤ i ≤ r:
      1. Insert e[i] into M.
      2. Compute Kruskal's minimum weight Y for the multiset M and output X + Y (Kruskal's for $O(\sqrt{q})$ edges).

We run Kruskal's algorithm $\sqrt{q}$ times for a graph with O(m + q) edges and q times for a graph with $O(\sqrt{q})$ edges, so the total running time is around $O((m + q)\sqrt{q})$, if we have a fast DSU implementation.
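As a rough tally of where the time goes, here is the sum of the two kinds of Kruskal runs. This is only a sketch: it assumes blocks of about $\sqrt{q}$ queries and treats both the near-constant DSU factor and the logarithmic cost of multiset insertions as negligible.

```latex
\underbrace{\sqrt{q} \cdot O(m + q)}_{\text{one large Kruskal per block}}
\;+\;
\underbrace{q \cdot O(\sqrt{q})}_{\text{one small Kruskal per query}}
\;=\; O\big((m + q)\sqrt{q}\big)
```

The second term is dominated by the first, since $q\sqrt{q} \le (m + q)\sqrt{q}$.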

Here is my AC implementation for this problem:

Code

#include <bits/stdc++.h>
using namespace std;

// problem: http://codeforces.com/gym/101047/problem/I
// tutorial: http://codeforces.com/blog/entry/50554

const int N = 3e4+5;

struct edge {
  int u,v,w,id;
  edge() : id(0) {}
  bool operator<(const edge& o) const { return w < o.w; }
  void read() { scanf("%d%d%d",&u,&v,&w); }
};

// lazy dsu
struct dsu {
  int mark[N], p[N], pass;
  dsu() : pass(1) {}
  void reset() { pass++; }
  int Find(int x) {
    if (mark[x] != pass) {
      mark[x] = pass;
      p[x] = x;
    }
    return p[x] == x ? x : p[x] = Find(p[x]);
  }
  void Union(int x, int y) { p[Find(x)] = Find(y); }
};

int kruskal(const multiset<edge>& edges, vector<edge>* mst = nullptr) {
  static dsu uf;
  uf.reset();
  int ans = 0;
  for (auto& e : edges) if (uf.Find(e.u) != uf.Find(e.v)) {
    uf.Union(e.u,e.v);
    if (mst) mst->push_back(e);
    ans += e.w;
  }
  return ans;
}

int main() {
  int t;
  scanf("%d",&t);
  while (t--) {
    // input
    int n,m,q;
    scanf("%d%d%d",&n,&m,&q);
    multiset<edge> edges;
    for (int i = 1; i <= m; i++) {
      edge e;
      e.read();
      edges.insert(e);
    }
    static edge query[N];
    for (int i = 1; i <= q; i++) {
      query[i].read();
      query[i].id = i;
    }
    // initial large mst
    vector<edge> largemst;
    kruskal(edges,&largemst);
    // answer each block
    for (int l = 1, b = sqrt(q)+1; l <= q; l += b) {
      int r = min(l+b-1,q);
      // current large mst is the initial mst for the queries of this block
      vector<edge> initial;
      largemst.swap(initial);
      // compute next large mst
      for (int i = l; i <= r; i++) edges.insert(query[i]);
      kruskal(edges,&largemst);
      // compute forest
      static dsu uf;
      uf.reset();
      int forest = 0;
      for (auto& e : largemst) if (e.id < l) {
        uf.Union(e.u,e.v);
        forest += e.w;
      }
      // compute initial compressed mst
      multiset<edge> eds;
      for (auto& e : initial) if (uf.Find(e.u) != uf.Find(e.v)) {
        auto tmp = e;
        tmp.u = uf.Find(e.u), tmp.v = uf.Find(e.v);
        eds.insert(tmp);
      }
      // answer each query
      for (int i = l; i <= r; i++) {
        auto tmp = query[i];
        tmp.u = uf.Find(tmp.u), tmp.v = uf.Find(tmp.v);
        eds.insert(tmp);
        printf("%d\n",forest+kruskal(eds));
      }
    }
  }
  return 0;
}

Second version

Problem statement

Again, you're first given a connected graph with n vertices and m weighted edges. This time, the i-th query is a pair (e_i, c_i) where 1 ≤ e_i ≤ m, and you should output the weight of a minimum spanning tree after setting w(e_i) = c_i permanently (until a later update changes e_i again). For example, take V = {1, 2, 3}, E = (({1, 2}, 5), ({2, 3}, 6), ({3, 1}, 7)) and the updates (1, 8) and (2, 9). The answers are 13 and 15, respectively.
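As with the first version, a brute-force checker helps to pin down the statement. The sketch below applies each update and reruns Kruskal's algorithm from scratch; `Edge`, `mstWeight` and `bruteForceUpdates` are hypothetical names chosen for this illustration, and the approach is only suitable for tiny inputs.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <utility>
#include <vector>

struct Edge { int u, v, w; };  // vertices are 1-based

// Plain Kruskal: MST weight of a connected graph on n vertices.
long long mstWeight(int n, std::vector<Edge> edges) {
  assert(n >= 1);
  std::sort(edges.begin(), edges.end(),
            [](const Edge& a, const Edge& b) { return a.w < b.w; });
  std::vector<int> parent(n + 1);
  std::iota(parent.begin(), parent.end(), 0);
  auto find = [&](int x) {
    while (parent[x] != x) {
      parent[x] = parent[parent[x]];  // path halving
      x = parent[x];
    }
    return x;
  };
  long long total = 0;
  for (const Edge& e : edges) {
    int a = find(e.u), b = find(e.v);
    if (a != b) {
      parent[a] = b;
      total += e.w;
    }
  }
  return total;
}

// Each update is (edge index, new weight), with 1-based edge indices.
std::vector<long long> bruteForceUpdates(
    int n, std::vector<Edge> edges,
    const std::vector<std::pair<int, int>>& updates) {
  std::vector<long long> answers;
  for (const std::pair<int, int>& up : updates) {
    edges[up.first - 1].w = up.second;  // the change is permanent
    answers.push_back(mstWeight(n, edges));
  }
  return answers;
}
```

On the example above, the two updates give 6 + 7 = 13 and then 7 + 8 = 15.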

Solution

I'll assume that you have read the solution to the first version, which is simpler, but the core idea is the same: we'll somehow compute the edges that will remain unchanged through a block of $\sqrt{q}$ consecutive updates (q being the number of updates) and will belong to all the minimum spanning trees of this block. Again, the remaining graph will have $O(\sqrt{q})$ vertices and edges, so we'll compute the answer for each query in the same time complexity as before.

First, we should reduce the m edges to around O(n) edges using Kruskal. In this phase, we should consider only the edges that won't be updated in this block. Let's refer to these edges as non-updated. In the end, the graph will not necessarily be connected, but the purpose here is only to discard the edges that we know for sure won't be present in any of the MSTs of this block. Let's call the edges selected in this phase possibly useful edges.

Now, we should split the possibly useful edges into two disjoint sets: the at least $n - 1 - \sqrt{q}$ certainly useful edges that will be present for sure in all the MSTs of this block, and the at most $\sqrt{q}$ remaining possibly useful edges to be considered by the $\sqrt{q}$ Kruskals of this block. For this, we'll just run Kruskal one more time, except that the DSU should be initialized with a special procedure. This procedure is simply to call the union operation for each updated edge (the at most $\sqrt{q}$ edges that will be updated in this block). After this initialization, this DSU will clearly have at least $n - \sqrt{q}$ components, which means that this second execution of Kruskal will have to connect at least $n - \sqrt{q}$ components to make the graph fully connected (this time the graph will end up connected for sure!). The edges selected by Kruskal's algorithm are the certainly useful ones, while the discarded ones are the remaining possibly useful ones.
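The two Kruskal passes described so far can be sketched in isolation as follows. This is a simplified illustration rather than the implementation given later: `Dsu`, `Split` and `splitNonUpdated` are hypothetical names, edges are identified by 0-based indices, and the non-updated edges are sorted on every call instead of being kept in a persistent ordered set.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

struct Edge { int u, v, w; };  // vertices are 1-based

struct Dsu {
  std::vector<int> p;
  explicit Dsu(int n) : p(n + 1) { std::iota(p.begin(), p.end(), 0); }
  int find(int x) { return p[x] == x ? x : p[x] = find(p[x]); }
  bool unite(int x, int y) {
    x = find(x);
    y = find(y);
    if (x == y) return false;
    p[x] = y;
    return true;
  }
};

struct Split {
  std::vector<int> certain;    // edges in every MST of the block
  std::vector<int> remaining;  // edges kept for the per-query Kruskal
};

// updated[i] is true if edge i is touched by some update of the block.
Split splitNonUpdated(int n, const std::vector<Edge>& edges,
                      const std::vector<bool>& updated) {
  assert(edges.size() == updated.size());
  // Non-updated edge indices, sorted by weight (ties broken by index).
  std::vector<int> order;
  for (int i = 0; i < (int)edges.size(); i++)
    if (!updated[i]) order.push_back(i);
  std::sort(order.begin(), order.end(), [&](int a, int b) {
    return edges[a].w != edges[b].w ? edges[a].w < edges[b].w : a < b;
  });
  // Pass 1: plain Kruskal keeps the possibly useful edges.
  Dsu plain(n);
  std::vector<int> possiblyUseful;
  for (int i : order)
    if (plain.unite(edges[i].u, edges[i].v)) possiblyUseful.push_back(i);
  // Pass 2: seed a DSU with every updated edge, then run Kruskal again.
  Dsu seeded(n);
  for (int i = 0; i < (int)edges.size(); i++)
    if (updated[i]) seeded.unite(edges[i].u, edges[i].v);
  Split out;
  for (int i : possiblyUseful) {
    if (seeded.unite(edges[i].u, edges[i].v)) out.certain.push_back(i);
    else out.remaining.push_back(i);
  }
  return out;
}
```

For instance, take n = 4 and the edges ({1, 2}, 1), ({2, 3}, 2), ({3, 4}, 3), ({1, 4}, 10), ({1, 3}, 5), where only the fourth edge is updated in the block. The first pass discards ({1, 3}, 5); the second pass marks the first two edges as certainly useful and leaves ({3, 4}, 3) as a remaining possibly useful edge, since it could be replaced if the updated edge became lighter than 3.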

Along with the specially initialized DSU of the previous Kruskal, you can build a second one using only the certainly useful edges. This second DSU represents the condensation of the forest.

Finally, build a set with the $O(\sqrt{q})$ remaining possibly useful edges and the $O(\sqrt{q})$ updated edges. Use and maintain this set to process the queries. For each update, remove the updated edge from this set, update its weight and insert it again, then run Kruskal's algorithm over this sorted set of $O(\sqrt{q})$ edges. You can also maintain a larger set with all the m edges during the updates and across all blocks.

Here is my AC implementation for this problem:

Code

#include <bits/stdc++.h>
using namespace std;

// problem: http://codeforces.com/gym/101246/problem/L
// tutorial: http://codeforces.com/blog/entry/50554

const int N = 4e4+5;

// input
int n,m,u[N],v[N],w[N],e[N],c[N];

// set comparison
struct cmp {bool operator()(int i, int j) const { return w[i]!=w[j]?w[i]<w[j]:i<j; }};
typedef set<int,cmp> edgeset;

// lazy flag
struct {
  int mark[N],pass;
  void init() { pass++; }
  bool get(int i) { return mark[i] == pass; }
  void set(int i) { mark[i] = pass; }
} flag;

// lazy dsu
struct dsu {
  int mark[N],pass,p[N];
  void init() { pass++; }
  int Find(int x) { prop(x); return p[x]==x?x:p[x]=Find(p[x]); }
  bool Union(int x, int y) {
    x = Find(x), y = Find(y);
    if (x == y) return false;
    p[x] = y;
    return true;
  }
  void prop(int x) {
    if (mark[x] == pass) return;
    mark[x] = pass;
    p[x] = x;
  }
} d1, d2;

int main() {
#ifdef ONLINE_JUDGE
  freopen("input.txt","r",stdin);
  freopen("output.txt","w",stdout);
#endif
  // input
  scanf("%d%d",&n,&m);
  edgeset large;
  for (int i = 1; i <= m; i++) {
    scanf("%d%d%d",u+i,v+i,w+i);
    large.insert(i);
  }
  int t;
  scanf("%d",&t);
  for(int i = 1; i <= t; i++) scanf("%d%d",e+i,c+i);
  // for each block of sqrt(t) updates
  for (int l = 1, b = sqrt(t); l <= t; l += b) {
    int r = min(l+b-1,t);
    // mark updated edges, initialize first dsu and initialize small set
    flag.init();
    d1.init();
    edgeset sml;
    for (int i = l; i <= r; i++) {
      flag.set(e[i]), d1.Union(u[e[i]],v[e[i]]), sml.insert(e[i]);
    }
    // select O(n) possibly useful from non-updated edges
    static int idx[N]; int cnt = 0;
    d2.init();
    for (int i : large) if (!flag.get(i) && d2.Union(u[i],v[i])) idx[++cnt] = i;
    // select certainly useful from non-updated edges and fill small set
    d2.init();
    int forest = 0;
    for (int i = 1; i <= cnt; i++) {
      if (d1.Union(u[idx[i]],v[idx[i]])) { // certainly useful, O(n) edges
        d2.Union(u[idx[i]],v[idx[i]]);
        forest += w[idx[i]];
      }
      else sml.insert(idx[i]); // possibly useful, O(sqrt(t)) edges
    }
    // answer queries
    for (int i = l; i <= r; i++) {
      // update sets
      large.erase(e[i]);
      sml.erase(e[i]);
      w[e[i]] = c[i];
      large.insert(e[i]);
      sml.insert(e[i]);
      // kruskal
      d1.init();
      int mst = 0;
      for (int j : sml) if (d1.Union(d2.Find(u[j]),d2.Find(v[j]))) mst += w[j];
      printf("%d\n",forest+mst);
    }
  }
  return 0;
}

Conclusion

I hope this post can be useful for others. Constructive criticism and related problems to solve are welcome in the comments!

Rev.	By	When	Δ	Comment
en6	pimenta	2017-02-25 18:28:41	54
en5	pimenta	2017-02-25 06:36:50	6944	Tiny change: 'k._\n\nNow we should' -> 'k._\n\nNow, we should'
en4	pimenta	2017-02-24 22:14:41	2614
en3	pimenta	2017-02-21 07:15:10	28
en2	pimenta	2017-02-21 06:44:02	3	Tiny change: 'ere is my implement' -> 'ere is my AC implement'
en1	pimenta	2017-02-21 06:42:15	11561	Initial revision (published)
