Offline Dynamic MST [Tutorial]

Inspired by an old blog post by Adamant, I reinvented a variation of a technique for computing the cost of a minimum spanning tree offline when edge weight updates are allowed. The technique is not new: it already appeared in the work of David Eppstein. It was also in Adamant's blog, but I missed it completely :).

So the problem statement is: You're given an undirected, connected, weighted graph with $$$n$$$ nodes and $$$m$$$ edges. There are $$$q$$$ updates of the form: set the weight of the $$$k$$$-th edge in the input to $$$x$$$. After each update you should report the total weight of a minimum spanning tree. Notice that this is strong enough to simulate edge deletions and insertions, by just giving the edge a weight of infinity while it is "deleted".

This is difficult if you are forced to process all updates online. But offline, we can use divide and conquer on time to achieve a decent time complexity. We will aim for a complexity of $$$O( (q+m) \log(q))$$$.

Let's start with the groundwork. We number the updates from $$$1$$$ to $$$q$$$; time $$$0$$$ is the initial graph. Each edge's weight changes at certain points in time. If we focus on one edge, this splits the time interval $$$[0,q]$$$ into smaller intervals $$$[0,x_1), [x_1,x_2), \dots, [x_k,q]$$$, where the weight of the edge stays the same within each interval. The total number of intervals created this way is $$$m+q$$$. Let's store all these intervals in structs:

struct intervaledge {
    int l,r; // time interval for when this edge is active.
    int u,v, weight; // edge itself
};
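As a minimal sketch of how these intervals could be produced from the input, the helper below closes an interval whenever an edge is updated and closes the still-open ones at the end. The name `buildIntervals` and the `since` array are my own illustration, not part of the solution further down; like that solution, it uses `q+1` as the exclusive right end of the last interval.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct IntervalEdge {
    int l, r;          // the weight is valid for times in [l, r)
    int u, v, weight;  // the edge itself
};

// edges[i] = {u, v, w}; updates[t-1] = {edge index, new weight} happens at time t.
// Hypothetical helper: returns one IntervalEdge per period of constant weight.
std::vector<IntervalEdge> buildIntervals(std::vector<std::array<int, 3>> edges,
                                         const std::vector<std::pair<int, int>>& updates) {
    const int q = static_cast<int>(updates.size());
    std::vector<int> since(edges.size(), 0);  // time at which the current weight was set
    std::vector<IntervalEdge> out;
    for (int t = 1; t <= q; ++t) {
        const auto [i, x] = updates[t - 1];
        assert(0 <= i && i < static_cast<int>(edges.size()));
        auto& [u, v, w] = edges[i];
        out.push_back({since[i], t, u, v, w});  // close the interval with the old weight
        since[i] = t;
        w = x;
    }
    for (std::size_t i = 0; i < edges.size(); ++i) {
        const auto& [u, v, w] = edges[i];
        out.push_back({since[i], q + 1, u, v, w});  // intervals that last until the end
    }
    return out;
}
```

Every update closes exactly one interval and every edge leaves exactly one open interval at the end, which is where the count $$$m+q$$$ comes from.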

To find the MST cost for each time from $$$1$$$ to $$$q$$$ we will make a recursive function

solve(int l, int r, vector<intervaledge> es)

that will find the MST costs for all times inside $$$[l,r)$$$, recursively calling itself on smaller intervals. To make the height of the recursion tree small, we will recurse on $$$[l, mid)$$$ and $$$[mid,r)$$$, until we arrive at unit length intervals. The height of the recursion tree will be $$$\log(q)$$$.

The smart trick, that will make everything fast, is that we will make sure that we reduce the size of the graph in each recursive call, such that the number of edges passed into the recursive function is always $$$O(\text{# of active intervals})$$$.

We call an interval $$$[x,y)$$$ active if it overlaps the interval $$$[l,r)$$$ of the current recursive call but does not fully enclose it; this includes intervals that lie strictly inside $$$[l,r)$$$. An interval that does enclose $$$[l,r)$$$ is called fully overlapping. I will refer to the first kind and their corresponding edges as active intervals or active edges.
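The three cases an interval can fall into can be written as a small predicate. This is only a sketch: the names `Kind` and `classify` are hypothetical, and the conditions mirror the comparisons used in the full solution at the end of the post.

```cpp
#include <cassert>

enum class Kind { Disjoint, Active, Full };

// Classify the edge interval [x, y) relative to the call interval [l, r).
Kind classify(int x, int y, int l, int r) {
    assert(x < y && l < r);
    if (y <= l || r <= x) return Kind::Disjoint;  // no common time: the edge can be dropped
    if (x <= l && r <= y) return Kind::Full;      // weight is fixed throughout [l, r)
    return Kind::Active;                          // the interval starts or ends inside [l, r)
}
```

For example, with the call interval $$$[1,3)$$$, the interval $$$[0,4)$$$ is fully overlapping, $$$[2,5)$$$ and $$$[1,2)$$$ are active, and $$$[3,5)$$$ is disjoint.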

From the complexity analysis of a segment tree we know that each interval is only active in $$$O(\log(q))$$$ recursive calls. If we only do work linear in the number of edges that we pass to each recursive function, we would get a complexity of $$$O((q+m) \log(q))$$$ overall. Because we will be using a disjoint set union structure, the actual complexity is slightly worse.

Compressing the graph

Now comes the hard part, compressing the graph, such that we get our nice complexity. In our recursive function we will get two types of intervals: Some intervals that fully overlap, and some that partially overlap (the active intervals).

We will artificially lower the cost of the edges belonging to the active intervals to $$$-\infty$$$ for a moment. Then we run Kruskal's algorithm for finding the MST of our current graph. Kruskal's algorithm would first consider all active edges, and then consider all fully overlapping intervals.

It turns out that any fully overlapping edge that belongs to this special MST belongs to an MST no matter what weights the active edges end up with. This is because, whatever the eventual weights of the active edges are, their place in the sorted list of edges can only move later, so every edge Kruskal processes before a fully overlapping edge was also processed before it in the special run, and the edge is accepted again. These "certainly good" edges form some connected components. We can compress those connected components into single vertices and relabel all edges. This reduces the number of vertices in the new graph to at most $$$\text{# of active edges}+1$$$. This is easy to prove: the special MST is one component, and all of its edges other than the active ones are certainly good. So if we remove the active edges from the MST, we create at most $$$\text{# of active edges}$$$ new components.
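The compression step can be sketched in isolation as follows. This is an illustration under my own naming (`Dsu`, `Edge`, `Contraction`, `contract` are not the identifiers of the full solution), and it assumes the fully overlapping edges are passed in non-decreasing weight order.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

struct Dsu {
    std::vector<int> p;
    explicit Dsu(int n) : p(n) { std::iota(p.begin(), p.end(), 0); }
    int find(int a) {
        if (p[a] == a) return a;
        return p[a] = find(p[a]);  // path compression
    }
    bool unite(int a, int b) {
        a = find(a);
        b = find(b);
        if (a == b) return false;
        p[b] = a;
        return true;
    }
};

struct Edge { int u, v; long long w; bool active; };

struct Contraction {
    long long cost;       // total weight of the "certainly good" edges
    std::vector<int> id;  // id[v] = label of the super-vertex containing v
    int count;            // number of super-vertices
};

// Assumption: the non-active edges in `es` appear in non-decreasing weight order.
Contraction contract(int n, const std::vector<Edge>& es) {
    assert(n > 0);
    Dsu all(n), good(n);
    for (const Edge& e : es)  // active edges go first, as if their weight were -infinity
        if (e.active) all.unite(e.u, e.v);
    long long cost = 0;
    for (const Edge& e : es)
        if (!e.active && all.unite(e.u, e.v)) {
            cost += e.w;  // accepted even with every active edge ahead of it
            good.unite(e.u, e.v);
        }
    Contraction res{cost, std::vector<int>(n), 0};
    for (int v = 0; v < n; ++v)  // number the components formed by good edges only
        if (good.find(v) == v) res.id[v] = res.count++;
    for (int v = 0; v < n; ++v) res.id[v] = res.id[good.find(v)];
    return res;
}
```

On a path-like graph with one active edge, for instance, the result has at most two super-vertices, matching the bound of $$$\text{# of active edges}+1$$$ when the graph is connected.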

Reducing the number of edges

So our graph already has few vertices; now we try to reduce the number of edges. Let's build another MST, this time only from the edges that belong to fully overlapping intervals. We can do this again with Kruskal's algorithm. Any edge that is not used in this MST will certainly not be used once we add extra edges to the graph (the active edges that we ignored in this MST). So those edges can be removed without affecting the MSTs we will obtain further on in the recursion.

Because we ignored some edges, it could be that we calculated a minimum spanning forest, instead of a tree. What we know is that forests with $$$n$$$ nodes can have at most $$$n-1$$$ edges. And because the number of nodes is small, the number of remaining edges is also small.
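A minimal sketch of this second pass, assuming the fully overlapping edges have already been relabeled to the compressed vertices $$$0 \dots n-1$$$ and are given in non-decreasing weight order. The names `Dsu`, `Edge` and `keepForestEdges` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

struct Dsu {
    std::vector<int> p;
    explicit Dsu(int n) : p(n) { std::iota(p.begin(), p.end(), 0); }
    int find(int a) {
        if (p[a] == a) return a;
        return p[a] = find(p[a]);  // path compression
    }
    bool unite(int a, int b) {
        a = find(a);
        b = find(b);
        if (a == b) return false;
        p[b] = a;
        return true;
    }
};

struct Edge { int u, v; long long w; };

// Keeps only the edges of a minimum spanning forest of the fully overlapping edges.
std::vector<Edge> keepForestEdges(int n, const std::vector<Edge>& sorted) {
    Dsu dsu(n);
    std::vector<Edge> kept;
    for (std::size_t i = 0; i < sorted.size(); ++i) {
        const Edge& e = sorted[i];
        assert(i == 0 || sorted[i - 1].w <= e.w);    // Kruskal needs sorted input
        if (dsu.unite(e.u, e.v)) kept.push_back(e);  // rejected edges close a cycle
    }
    return kept;
}
```

The returned vector never holds more than $$$n-1$$$ edges, and self-loops created by the relabeling are rejected automatically because both endpoints are already in the same set.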

So in our recursive function we will run these two procedures to reduce the graph size, and pass the new graph to the left and right smaller intervals. If we used a naive implementation of Kruskal, we would get an extra log factor, because we would need to sort the edges in every call. Luckily, we only need to sort the edges once, before the recursion begins, and then the runtime of Kruskal becomes $$$O(n + m \alpha(n))$$$.
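Sorting once only works if every later filtering step keeps the edges in their sorted order. As a small sketch of that idea, the helper below drops the edges whose interval misses $$$[l,r)$$$ with `std::stable_partition`, which preserves the relative order of the edges that stay. The names `E` and `restrictTo` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct E { int l, r, w; };  // time interval [l, r) and weight; endpoints omitted here

// Remove every edge whose interval does not intersect [l, r), keeping the rest in order.
void restrictTo(std::vector<E>& es, int l, int r) {
    assert(l < r);
    auto firstDropped = std::stable_partition(es.begin(), es.end(), [&](const E& e) {
        return !(e.r <= l || r <= e.l);
    });
    es.erase(firstDropped, es.end());
}
```

If the input vector is sorted by weight, the output is sorted by weight too, so Kruskal can run on it directly. `std::remove_if` with the negated predicate would give the same result, since it also keeps the surviving elements in their original order.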

So the actual time complexity becomes $$$O( (q+m) \log(q) \alpha(n))$$$, where the extra factor $$$\alpha(n)$$$ is the inverse Ackermann function, which comes from the use of a DSU.

Code for the DMOJ problem linked

#include "bits/stdc++.h"
using namespace std;
#define all(x) begin(x),end(x)
typedef long long ll;
typedef vector<int> vi;

const int oo = 1e9;

struct DSU{
    vector<int> sz,parent;
    int components;
    void reset(int n) {
        fill(sz.begin(),sz.begin()+n,1);
        iota(parent.begin(),parent.begin()+n,0);
        components=n;
    }
    DSU(int n) : sz(n),parent(n) {
        reset(n);
    }
    void link(int a, int b) {
        components--;
        if(sz[a]<sz[b]) swap(a,b);
        sz[a]+=sz[b];
        parent[b] = a;
    }
    bool unite(int a, int b) {
        int pa = find(a), pb = find(b);
        if(pa!=pb) {
            link(pa,pb);
            return true;
        }
        return false;
    }
    int find(int a) {
        if(a==parent[a]) return a;
        return parent[a] = find(parent[a]);
    }
};

struct dynamicMST {
    struct edge {
        int l,r;
        int u,v,w;
        bool operator<(const edge& o) const {
            return w<o.w;
        }
    };
    vector<edge> ives; // edges + time interval that they are active.
    vector<array<int,3>> startes; 
    vi touch; // last time this edge was touched
    int totaln;
    DSU dsu,dsu2;
    vi id;
    dynamicMST(vector<array<int,3>> ES, int n) : startes(ES), touch(ES.size()), totaln(n), dsu(n),dsu2(n), id(n)  {
        // give all edges upfront.
    }
    int q=0;
    void update(int i, int x) {
        // update edge weight of edge i to x
        // if you want to delete the edge, just set it to infinity
        q++;
        auto& [u,v,w] = startes[i];
        ives.push_back({touch[i],q,u,v,w});
        touch[i]=q;
        w = x;
    }
    vector<ll> ans;
    void solve(int l, int r, vector<edge> es, int n, ll cost=0) {
        // remove edges that don't belong to this interval
        es.erase(stable_partition(all(es),[&](const edge& e) {return !(e.r<=l or r<=e.l);}),es.end());
        dsu.reset(n),dsu2.reset(n);

        // compressing connected components
        for(auto& e : es) if(l<e.l or e.r<r) { // active edges
            dsu.unite(e.u,e.v);
        }
        
        for(auto& e : es) if(e.l<=l and r<=e.r) { // fully overlapping edges
            if(dsu.unite(e.u,e.v)) {
                cost+=e.w;
                dsu2.unite(e.u,e.v);
            }
        }

        if(l+1==r) { // base case, we found the MST.
            ans[l]=cost;
            return;
        }

        int cnt=0; // relabel all connected components to 0...cnt-1
        for(int i=0;i<n;++i) if(dsu2.find(i)==i) id[i]=cnt++;
        dsu.reset(cnt);
        for(auto& e : es) { // relabeling and marking useless edges
            e.u = id[dsu2.find(e.u)], e.v = id[dsu2.find(e.v)];
            if(e.l<=l and r<=e.r) {
                if(!dsu.unite(e.u,e.v)) e.l=oo,e.r=-oo; // mark useless edge, will get deleted in next step
            }
        }
        int m = (l+r)/2;
        solve(l,m,es,cnt,cost);
        solve(m,r,es,cnt,cost);
    }
    vector<ll> run() {
        int m = startes.size();
        q++;
        for(int i=0;i<m;++i) {
            auto& [u,v,w] = startes[i];
            ives.push_back({touch[i],q,u,v,w});
        }
        
        sort(all(ives)); // (q+m) log(q+m) time
        ans.resize(q);
        solve(0,q,ives,totaln); // (q+m) log(q) alpha(n) time
        return ans;
    }
};

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);
    int n,m,q; cin >> n >> m >> q;
    vector<array<int,3>> es(m);

    for(auto& [u,v,w] : es) {
        cin >> u >> v >> w;
        --u,--v;
    }

    dynamicMST mst(es,n);

    for(int i=0;i<q;++i) {
        int k,d;
        cin >> k >> d, --k;
        mst.update(k,d);
    }
    auto ans = mst.run();
    for(int i=1;i<=q;++i) { // ans[0] gives the MST cost of the initial MST.
        cout << ans[i] << '\n';
    }
    
}

I hope you enjoyed this blog. If you want to test your own implementation, you can do so in this DMOJ problem, although it has really small constraints.
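For local testing, a slow reference solution that reruns Kruskal from scratch after every update is handy to compare against on small random inputs. The sketch below is my own helper (`bruteForceMst` is a hypothetical name); it assumes 0-indexed vertices and edge indices and a graph that stays connected, and it takes $$$O(q \cdot m \log m)$$$ time.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <numeric>
#include <utility>
#include <vector>

// edges[i] = {u, v, w}; updates[j] = {edge index, new weight}.
// Returns the MST cost after each update, in order.
std::vector<long long> bruteForceMst(int n, std::vector<std::array<int, 3>> edges,
                                     const std::vector<std::pair<int, int>>& updates) {
    std::vector<long long> answers;
    std::vector<int> parent(n);
    auto find = [&](int a) {
        while (parent[a] != a) {
            parent[a] = parent[parent[a]];  // path halving
            a = parent[a];
        }
        return a;
    };
    for (const auto& [k, x] : updates) {
        assert(0 <= k && k < static_cast<int>(edges.size()));
        edges[k][2] = x;
        std::vector<std::array<int, 3>> sorted = edges;
        std::sort(sorted.begin(), sorted.end(),
                  [](const std::array<int, 3>& a, const std::array<int, 3>& b) {
                      return a[2] < b[2];
                  });
        std::iota(parent.begin(), parent.end(), 0);
        long long cost = 0;
        for (const auto& e : sorted) {  // plain Kruskal on the current weights
            int a = find(e[0]), b = find(e[1]);
            if (a != b) {
                parent[a] = b;
                cost += e[2];
            }
        }
        answers.push_back(cost);
    }
    return answers;
}
```

Its output should match entries $$$1 \dots q$$$ of the vector returned by `run()` above.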

jeroenodb's blog
