Introduction
It's been around 18 months since dynamic minimum spanning tree first crossed my way and yesterday I've finally understood the offline square root decomposition for this problem. I think this solution is somehow explained here, but I also think it can be explained in a better way. This is the goal of this post. I'll explain how to solve two versions of the problem.
Acknowledgments
I'd like to thank thiagocarvp for explaining me the solution to the first version and victoragnez for explaining me the solution to the second version.
First version
Problem statement
First, you're given a connected graph with n vertices and m weighted edges. And then a sequence of q new edges is added to the graph. For each of these q new edges, output the weight of a minimum spanning tree considering only this and the previous edges. For example, take V = {1, 2}, E = {({1, 2}, 5)} and the sequence (({1, 2}, 7), ({1, 2}, 3)), i.e., n = 2, m = 1 and q = 2. The answers are 5 and 3, respectively.
Naive approach
Let's try to answer the queries online. First, build a MST for the initial graph. All we can do with new edges is try to improve the current MST, i.e., the MST can only become lighter, never heavier. It's not hard to see that the required procedure is the following greedy algorithm.
There are only two possibilities for a new edge ({u, v}, w):
 An edge between u and v is already present in the MST. In this case, just update its weight taking the minimum between the new and the old weight.
 There's no edge between u and v in the current MST. In this case, the new edge will create a cycle. Then just remove the heaviest edge from this cycle.
The first situation can be easily handled in using maps. The second, however, takes more effort. A simple DFS could find and remove the heaviest edge of the cycle, but it would cost O(n) operations, resulting in a total running time of at least operations in the worst case. Alternatively, it's possible to augment a link cut tree to do all this work in per new edge, resulting in a much better running time.
So the naive approach is either too slow (DFS), or too much code (link cut tree).
Solution
The naive approach might be hard to apply, but it certainly helps us to make an important observation:
Two consecutive MSTs will differ in at most one edge.
In other words, the changes in the solution are very small from one query to the next. And we are going to take advantage of this property, like many popular offline algorithms do. In particular, we'll do something like the square root decomposition of Mo's algorithm. Usually, this property is achieved by sorting the queries in a special way, like Mo's algorithm itself requires. In our case, we have just noticed that this is not necessary. Hence, we'll process the queries in a very straightforward way (and I keep asking myself what took me so long to understand this beautiful algorithm!).
The observation is used as follows. We'll split the queries in consecutive blocks of consecutive queries. If we compute the edges that simultaneously belong to all the MSTs of one block, we'll be able to reduce the size of the graph for which we should compute minimum spanning trees. In other words, we're going to run Kruskal's algorithm q times, once per new edge, but it will run for much smaller graphs. Let's see the details.
First, imagine the MST T_{i} computed right after adding the edge e of the ith query. Now, if e belongs to T_{i}, consider . What does it look like? Sure, T_{i}' is a forest with two trees (components). And if we condense these two components, we'll get a much smaller graph with precisely two vertices and no edges. Right now, this much smaller graph do not seems to be useful, but let's see what happens if we consider this situation for not only one, but for a block of new edges.
Now, imagine the MST M_{i} computed right after adding all the edges of the ith block B_{i}. The graph is a minimum spanning forest with at most components, because the removal of an edge increases the number of components in exactly one and we are considering the removal of at most edges. Therefore, a condensation would produce a set S_{i} of at most vertices. Let's write X to denote the total sum of the weights of the condensed edges (the internal edges of the components).
Consider a MST for the set S_{i} that uses only the edges added before the ith block. This MST will have at most edges. If we use the edges of this MST to initialize and maintain a multiset M of edges, we can insert a new edge in M and run Kruskal's algorithm times, once per query. Over all blocks, we'll run Kruskal's algorithm q times for graphs with at most vertices and edges. For the jth query, we should output X + Y_{j}, where Y_{j} is the total sum of the weights of the edges chosen by Kruskal's algorithm.
In a stepbystep description, the algorithm is as follows:
 Store the m initial edges in a multiset
edges
.  Compute
large
, an array with the edges of a MST for the initial m edges (Kruskal's for m edges).  For each block [l, r]:

 Create an empty array
initial
and swap the contents withlarge
.
 Create an empty array

 Insert edges
e[i]
in the multisetedges
, l ≤ i ≤ r.
 Insert edges

 Recompute
large
for the new state ofedges
(Kruskal's for O(m + q) edges).
 Recompute

 Use
large
to find the forest and condense its components (using a DSU, for example) and to find the value of X.
 Use

 Create a multiset
M
of edges and useinitial
and the condensed components to fill it with at most edges.
 Create a multiset

 For each edge
e[i]
, l ≤ i ≤ r:
 For each edge


 Insert
e[i]
inM
.
 Insert



 Compute Kruskal's minimum weight Y for the multiset
M
and output X + Y (Kruskal's for edges).
 Compute Kruskal's minimum weight Y for the multiset

We run Kruskal's algorithm times for a graph with O(m + q) edges and q times for a graph with edges, so the total running time is around , if we have a fast DSU implementation.
Here is my AC implementation for this problem:
Second version
Problem statement
Again, you're first given a connected graph with n vertices and m weighted edges. This time, the ith query is a pair (e_{i}, c_{i}) where 1 ≤ e_{i} ≤ m and you should output the weight of a minimum spanning tree after making w(e_{i}) = c_{i} permanently (until a new update for e_{i} is required). For example, take V = {1, 2, 3}, E = (({1, 2}, 5), ({2, 3}, 6), ({3, 1}, 7)) and the updates (1, 8) and (2, 9). The answers are 13 and 15, respectively.
Solution
I'll assume that you have read the solution to the first version, which is simpler, but the core idea is the same: we'll somehow compute the edges that will remain unchanged thourgh a block of consecutive updates and will belong to all the minimum spanning trees of this block. Again, the remaining graph will have vertices and edges, so we'll compute the answer for each query in the same time complexity as before.
First, we should reduce the m edges to around O(n) edges using Kruskal. In this phase, we should consider only the edges that won't be updated in this block. Let's refer to these edges as nonupdated. In the end, the graph will not be necessarily connected, but the purpose here is only to discard the edges that we know for sure that won't be present in any of the MSTs of this block. Let's call the edges selected by this phase as possibly useful edges.
Now, we should split the possibly useful edges in two disjoint sets: the at least certainly useful edges that will be present for sure in all the MSTs of this block and the at most remaining possibly useful edges to be considered by the Kruskals of this block. For this, we'll just run Kruskal one more time, except that the DSU should be initialized with a special procedure. This procedure is simply to call the union operation for each updated edge (the edges that will be updated in this block). After this initialization, this DSU will clearly have at least components, which means that this second execution of Kruskal will have to connect at least components to make the graph fully connected (this time the graph will end connected for sure!). The edges selected by Kruskal's algorithm are the certainly useful ones, while the discarded are the remaining possibly useful ones.
Along with the specially initialized DSU of the previous Kruskal, you can build a second one using only the certainly useful edges. This second DSU represents the condensation of the forest.
At last, build a set with the remaining possibly useful edges and the updated edges. Use and maintain this set to process the queries. For each update, remove the updated edge from this set, update its weight and insert it again, then run Kruskal's algorithm over this sorted set of edges. You can also maintain a larger set with all the m edges during the updates and across all blocks.
Here is my AC implementation for this problem:
Conclusion
I hope this post can be useful for others. Constructive criticism and related problems to solve are welcome in the comments!