[Tutorial] Blossom Algorithm for General Matching in O(V^3)

Revision en1, by Monogon, 2021-06-29 08:24:06

I have decided to write a tutorial on a topic not even Um_nik knows! (source)

In this tutorial, I will talk about the blossom algorithm, which solves the problem of general matching. In this problem, you are given an undirected graph and you need to select a subset of edges (called matched edges) so that no two edges share a vertex, and the number of matched edges is maximized. More common in competitive programming is bipartite matching, which is the same problem but when the graph is guaranteed to be bipartite. In competitive programming, general matching seems to get a lot of hate for being very challenging to implement. However, the blossom algorithm is quite beautiful, and important in algorithm research.

It will help if you are already familiar with bipartite matching. I will discuss the high level ideas of the algorithm, but my main focus will be on the tricky implementation details. So even though I will try not to leave anything out, reading a different resource like the Wikipedia page may be more useful to you with pretty pictures.

Algorithm Overview

The blossom algorithm works by increasing the number of matched edges by one at a time until it cannot increase it further, then it stops. It's not as simple as just adding a new edge and keeping all the edges from previous iterations. If we did this, we might make a wrong decision and have a sub-optimal matching. Instead of simply searching for one new edge to add to the set, we should search for an augmenting path. This is a path where the edges alternate between being matched and unmatched. A vertex is called exposed if it is not adjacent to any matched edge. Another requirement of augmenting paths is that the endpoints should both be exposed. Also, it must be a simple path, meaning it cannot repeat any vertices.

If we find an augmenting path, we can easily increase the number of edges by $$$1$$$. We simply change all the matched edges to unmatched and all the unmatched edges to matched. An augmenting path has one more unmatched edge than matched edges, so it's correct. We can also easily see that it will still be a valid matching, i.e. no two matched edges will share a vertex. Ok, so augmenting paths are useful for increasing our matching, but are they guaranteed to exist if we don't have a maximum matching? Yes! In fact, consider $$$M_1$$$ to be the set of edges in our current matching and $$$M_2$$$ to be the set of edges in some maximum matching. Now, the symmetric difference $$$M_1\oplus M_2$$$ will be all the edges in exactly one of the two matchings. Based on the matching property, each vertex has degree $$$1$$$ or $$$2$$$. So each connected component must be a cycle or a path. They can't all be cycles, otherwise $$$M_1$$$ and $$$M_2$$$ would have the same number of edges. Therefore since $$$M_2$$$ has more edges, at least one of the components is a path, and we can verify that it's an augmenting path.

Great! Now all we have to do is figure out how to find an augmenting path in a matching, and we're done. Unfortunately, this is more difficult than it sounds. We might think of just using a DFS or BFS from exposed vertices, and ensuring that edges always alternate between matched and unmatched. For example, we could say each vertex $$$v$$$ has two versions $$$(v, 0)$$$ and $$$(v, 1)$$$. We can ensure that matched edges always go from version $$$1$$$ to $$$0$$$ and unmatched from $$$0$$$ to $$$1$$$. So, what's wrong? Won't we correctly find an alternating path of matched and unmatched edges, from one exposed vertex to another? The issue is that we are not ensuring the path is simple. This process could find a path that visits both $$$(v, 0)$$$ and $$$(v, 1)$$$ for some $$$v$$$. Then if you "flip" all the edges, $$$v$$$ would be adjacent to two matched edges, which is a violation. However, this simple DFS/BFS idea does work correctly in bipartite matching, and you might see it's equivalent to the algorithm you already know.

Blossoms

To deal with the problem of repeating vertices, we introduce the concept of blossoms. A blossom is simply a set of $$$2k+1$$$ edges, where $$$k$$$ of them are matched edges. This means there is exactly one "special" vertex $$$v$$$ in the cycle that isn't matched to another vertex in the cycle. Why do we care about blossoms? Because they create a situation where an alternating DFS/BFS could repeat a vertex.

Now that we know what they are, how does the blossom algorithm handle them? It contracts them. This means that we merge all vertices of the blossom into a single vertex, and continue searching for an augmenting path in the new graph. If we eventually find an augmenting path, we will have to lift the path back to our original graph. It's not a trivial result, but it can be shown that a graph has an augmenting path if and only if it has one after contracting a blossom. The concept of contracting blossoms and lifting an augmenting path makes it challenging to implement correctly. Remember, a contracted blossom can become a vertex in another blossom, so you can have blossoms inside blossoms inside blossoms! You should be experiencing headaches around now.

Let's see intuitively how an augmenting path can be lifted, effectively "undoing" a blossom contraction while keeping an augmenting path. Well, there should be an edge entering the blossom and then an edge leaving it. Since the path is alternating in the contracted graph, one of those two edges is matched. This means the "special" vertex $$$v$$$ of the blossom will be involved. Suppose $$$u$$$ is the vertex involved with the unmatched edge leaving the blossom. Let's go around the cycle between $$$u$$$ and $$$v$$$, but there are two ways, which do we take? Of course, it should be alternating, so we have exactly one correct choice.

Summary

In summary, the algorithm works like this. We repeat the following process until we fail to find an augmenting path, then return. We begin a graph search with DFS or BFS from the exposed vertices, ensuring that the paths alternate between matched and unmatched edges. If we see an edge to an unvisited node, we add it to our search forest. Otherwise if it's a visited node, there are three cases.

  1. The edge creates an odd cycle in the search tree. Here, we contract the blossom and continue our search.
  2. The edge connects two different search trees and forms an augmenting path. Here, we keep undoing the blossom contractions, lifting the augmenting path back to our original graph, and flip all the matched and unmatched edges.
  3. The edge creates neither case 1 nor case 2. Here, we do nothing and continue our search.

Implementation in $$$O(V^3)$$$

History

 
 
 
 
Revisions
 
 
  Rev. Lang. By When Δ Comment
en5 English Monogon 2021-06-29 22:33:29 21 Tiny change: 'gorithm) (it has pretty pi' -> 'gorithm) (from which I stole some pretty pi'
en4 English Monogon 2021-06-29 22:32:57 380
en3 English Monogon 2021-06-29 22:13:57 6577 Tiny change: 'rmation:\n- `z`: t' -> 'rmation:\n\n- `z`: t' (published)
en2 English Monogon 2021-06-29 20:51:17 17159 Tiny change: 'h forest. If `d[u] = 0' -> 'h forest. We will assign `d[u] = 0'
en1 English Monogon 2021-06-29 08:24:06 6823 Initial revision (saved to drafts)