learnig's blog

By learnig, history, 20 months ago, , What is the fastest algorithm to compute maximal bipartite matching. What is its worst case run time complexity. How to construct the worst case for such algorithm. Thanks in advance. Comments (13)
 » The fastest is Hopcroft Karp BPM. It runs in O(sqrt(v)*E). You can go through Wikipedia to learn more.
•  » » Do you know In what case it is slow? I think on random graphs it runs pretty fast
•  » » » I bet this guy knows something about it: Burunduk1
•  » » » » In d-regular graph (all degrees are equal to d) there is O(nlogn) algorithm Goel, Kapralov, Khanna. Notice, asymptotic is independent of number of edges. The algorithm: consider graph, on which Kuhn runs dfs, and instead of dfs run "random walk" (choose random edge and go throw it).On random but not regular graphs Kuhn with optimizations is better than Hopcroft.
•  » » » Bipartite Matching problem is equivalent to max flow in a modified graph.In graphs which are dense locally at some places and mostly sparse overall, Push Relabel Algorithm with gap Relabel heuristic runs very very fast. Much faster compared to Dinic's (which is equivalent to Hopcroft Karp).Asymptotic complexity of such algorithm is O(|v|^3) though for general graphs. But on random graphs it runs much more fast. To give you some numbers — my implementation runs within seconds for random graphs with n = 10000 nodes, m = 1000000 edges.
•  » » » » it would be great if you could share your implementation code.
•  » » » » » 20 months ago, # ^ | ← Rev. 2 →   Code#include using namespace std; /* Push Relabel O(n^3) implimentation using FIFO method to chose push vertex. This uses gapRelabel heuristic to fasten the process even further. If only the maxFlow value is required then the algo can be stopped as soon as the gap relabel method is called. However, to get the actual flow values in the edges, we need to let the algo terminate itself. This implimentation assumes zero based vertex indexing. Edges to the graph can be added using the addEdge method only. capacity for residual edges is set to be zero. To get the actual flow values iterate through the edges and check for flow for an edge with cap > 0. This implimentaion is superior over dinic's for graphs where graph is dense locally at some places and mostly sparse. For randomly generated graphs, this implimentation gives results within seconds for n = 10000 nodes, m = 1000000 edges. Code Tested on : SPOJ FASTFLOW @author : triveni */ typedef int fType; struct edge { int from, to; fType cap, flow; edge(int from, int to, fType cap, fType flow = 0) : from(from), to(to), cap(cap), flow(flow) {} }; struct PushRelabel { int N; vector edges; vector > G; vector h, inQ, count; vector excess; queue Q; PushRelabel(int N) : N(N), count(N<<1), G(N), h(N), inQ(N), excess(N) {} void addEdge(int from, int to, int cap) { G[from].push_back(edges.size()); edges.push_back(edge(from, to, cap)); G[to].push_back(edges.size()); edges.push_back(edge(to, from, 0)); } void enQueue(int u) { if(!inQ[u] && excess[u] > 0) Q.push(u), inQ[u] = true; } void Push(int edgeIdx) { edge & e = edges[edgeIdx]; int toPush = min(e.cap - e.flow, excess[e.from]); if(toPush > 0 && h[e.from] > h[e.to]) { e.flow += toPush; excess[e.to] += toPush; excess[e.from] -= toPush; edges[edgeIdx^1].flow -= toPush; enQueue(e.to); } } void Relabel(int u) { count[h[u]] -= 1; h[u] = 2*N-2; for (int i = 0; i < G[u].size(); ++i) { edge & e = edges[G[u][i]]; if(e.cap > e.flow) h[u] = min(h[u], h[e.to]); } count[++h[u]] += 1; } void gapRelabel(int height) { for (int u = 0; u < N; ++u) if(h[u] >= height && h[u] < N) { count[h[u]] -= 1; count[h[u] = N] += 1; enQueue(u); } } void Discharge(int u) { for (int i = 0; excess[u] > 0 && i < G[u].size(); ++i) { Push(G[u][i]); } if(excess[u] > 0) { if(h[u] < N && count[h[u]] < 2) gapRelabel(h[u]); else Relabel(u); } else if(!Q.empty()) { // dequeue Q.pop(); inQ[u] = false; } } fType getFlow(int src, int snk) { h[src] = N; inQ[src] = inQ[snk] = true; count = N - (count[N] = 1); for (int i = 0; i < G[src].size(); ++i) { excess[src] += edges[G[src][i]].cap; Push(G[src][i]); } while (!Q.empty()) { Discharge(Q.front()); } return excess[snk]; } }; int main() { int n, m; scanf("%d %d", &n, &m); PushRelabel df(n); while(m--) { int x, y, c; // cin >> x >> y >> c; // 0- based index scanf("%d%d%d", &x, &y, &c); --x, --y; if(x != y){ df.addEdge(x, y, c); df.addEdge(y, x, c); } } cout << df.getFlow(0, n-1) << "\n"; return 0; } ...
•  » » » » If you use the lowest-label version and use global relabeling, the runtime can be bounded by (see Robert J. Kennedy, Jr. Solving unweighted and weighted bipartite matching problems in theory and practice, chapter 3 (The paper does not treat gap relabeling.)).Even without global relabeling, the lowest label version is usually the fastest one on bipartite matching graphs.
•  » » » » » Thanks for sharing. Not so obvious at the moment though, I will ponder on this :)
 » One can also use Dinic algo and reduce problem to max flow, that will be as fast as Hopcroft Karp on bipartite graphs.
 » You might want to check out this variation of the alternating paths bipartite match algorithm. It is easy to code and very quick for most networks that have particularities (it can do ~1e6 vertices and edges in considerably less than 1s), even quicker than the regular Hopcroft-Karp algorithm (because it doesn't do any bfs beforehand).Note that the key optimization here is in the separation of the two for loops.I have yet to find an example where it does quadratic time (and I think it's very non-trivial). I however suspect that the proof of Edmonds Karp upper bound cannot be applied here, as what the algorithm is doing is not equivalent to a BFS (the alternating paths aren't shortest, per se).
•  » » Your implementation has got an Ω(V2) worst case on a sparse graph (E = Θ(V)). generatorThe input is approximately n, the output is n m followed by m edges. The vertices on either side are zero-based. n = int(input()) k = n//8*2 ed = [] for i in range(k): ed.append((i, i)) for i in range(k-1): ed.append((i+1, i)) for i in range(k//2): ed.append((i, k+2*i)) ed.append((2*k+2*i, k+2*i)) ed.append((2*k+2*i, 3*k+2*i)) ed.append((k+2*i, i)) ed.append((k+2*i, 2*k+2*i)) ed.append((3*k+2*i, 2*k+2*i)) for j in range(k//2): i = k//2-1-j ed.append((k//2+j, k+2*i+1)) ed.append((2*k+2*i+1, k+2*i+1)) ed.append((2*k+2*i+1, 3*k+2*i+1)) ed.append((k+2*i+1, k//2+j)) ed.append((k+2*i+1, 2*k+2*i+1)) ed.append((3*k+2*i+1, 2*k+2*i+1)) print("{} {}".format(n, len(ed))) for e in ed: print("{} {}".format(e, e)) The input is a tree consisting of a long path of length and paths of length 3 hanging from every vertex in the long path. Vertex labels and edge order are chosen in a way such that initially every other edge in the long path is matched and every middle edge in a short path is matched. In a single phase, you'll find a single long augmenting path, alternating left  →  right and right  →  left. (The paths of length 3 break your heuristic.) This results in phases.The lower bound can be extended to Ω(VE) by adding a dense part K2x, x with where you'll scan x2 ≈ E edges every phase.You might be able to fix this by randomizing the vertex and edge ordering.
•  » » » 20 months ago, # ^ | ← Rev. 3 →   Thank you for your counterexample! As weird as this sounds, this implementation is the one that basically every high-school student in our country learns about as being the Hopcroft-Karp algorithm, and nobody here has managed to find a counterexample to break the heuristic (I think mostly because I am not very familiar with how to generate maximal test cases for matching in general).I will try to add this example to our most popular online judge archive. This should be fun.It seems like randomizing the order of vertices drops down the number of iterations to 8 in this case, although it's (very slightly) slower because of the shuffle algorithm. It's good to note though.