Hi everyone!

if you read about Dinic algorithm on CP-Algorithms, especially the section about how it works on unit networks, you'll notice that its supposed runtime for bipartite matching is $$$O(E \sqrt V)$$$, where $$$E$$$ is the total number of edges, and $$$V$$$ is the total number of vertices of a bipartite graph. Let's try it out on the following problem:

**Library Checker — matching on Bipartite Graph**. Given a bipartite graph with $$$V, E \leq 2 \cdot 10^5$$$, find the maximum matching.

The implementation of the Dinitz algorithm that I typically use looks as follows:

**code**

Now, if we make a submission with this implementation, we get... Time Limit Exceeded?! Okay, that's weird. Apparently, roughly the fastest practical algorithm for maximum bipartite matching is so-called Hopcroft-Karp algorithm. Maybe it has a better complexity? Nope, Wikipedia page says it's $$$O(E \sqrt V)$$$. Maybe it's somehow fundamentally different from Dinitz? No, not really — Wikipedia page explicitly states that it is essentially a special case of Dinitz algorithm. So... What gives such bad performance?

Of course, there can simply be a bug in my Dinitz algorithm implementation, but after some checks, it seems that it's not the case. On particular failing test-case, we can see that the algorithm really just executes around $$$200$$$ iterations of traversing the whole flow network of roughly $$$8 \cdot 10^5$$$ edges. So, it seems that the core reason is just the enormous constant-time overhead.

So... If Hopcroft-Karp is just the special case of Dinitz algorithm, applied to bipartite graphs, how do we make it 50x faster?

#### Blocking Kuhn algorithm

**UPD**: It seems that, if applied as is (without randomization), this heuristics can be broken by the test case from this comment by dacin21. If you want to use it, please also include randomization of vertices/edge, as in this submission.

It may seem a bit weird, but let's first talk about something different. One standard and pretty well-known algorithm for bipartite matching is Kuhn's algorithm. It's known to run $$$O(VE)$$$, which wouldn't work as is for this problem, but there are some optimizations to it that will actually allow to get AC on this problem. Yes, you heard it right, with proper heuristics, you can get AC on this problem with presumable $$$O(VE)$$$ runtime, while proven $$$O(E \sqrt V)$$$ runtime algorithm fails. So... What is standard Kuhn's algorithm? It looks like this:

**code**

For each vertex in the right part, we keep an array `match`

, so that `match[u]`

is $$$-1$$$, if it's not matched with anything, and `match[u] = v`

if it is matched with the vertex $$$v$$$ from the left part. Then, for each vertex $$$v$$$ from the left part, we try to match it with some vertex $$$u$$$ from the right part. If $$$u$$$ is not matched — that's great, we can just match it with $$$v$$$. Otherwise, we try to recursively match the vertex `match[u]`

with something different. if we succeed, the vertex $$$u$$$ is free now, and we can match $$$v$$$ with it. Otherwise, we go to the next $$$u$$$ in the adjacency list of $$$v$$$.

Of course, using this implementation, we get Time Limit Exceeded. But! We can change just a few lines to get Accepted instead. Here:

```
for(int i = 0; i < n; i++) {
visited.assign(n, 0);
kuhn(i);
}
```

Written as it is now, the algorithm is essentially equivalent to Ford-Fulkerson algorithm, as we find augmenting paths one by one and push the flow alongside the path. After that, we fully reset `visited`

tags and run everything from scratch. But the core idea of Dinitz algorithm is to push as much flow as possible *before* resetting `visited`

tags and trying pushing it again. A flow found this way is called "blocking flow", as it blocks as from finding further augmenting paths without updating the residual network. So... let's do just that. Try to match as much vertices as possible *before* resetting `visited`

:

```
vector<bool> paired(n);
for(bool repeat = true; repeat;) {
repeat = false;
visited.assign(n, 0);
for(int i = 0; i < n; i++) {
if(!paired[i]) {
paired[i] = kuhn(i);
repeat |= paired[i];
}
}
}
```

And doing just that is enough to get Accepted on the problem using just 1583ms out of 5s! Not bad for a $$$O(VE)$$$ algorithm, huh? This heuristic was already described 8 years ago in this Russian blog by Burunduk1, but I haven't seen its highlight in English yet.

Anyway, adding a bit more optimizations and heuristics, you can bring it down to 986ms, while still using Kuhn's algorithm.

#### Dinitz on bipartite graphs

But what if we still want something *faster*? After all, we still didn't use the Dinitz's algorithm most important concept. Layered networks. The implementation above finds blocking flow in *some* network, which is defined by depth-first search. But to provably constrain the number of iterations of finding blocking flow, we need to make sure that it is a specific network. Namely, the layered network defined by shortest paths from source to each vertex. Without constructing the flow network explicitly, we still can do it like this:

vector<int> dist;
bool bfs() {
dist.assign(n, n);
int que[n];
int st = 0, fi = 0;
for(int v = 0; v < n; v++) {
if(!paired[v]) {
dist[v] = 0;
que[fi++] = v;
}
}
bool rep = false;
while(st < fi) {
int v = que[st++];
for(auto e: g[v]) {
int u = match[e];
rep |= u == -1;
if(u != -1 && dist[v] + 1 < dist[u]) {
dist[u] = dist[v] + 1;
que[fi++] = u;
}
}
}
return rep;
}

In this implementation, rather than initiating the bfs queue with source, we initiate it with all vertices that are directly reachable from it (that is, vertices of the left part that are not paired yet). After that, we find the shortest distance from such vertices to each vertex in the left part, and we can use this information to implicitly only traverse the layered network:

```
vector<size_t> ptr;
bool kuhn(int v) {
for(size_t &i = ptr[v]; i < size(g[v]); i++) {
int &u = match[g[v][i]];
if(u == -1 || (dist[u] == dist[v] + 1 && kuhn(u))) {
u = v;
paired[v] = true;
return true;
}
}
return false;
}
```

This time, similar to Dinitz algorithm, we maintain a pointer to the last checked edge in each vertex. Then, the main loop looks like

```
while(bfs()) {
ptr.assign(n, 0);
for(int v = 0; v < n; v++) {
if(!paired[v]) {
kuhn(v);
}
}
}
```

Implemented this way, it gets Accepted in just 160ms, which is a huge improvement over blocking Kuhn's algorithm which ran in 960ms at best, and certainly over Dinitz algorithm in explicitly constructed flow network, which couldn't even pass under 5s... Note: This is my interpretation of how Dinitz on bipartite graphs should look like if we store the network implicitly. I'm not entirely sure if it coincides with Hopcroft-Karp algorithm, because I don't know it.

Man, you really want that top 1 contribution now …

Dinitz's algorithm on bipartite graphs is basically Hopcroft Karp, as this reference claims. Did you try out standard tricks like the one here?

I did not try that specific implementation before writing the article, but I did now. It indeed gets AC (see submission), but takes 3.7s out of 5s. So, it's still far less practical for this task than blocking Kuhn's algorithm or bipartite-specific implementation described in the blog.

I know that magnus.hegdahl's implementation of Dinitz gets Accepted on that problem too, and runs in 1785ms. This is, of course, a huge improvement compared to other implementations, including the one you linked, but it still falls behind optimized Kuhn's algorithms described in the blog...

Of course, bipartite specific implementations will probably (and in this case, do indeed) beat bipartite agnostic implementations. Just asked in case you didn't try it out.

Also, one of my earlier submissions here seems to be quite similar to the one in the blog, but I am too lazy to verify if it is the same.

Come on Barbie, let's go party

Btw, maybe unrelated, but I see that you mentioned the "Randomized Kuhn" algorithm and dacin21's reply regarding an old question. I wanted to note that randomizing edges also doesn't work on certain cases where the solution is very tight (think about dense bipartite graphs that admit a single perfect matching).

How to generate such graph:

On these graphs, alternating paths are so long and specific that "Randomized Kuhn" algorithm can only do $$$O(1)$$$ augmentations per iteration, making it quadratic (try it with a graph with $$$n = 10^4$$$ and $$$m = 2 \cdot 10^5$$$). Also, the only perfect matching is $$$L(i) \rightarrow R(i)$$$ because of the "acyclicity" of such graphs.

Thanks for pointing that out! But it seems that this implementation is not affected by it? At least I didn't notice any significant slowdown when checking manually. I might've generated the case incorrectly though...

I generated a testcase using the command

`./gen 10000 200000 1`

and the below generator and run the solution, and it seems to run for 733 iterations. Locally on my computer, it runs in 2 seconds.Generator codeHmm, I see... For me, it locally runs for 741 iterations, and takes 0.4s to do so. I initially wanted to pull request to add such test case to Library Checker, but there is already cases such as "many_paths_00", on which it takes 471 iterations and 1s locally.

Trying some other parameters I could increase the number of iterations to ~1200, but it still doesn't seem to increase the runtime significantly...

Indeed, it's pretty hard to obtain a high enough constant factor $$$c$$$ for which the randomized solution works in $$$> cn$$$ iterations.

If you think about the first two iterations, the randomized Kuhn computes a blocking flow for length $$$2$$$, if I'm not mistaken (i.e., after $$$2$$$ iterations all augmenting paths must have length at least $$$3$$$).

If you then recall the proof for Dinic/Hopcroft Karp, it states that after the $$$i$$$-th iteration has a matching cardinality of least $$$\lceil ans \frac{i}{i+1} \rceil$$$. Put $$$i=2$$$ and you get that two iterations of Randomized Kuhn obtains at least $$$\lceil ans \frac{2}{3} \rceil$$$ matching edges.

This essentially means that the constant $$$c$$$ cannot ever be more than $$$\frac{1}{3}$$$, this assuming the very pessimisting scenario that each consequent iteration finds only one extra matching edge.

Nice find! I was always wondering if there was a case against the randomized version.

It's worth noting that a case like this also completely breaks various heuristic solutions for maximum matching in general graphs that "random walk" by flipping adjacent edges.

adamant your dream came true

SpoilerThis is exactly Hopcroft-Karp. However, you can make one small speedup improvement — in dfs part instead of

`ptr`

for edges you can use classic`used`

for vertices. Because it can be easily shown that in matching case you can't have two augmented paths passing through the same vertex for the same layered network.