My own algorithm — offline incremental strongly connected components in O(m*log(m))

Revision en4, by Radewoosh, 2021-06-09 20:52:33

Hello Codeforces!

As all of you know, there are so many known algorithms named after people who invented them — from the easiest ones, like Dijkstra, to the harder ones, like Berlekamp–Massey algorithm. These algorithms were innovative when they were invented, so of course, it's good that they are named after their inventors.

But, today, I've seen a blog about the solution to the problem "compute LCS of two strings in time $O((n + k) \cdot \log(k))$, where $n$ is the sum of lengths of the strings, and $k$ is the number of pairs of matching positions in them". To be honest, it a bit pissed me off that even this algorithm is named after its creators. Is it ok to name the solution to every possible problem after its author? I won't judge it. Anyway, I want my very own Radecki algorithm, so let me give it a try. If anyone has ever heard about it — it's cool. Let me know, and it'll be just another helpful blog on Codeforces.

Let's imagine the following problem: there is a directed graph on $n$ vertices without any edges and a list of $m$ directed edges which we want to add to this graph one by one. After adding each edge, we want to have full information about the strongly connected components. Know the number of strongly connected components, be able to tell whether two vertices are in the same connected component or even make this information persistent. Critical condition: the list of edges is given in advance.

So, for a few years, I'm able to solve this problem, but people told me that it's rather boring and it's not worth to write a paper about it. Anyway, let me give it a try.

Let's use the divide and conquer technique on the list of the edges. We can also think about it like about divide and conquer over the timeline. So, let's assume that we have some function $rec(a, b, edges)$, where $a$ is the start of the time interval, $b$ is the end (we can assume that both are inclusive, doesn't matter, I'm still pissed a bit) and $edges$ is the list of edges together with the times when they're added. We start the process like $rec(0, m-1, everything)$.

Everytime we should look at the middle of the time interval, so let's compute $s=\lfloor \frac{a+b}{2} \rfloor$. Let's use the standard offline algorithm to calculate strongly connected components in the graph which consists only of the edges from $edges$ which were added before $s$ or exactly at $s$. We want to split recursively somehow, but how to do it to keep good complexity? It turns out that in the moments after $s$ all the SCC that we've just calculated will stay connected. So, we can merge whole components into single vertices and only pass to the right edges connecting vertices from different strongly connected components. Also, the edges that are now connecting different connected components were useless in the past, which means that we can pass to the left the rest of the edges. In this way, each edge will occur once at every level of the recurrence, there are $O(\log(m))$ levels, and in each level the overall complexity is linear, thus we have $O(m \cdot \log(m))$ complexity (if we implement it carefully enough to have no extra logs).

What should we take from the recurrence? The simplest idea: at each call of the $rec$ function, for each edge, push some information to some global data structure. My idea is to for each edge that connects vertices from the same strongly connected component push information "these two vertices are in the same strongly connected component at this time". This gives us something which we can interpret as a graph of undirected weighted edges. It's easy to see that we are only interested in its MST (the graph has $n$ vertices and $O(m \cdot \log m)$ edges with weights from $0$ to $m-1$, we can calculate it's MST in $O(m \cdot \log(m))$ time).

What can we do now with this information? We can answer queries like "What's the moment when vertices $a$ and $b$ start belonging to the same strongly connected component?" Yep, that's the maximum value on the path between these two vertices.

Clear? Clear.

Cool? As f**k.

Easy to implement? So so, easy to copy-paste and use. My implementation (without computing MST, just getting the list of undirected edges) has 72 lines.

New? That's a tough question. I know that it was on the prime new year contest 2018, which took place between 2018 and 2019, which tells that this problem was created in 2018. Even if the intended solution is the same as mine, here comes a twist: link. This is the problem authored by me in July 2017. I know, just a pdf is worth nothing, but it's possible to dig deeper and believe me: the solution implemented here is the same as described.

Lest thoughts: the resulting tree is really similar to Gomory-Hu tree (because of minimums/maximums on the paths).

#### History

Revisions

Rev. Lang. By When Δ Comment
en4 Radewoosh 2021-06-09 20:52:33 2 Tiny change: 'dges) has $72$ lines.\n\' -> 'dges) has 72 lines.\n\'
en3 Radewoosh 2021-06-09 20:48:57 2 Tiny change: '"compute LSC of two st' -> '"compute LCS of two st'