A Brief Inquiry into Online Connectivity

Revision en5, by ko_osaga, 2023-10-30 20:32:34

This question striked my head: "How can I solve dynamic $$$k$$$-connectivity efficiently?"

And then I tried to answer it, but I realized that my question was open to a lot of different interpretations.

Two vertices are $$$k$$$-connected if there are $$$k$$$ edge-disjoint paths connecting two vertices. For $$$k = 1$$$, it is the usual definition of connectivity.

Solve?

If I say, "I solved the graph connectivity problem", what can it possibly mean?

First Interpretation ($$$s$$$-$$$t$$$ connectivity). I can respond to the following query efficiently: Given two vertices $$$s, t$$$, determine if there is a path between them. In the case of $$$k = 1$$$, graph search suffices.

What about higher $$$k$$$? You can find $$$k$$$ edge-disjoint path by reducing it into a flow problem. Each edge-disjoint path corresponds to a flow from $$$s$$$-$$$t$$$, so make all edges to capacity one, and find a flow of total capacity $$$k$$$ from $$$s$$$ to $$$t$$$. This algorithm takes $$$O(\min(k, m^{1/2}) (n + m))$$$ time.

Second Interpretation (Graph connectivity). I can't respond to an individual query, but I can respond if all pairs of vertices are connected or not. In the case of $$$k = 1$$$, the answer is true if the graph is connected. In the case of $$$k = 2$$$, the answer is true if the graph is connected and has no bridges.

In higher $$$k$$$, the problem is problem is known as Global Min Cut. How to solve it?

  • The randomized algorithm of Karger-Stein runs in polynomial time and is frequently taught in undergraduate classes because it's very beautiful in its simplicity and analysis.
  • In competitive programming, $$$O(nm)$$$ solution to Global Min Cut ( Stoer-Wagner Algorithm ) is somewhat known. It is even more beautiful in its simplicity. I'm unsure if the same can be said for analysis.
  • Actually, you can solve the Global Min Cut in $$$O(m \log^2 n)$$$ time, a blend of randomized algorithm with query-on-a-tree type data structure problem — which means it's the most beautiful!

Third Interpretation (Connectivity Certificate). It sucks to have only one of them, why not both? In the case $$$k = 1$$$, we can DFS for each connected component and label it, so $$$s$$$-$$$t$$$ connectivity is solved by checking if $$$label[u] = label[v]$$$, and graph connectivity is solved by checking if all labels are same. In the case $$$k = 2$$$, we can compute the biconnected component of the graph (aka remove bridges and DFS) to do the same thing. So we have this theme:

  • In the case $$$k = 1$$$, we have a bunch of isolated connected components.
  • In the case $$$k = 2$$$, we have each $$$2$$$-connected components forming a forest.
  • In the case $$$k = 3$$$, we have each $$$3$$$-connected components forming a cactus! See NRSS21, Theorem 3.1
  • In the general case, suppose that we are dealing with a $$$k$$$-connected graph (graph with min cut $$$k$$$). If we decompose this graph into a $$$(k+1)$$$-connected component, they will form a cactus (if $$$k$$$ even) or a tree (if $$$k$$$ odd). This phenomenon is known as the cactus representation of minimum cuts, and the representation can be computed in $$$O((n+m) \text{poly}(\log n))$$$ time.

Another important structure is the Gomory-Hu Tree. For an undirected graph, there is a tree with the same vertex set and weighted edges, where the $$$s-t$$$ max flow corresponds to the minimum weight in the unique $$$s-t$$$ path in the tree. This also works as a good certificate, since the $$$s - t$$$ path minimum can be computed efficiently with sparse tables or likewise. A standard way to compute the Gomory-Hu tree requires $$$n$$$ iteration of the maximum flow algorithm, which is $$$O(m^{3/2})$$$ assuming a standard algorithm in an unweighted case.

How to maintain such certificate in dynamic queries? Suppose that you have two big connected components, and you repeatedly add and remove edges between those components. If you maintain a dynamic graph under such a label, then you will end up with $$$O(n)$$$ labels changing in each query which is clearly impossible. I assume the certificate as an implicit structure (Disjoint Set Union, Link Cut Tree are good examples) that preserves the connectivity structure without a few changes and can answer some queries (component aggregates, root/label of the component) in an efficient ways. This is kinda ill-defined, but I don't know how to formalize it.

Still more interpretation? You can define connectivity as the minimum number of vertex to remove. For such a definition, there are a lot of interesting structures such as Block-Cut Tree or SPQR Tree, but the margin is too short to contain these. I limit the scope of the article to exact undirected edge connectivity, which is still not enough to make this article exhaustive, but whatever.

Dynamic?

Mostly, if we say a graph is dynamic, then we assume updates that insert and delete the edges. This is known as fully-dynamic, but this may get too hard and sometimes we may resort to special cases such as incremental and decremental. The case of incremental assumes no update that deletes the edges, and for the decremental — no updates that insert the edges. For example, disjoint set union (DSU) solves the connectivity problem in case the updates are incremental.

Sometimes, you don't have to answer all queries immediately, but only before the program terminates. In that case, you can take advantage of the fact that you know the whole set of queries that will be given, and may change the order of computations or so on. This setting is called offline and it is especially prevalent in competitive programming. For example, the connectivity problem can be solved fully-dynamic if we assume offline queries, and this idea is well-known under the name Offline Dynamic Connectivity.

Another interesting special case is where the updates are not cumulative: Given a graph, you add or remove a small set of vertex/edges, respond to the query, and then the update queries are reverted and you get back to the original graph given. For example, you may want to know if there is a $$$s - t$$$ path if edge $$$e$$$ is removed from the graph — you can solve this with biconnected components. This setting is called sensitivity, which is not prevalent in CP, but I know problems that ask this (problem G).

$$$k$$$?

And of course, there is freedom over the selection of $$$k$$$ as well. $$$k$$$ could be either $$$1$$$ (connectivity), $$$2$$$ (biconnectivity), $$$3$$$ (triconnectivity), $$$4$$$ (??), $$$O(1), O(\text{poly}(\log n))$$$, $$$O(n)$$$ .. it could be even very bigger if you assume that edges are weighted and define connectivity as the maximum $$$s - t$$$ flow.

Efficiently?

But there should be no dispute about the efficiency since it's just fast or slow. Is it? Maybe not. Sometimes, you are concerned about the worst-case query time, where you have to answer all queries with small computation. At other times, you are concerned about the amortized query time, where each query may need long computation, but in the end, the sum of spent computation can be bounded.

Worst-case bound can be necessary not only by itself but also when you need to use it as a black-box data structure. Suppose you want to support the undoing of the last query, or even make it persistent. Then you can take some queries that need long computation, undo and redo repeatedly to mess up the analysis.

The efficiency can have different definitions in diverse computing environments such as parallel / distributed, which we won't go into for obvious reasons.

The table

So we have these various measures, let's make some tables! Please write in comments if the results are incorrect or not the fastest. The table is just a collection of googled materials.

All results below assume amortized bounds. I omitted Big-O for brevity.

Static

$$$k$$$ $$$s - t$$$ Global Certificate
$$$1$$$ $$$m$$$ [1] $$$m$$$ [1] $$$m$$$ [1]
$$$2$$$ $$$m$$$ [2] $$$m$$$ [5] $$$m$$$ [5]
$$$3$$$ $$$m$$$ [2] $$$m$$$ [6] $$$m$$$ [6]
$$$4$$$ $$$m$$$ [2] $$$m$$$ [7] $$$m$$$ [7]
$$$O(1)$$$ $$$m$$$ [2] $$$m + n \log n$$$ [9] $$$m + n \log n$$$ [9]
$$$\text{poly} \log n$$$ $$$m \text{ poly}\log n$$$ [2] $$$m \log^2 n$$$ [4] $$$(m + n) \text{ poly} \log n$$$ [9]
$$$O(n)$$$ $$$m^{1 + o(1)}$$$ [3] $$$m \log^2 n$$$ [4] $$$m^{1 + o(1)}$$$ [8]
Weighted $$$m^{1 + o(1)}$$$ [3] $$$m \log^2 n$$$ [4] $$$n^2 \text{ poly} \log n$$$ [8]

Offline Fully-Dynamic

$$$k$$$ $$$s - t$$$ Global Certificate
$$$1$$$ $$$\log n$$$ [10] $$$\log n$$$ [10] $$$\log n$$$ [10]
$$$2$$$ $$$\log n$$$ [11] $$$\log n$$$ [11] $$$\log n$$$ [11]
$$$3$$$ $$$\log n$$$ [11] $$$\log n$$$ [11] $$$\log n$$$ [11]
$$$4$$$ $$$\text{poly} \log n$$$ [12] $$$\sqrt n \text{ poly} \log n$$$ [13] $$$\text{poly} \log n$$$ [12]
$$$O(1)$$$ $$$\text{poly} \log n$$$ [12] $$$\sqrt n \text{ poly} \log n$$$ [13] $$$\text{poly} \log n$$$ [12]
$$$O(\text{poly} \log n)$$$ $$$n \text{ poly}\log n$$$ [15] $$$\sqrt n \text{ poly} \log n$$$ [13] $$$n \text{ poly}\log n$$$ [15]
$$$O(n)$$$ $$$m^{1 + o(1)}$$$ [3] $$$\min(n, m^{15/16}) \text{ poly} \log n$$$ [22] $$$m^{1 + o(1)}$$$ [8]
Weighted $$$m^{1 + o(1)}$$$ [3] $$$m \log^2 n$$$ [4] $$$n^2 \text{ poly} \log n$$$ [8]

Online Incremental

$$$k$$$ $$$s - t$$$ Global Certificate
$$$1$$$ $$$\alpha(n)$$$ [18] $$$1$$$ [20] $$$\alpha(n)$$$ [18]
$$$2$$$ $$$\alpha(n)$$$ [19] $$$\alpha(n)$$$ [19] $$$\alpha(n)$$$ [19]
$$$3$$$ $$$\log n$$$ [9] $$$\log n$$$ [9,16] $$$\log n$$$ [9]
$$$4$$$ $$$\log n$$$ [9] $$$\log n$$$ [9,16] $$$\log n$$$ [9]
$$$O(1)$$$ $$$n^{o(1)}$$$ [21] $$$\log n$$$ [16] $$$n \log n$$$ [15]
$$$O(\text{poly} \log n)$$$ $$$n \text{ poly}\log n$$$ [15] $$$\text{poly} \log n$$$ [16,17] $$$n \text{ poly}\log n$$$ [15]
$$$O(n)$$$ $$$m^{1 + o(1)}$$$ [3] $$$\text{poly} \log n$$$ [17] $$$m^{1 + o(1)}$$$ [8]
Weighted $$$m^{1 + o(1)}$$$ [3] $$$m \log^2 n$$$ [4] $$$n^2 \text{ poly} \log n$$$ [8]

Online Fully-Dynamic

$$$k$$$ $$$s - t$$$ Global Certificate
$$$1$$$ $$$\log^{1+o(1)} n$$$ [23] $$$\log^{1+o(1)} n$$$ [23] $$$\log^{1+o(1)} n$$$ [23]
$$$2$$$ $$$\log^{2+o(1)} n$$$ [24] $$$\log^{2+o(1)} n$$$ [24] $$$\log^{2+o(1)} n$$$ [24]
$$$3$$$ $$$n^{o(1)}$$$ [21] $$$\sqrt n \text{ poly} \log n$$$ [13] $$$n^{2/3}$$$ [15]
$$$4$$$ $$$n^{o(1)}$$$ [21] $$$\sqrt n \text{ poly} \log n$$$ [13] $$$n \alpha(n)$$$ [15]
$$$O(1)$$$ $$$n^{o(1)}$$$ [21] $$$\sqrt n \text{ poly} \log n$$$ [13] $$$n \log n$$$ [15]
$$$O(\text{poly} \log n)$$$ $$$n \text{ poly}\log n$$$ [15] $$$\sqrt n \text{ poly} \log n$$$ [13] $$$n \text{ poly}\log n$$$ [15]
$$$O(n)$$$ $$$m^{1 + o(1)}$$$ [3] $$$\min(n, m^{15/16}) \text{ poly} \log n$$$ [22] $$$m^{1 + o(1)}$$$ [8]
Weighted $$$m^{1 + o(1)}$$$ [3] $$$m \log^2 n$$$ [4] $$$n^2 \text{ poly} \log n$$$ [8]
References

I might write some articles that explain the referenced materials, I think [6] and [14] would be fun.

History

 
 
 
 
Revisions
 
 
  Rev. Lang. By When Δ Comment
en5 English ko_osaga 2023-10-30 20:32:34 545
en4 English ko_osaga 2023-10-30 08:18:31 2
en3 English ko_osaga 2023-10-30 08:18:11 4
en2 English ko_osaga 2023-10-30 08:17:16 44
en1 English ko_osaga 2023-10-30 08:16:03 14538 Initial revision (published)