nor's blog

By nor, 3 years ago, In English

In this blog post, I will focus on the following problem: 920E - Connected Components?. More generally, we will look at how to efficiently traverse the complement graph of a given graph in both DFS and BFS orders. In particular, the BFS order also allows us to find a shortest-path tree of the complement graph. This is my first blog, so please let me know of any errors/typos that might have crept in.

Note 1: I will present the algorithms in the order I came up with them, so it might not be the cleanest presentation, but I hope it gives some insight into how to come up with new ideas in graphs. Note that I don't claim to be the first person to come up with these ideas (especially the naive algorithm, which is identical to the solution in the editorial), but I haven't seen these implementation tricks used anywhere before. Resources pointing to similar ideas/implementations are welcome!

Note 2: All adjacency relations will be in terms of the original graph (which is given to us as an input, and is sparse).

(Not-so-important) note 3: We won't be using lists (or an implementation of lists using arrays, popularized by the adjacency-list implementation used by a lot of Chinese competitive programmers), but in the final implementation, we will be using an application of DSU that supports deletions and traversal in a sorted list efficiently.

TL;DR

If you're looking for the implementations, here they are:

  1. Naive DFS in $$$O((n + m) \log n)$$$: 125679159
  2. BFS without sorting adjacency lists in $$$O(n + m)$$$: 125679188
  3. BFS with sorting adjacency lists (using radix sort) in $$$O(n + m)$$$: 125679224
  4. DFS with sorting adjacency lists (using radix sort) in (conjectured) $$$O(n + m)$$$ (guaranteed $$$O((n + m) \log n)$$$): 125679248

Solving the problem using DFS in $$$O((n + m) \log n)$$$ time

Note that in a usual DFS, we do something of the following sort while running the DFS on a vertex $$$u$$$:

mark u as visited
for v adjacent to u:
  if v is not already visited:
    dfs(v)

Note that which set we iterate over doesn't really matter, so we can do something like the following too:

mark u as visited
for v in the set of unvisited vertices:
  if v is adjacent to u:
    dfs(v)

Now if we want to do a DFS on the complement graph, we can merely change the condition to if v is not adjacent to u and we will be done.

How will we go about implementing this algorithm efficiently? Suppose we maintain a set of unvisited vertices, and the adjacency lists are sets instead of vectors/lists. Then we can go through the set of unvisited vertices, and if an unvisited vertex is not in the adjacency set of the current vertex, we can recursively do a DFS on that vertex.

Why is this efficient? Note that in any iteration, we either skip a vertex or we don't. In the case we skip a vertex, the vertex has to be in the set of vertices that are adjacent to the current vertex, so we don't do more than $$$O(m)$$$ skips. In the case we don't skip, we remove a vertex from the unvisited set, so we don't do more than $$$O(n)$$$ iterations of this type either. So the total number of iterations is $$$O(m + n)$$$, and the cost of every iteration is upper bounded by $$$O(\log n)$$$, which gives us the bound.
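For concreteness, here is a minimal sketch of what this naive DFS could look like (assuming 0-indexed vertices and global state with hypothetical names; this is a simplified illustration of the idea, not the linked submission itself):

#include <bits/stdc++.h>
using namespace std;

int n;                          // number of vertices
vector<set<int>> adj;           // adjacency sets of the original graph
set<int> unvisited;             // global set of unvisited vertices
vector<vector<int>> components; // components of the complement graph

void dfs(int u) {
  unvisited.erase(u);
  components.back().push_back(u);
  auto it = unvisited.begin();
  while (it != unvisited.end()) {
    int v = *it;
    if (adj[u].count(v)) {
      ++it;                          // v is adjacent to u: skip it
    } else {
      dfs(v);                        // (u, v) is an edge of the complement graph
      it = unvisited.upper_bound(v); // the old iterator died when v was erased
    }
  }
}

To find all components, one would repeatedly start a DFS from the smallest remaining unvisited vertex: while (!unvisited.empty()) { components.push_back({}); dfs(*unvisited.begin()); }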

Implementation: 125679159

Solving the problem using BFS in $$$O(n + m)$$$ time

Let's come to our first linear implementation. I switched over to BFS, since you can still find components using BFS, and it is easier to reason about an iterative algorithm than a recursive algorithm with global variables.

The first change is that we don't use a set; instead, we use a vector to store the unvisited vertices. Initially all the vertices are unvisited.

Let's see what happens when we process a vertex $$$u$$$ in this algorithm.

Firstly, we note down all the unvisited vertices that are adjacent to the vertex $$$u$$$. Let's call them blocked vertices (for $$$u$$$), and mark them in a global array (we will unmark them when we finish the iteration corresponding to $$$u$$$).

Then we iterate over all the unvisited vertices, and we can push an unvisited vertex into the queue if and only if it is not $$$u$$$ and it is not blocked. Then the only remaining unvisited vertices are the ones that are blocked, so we replace the set of unvisited vertices by the set of blocked vertices. This can be seen to be identical to a usual BFS, but with minor modifications.

Why is this efficient? Suppose the order in which vertices are popped from the queue is $$$u_1, u_2, \ldots, u_k$$$. When we finish processing $$$u_i$$$, the unvisited set is $$$V \cap N(u_1) \cap \cdots \cap N(u_i) \subseteq N(u_i)$$$, so the vector we scan while processing $$$u_{i+1}$$$ has size at most $$$|N(u_i)|$$$, and the total size of all the vectors we process is $$$O(n + m)$$$. Hence the number of iterations is upper bounded by $$$O(n + m)$$$, and we are done. Note that this argument doesn't depend on the connectedness of the graph.
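Here is a minimal sketch of this BFS (again with hypothetical names; the linked submission differs in details):

#include <bits/stdc++.h>
using namespace std;

vector<vector<int>> complement_bfs(int n, const vector<vector<int>>& adj) {
  vector<int> unvisited(n);
  iota(unvisited.begin(), unvisited.end(), 0);
  vector<char> blocked(n, false);
  vector<vector<int>> components;
  while (!unvisited.empty()) {
    int src = unvisited.back();
    unvisited.pop_back();
    components.push_back({src});
    queue<int> q;
    q.push(src);
    while (!q.empty()) {
      int u = q.front(); q.pop();
      for (int v : adj[u]) blocked[v] = true;   // mark the blocked vertices for u
      vector<int> remaining;
      for (int v : unvisited) {
        if (blocked[v]) {
          remaining.push_back(v);               // adjacent to u: stays unvisited
        } else {
          components.back().push_back(v);       // (u, v) is a complement edge
          q.push(v);
        }
      }
      unvisited.swap(remaining);
      for (int v : adj[u]) blocked[v] = false;  // unmark when done with u
    }
  }
  return components;
}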

Implementation: 125679188

Solving the problem using BFS in $$$O(n + m)$$$ time, using yet another algorithm

Remember how we implemented DFS? We can do a similar thing for BFS, modifying the algorithm from the previous section a bit to match what we did in the DFS case. We can try sorting the adjacency lists, but doing that naively can take $$$O(m \log n)$$$ in the worst case. How do we do better? Instead of sorting the adjacency lists separately, we can sort all edges using radix sort in $$$O(n + m)$$$, and then build the graph from the sorted edge list, which automatically gives sorted adjacency lists.
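One way to do this (a sketch, not necessarily what the linked submission does) is a single counting-sort-like pass: group all directed edges by their destination, then append destinations in increasing order, so every adjacency list comes out sorted:

#include <bits/stdc++.h>
using namespace std;

vector<vector<int>> sorted_adjacency(int n, const vector<pair<int, int>>& edges) {
  vector<vector<int>> bucket(n); // bucket[v] = sources of directed edges into v
  for (auto [u, v] : edges) {
    bucket[v].push_back(u);
    bucket[u].push_back(v);
  }
  vector<vector<int>> adj(n);
  for (int v = 0; v < n; ++v)    // destinations are visited in increasing order,
    for (int u : bucket[v])      // so each adj[u] is built already sorted
      adj[u].push_back(v);
  return adj;
}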

Now our invariant will be that the set of unvisited vertices is sorted. So when we process a vertex, instead of marking the blocked vertices in advance, we can use two pointers to check whether an unvisited vertex is adjacent to $$$u$$$ or not. The rest of the algorithm is pretty much identical to the remaining part of the previous algorithm.
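The two-pointer check could look like this (a sketch of the inner loop only, reusing the names from the BFS sketch in the previous section; u is the current vertex, and both unvisited and adj[u] are sorted in increasing order):

size_t i = 0;
vector<int> remaining;
for (int v : unvisited) {
  while (i < adj[u].size() && adj[u][i] < v) ++i;
  if (i < adj[u].size() && adj[u][i] == v) {
    remaining.push_back(v);         // v is adjacent to u: it stays unvisited
  } else {
    components.back().push_back(v); // (u, v) is a complement edge
    q.push(v);
  }
}
unvisited.swap(remaining);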

Implementation: 125679224

Solving the problem using DFS in conjectured $$$O(n + m)$$$ time

This is (in my opinion) the most interesting algorithm among all of the ones I have presented so far. However, I don't have a correct proof of the time complexity. We will use the idea from the previous implementation here for sorting the adjacency list, and all we need to do now is to make the following operations in the naive DFS implementation fast enough:

  1. Erasing an element from a set
  2. Iterating over a set
  3. Finding an element in the adjacency list of the current vertex

Note that the two-pointers idea still works here while iterating over the global unvisited set (which is not implemented using a std::set since erasing would be too slow), if we are guaranteed that the unvisited set is sorted during iteration.

Consider the state of the unvisited set. At any point, it consists of some vertices $$$-1 = a_0 < a_1 < a_2 < \cdots < a_k < n = a_{k+1}$$$ (the first and the last elements are just sentinels for the sake of convenience in the following definition). Define the operation $$$root$$$ as follows: for any $$$a_i < x \le a_{i+1}$$$, we have $$$root(x) = a_{i+1}$$$; in other words, $$$root(x)$$$ is the smallest unvisited vertex that is at least $$$x$$$ (or the sentinel $$$n$$$ if no such vertex exists). Note that the computation of $$$root$$$ will be done lazily, whenever we actually need its value. Then if we have such an operation, we can do the needed operations as follows:

  1. Erasing an element from a set: set $$$root[x] = x + 1$$$ for now (when the actual value of $$$root(x)$$$ is needed, we will compute it then and there, and update $$$root[x]$$$ and all its dependencies to speed up further computations). This links together the two ranges: the one ending at $$$x$$$ and the one starting at $$$x + 1$$$.
  2. Iterating over the set: We can do this using a single for loop as follows (by definition of the function root): for (int it = root(0); it < n; it = root(it + 1))
  3. Finding an element in the adjacency list of the current vertex can be done using two pointers, since we are guaranteed by the definition of $$$root$$$ that we iterate over the unvisited vertices in increasing order.

Now how do we lazily compute $$$root$$$ when needed? We shall do so in a DSU-like manner with path compression as follows:

int get_root(int u) {
  // DSU-style find with path compression: returns the smallest
  // unvisited vertex that is at least u, caching the answer in root[u]
  if (root[u] == u) return u;
  return root[u] = get_root(root[u]);
}

Why is this efficient? DSU guarantees that the time taken is $$$O((m + n) \log n)$$$. However, running some stress tests suggests that this is in fact much better than that, and that the number of compressions done is $$$O(n + m)$$$. I tried proving it but found a fatal error in a step, so if anyone can come up with a proof (or a counter-example) for this, please let me know!
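Putting the pieces together, the whole DFS might look like this (a sketch under the same assumptions as before: 0-indexed vertices, sorted adjacency lists, and global state with hypothetical names; not the linked submission itself):

#include <bits/stdc++.h>
using namespace std;

int n;
vector<vector<int>> adj;        // sorted adjacency lists of the original graph
vector<int> root;               // root[x]: lazily maintained, as defined above
vector<vector<int>> components;

int get_root(int u) {
  if (root[u] == u) return u;
  return root[u] = get_root(root[u]);
}

void dfs(int u) {
  root[u] = u + 1;              // erase u from the unvisited set
  components.back().push_back(u);
  size_t i = 0;                 // two-pointer position in adj[u]
  for (int v = get_root(0); v < n; v = get_root(v + 1)) {
    while (i < adj[u].size() && adj[u][i] < v) ++i;
    if (i < adj[u].size() && adj[u][i] == v) continue; // v adjacent to u: skip
    dfs(v);                     // (u, v) is an edge of the complement graph
  }
}

int main() {
  // ... read the graph, set n, build sorted adjacency lists in adj ...
  root.resize(n + 1);
  iota(root.begin(), root.end(), 0); // root[n] = n acts as a sentinel
  while (true) {
    int u = get_root(0);        // smallest unvisited vertex, if any
    if (u >= n) break;
    components.push_back({});
    dfs(u);
  }
  // ... output the components ...
}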

Implementation: 125679248


»
3 years ago, # |

NORZ

»
3 years ago, # |

Just gonna pretend that I understand this stuff

»
8 months ago, # |

When does this type of formulation work? Is it specific to this particular problem, or do you know of more problems where it applies? If so, please share. Thanks for explaining!

»
8 months ago, # |

I just realized that there is also a beautiful linear-time solution for DFS, working in provable $$$O(n + m)$$$ time, using doubly-linked lists and an idea similar to "dancing links".

The main idea is to store the "global unvisited list" as a doubly linked list. After you call dfs(child), you get back to the next value by going through the prev pointers in the linked list.

It is important that the "deletion" from the linked list does not re-link the deleted node (let it point to the old prev and next nodes). You can prove intuitively that the "invalid" list you are traversing from this deleted node is at any time a superset of the real "valid" list. Also, because we use the prev pointers to go from an invalid (visited) node to a valid (unvisited) node, then these operations amortize beautifully (just like the principle of a stack).
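To make this concrete, here is a hedged sketch of the idea (my reconstruction, not the linked submission; it assumes 0-indexed vertices and sorted adjacency lists as in the DSU version, and keeps the unvisited vertices in a circular doubly linked list with vertex n as a sentinel):

#include <bits/stdc++.h>
using namespace std;

int n;
vector<vector<int>> adj;        // sorted adjacency lists of the original graph
vector<int> prv, nxt;           // circular doubly linked list with sentinel n
vector<char> vis;
vector<vector<int>> components;

void dfs(int u) {
  vis[u] = true;
  components.back().push_back(u);
  nxt[prv[u]] = nxt[u];         // unlink u, but keep prv[u] and nxt[u] intact
  prv[nxt[u]] = prv[u];
  size_t i = 0;                 // two-pointer position in adj[u]
  int v = nxt[n];               // smallest unvisited vertex
  while (v != n) {
    while (i < adj[u].size() && adj[u][i] < v) ++i;
    if (i < adj[u].size() && adj[u][i] == v) {
      v = nxt[v];               // v is adjacent to u: skip it
    } else {
      dfs(v);                   // (u, v) is an edge of the complement graph
      int p = prv[v];           // v's pointers are now stale; walk the prev
      while (vis[p]) p = prv[p];  // pointers back to an unvisited node (or the sentinel)
      v = nxt[p];               // first unvisited vertex greater than v
    }
  }
}

int main() {
  // ... read the graph, set n, build sorted adjacency lists in adj ...
  prv.resize(n + 1); nxt.resize(n + 1); vis.assign(n + 1, false);
  for (int i = 0; i <= n; ++i) {
    nxt[i] = (i + 1) % (n + 1);
    prv[i] = (i + n) % (n + 1);
  }
  while (nxt[n] != n) {         // some vertex is still unvisited
    components.push_back({});
    dfs(nxt[n]);
  }
  // ... output the components ...
}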

Implementation: 219699489 (I tried to make it as similar as possible to the original DSU implementation for ease of comparison)

  • »
    »
    8 months ago, # ^ |

    Thanks for the implementation! As a consequence, we can find the DFS tree of the complement graph in linear time, which means we can find 2-vertex-connected and 2-edge-connected components (and hence, the block-cut tree and the bridge tree respectively) of an arbitrary undirected graph in the same time complexity.

»
8 months ago, # |

One of my favorite implementations of the "complement traversal" problem is as follows.

The problem becomes trivial if $$$n \leq \sqrt{m}$$$, because the simple $$$O(n^2)$$$ algorithm wins. Otherwise, suppose that $$$0$$$ is a vertex with minimum degree in the graph $$$G$$$; hence $$$\mathrm{deg}(0) \leq \frac{2m}{n} = O(\sqrt{m})$$$. Let $$$X = V \setminus \{0\} \setminus N(0)$$$, i.e., the vertices which are not adjacent to the vertex $$$0$$$ in $$$G$$$. Then $$$X \cup \{0\}$$$ is connected in $$$\bar{G}$$$. Brute force over $$$N(0)$$$ gives an $$$O((\frac{2m}{n})^2) = O(m)$$$ algorithm.
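A sketch of how this might be implemented (my own reconstruction with hypothetical names, not the commenter's code). The key observation is that every complement edge not incident to $$$N(0)$$$ lies inside $$$X \cup \{0\}$$$, which is already connected through $$$0$$$, so it suffices to union $$$0$$$ with $$$X$$$ and then brute-force every vertex of $$$N(0)$$$ against all vertices:

#include <bits/stdc++.h>
using namespace std;

struct DSU {
  vector<int> p;
  DSU(int n) : p(n) { iota(p.begin(), p.end(), 0); }
  int find(int x) { return p[x] == x ? x : p[x] = find(p[x]); }
  void unite(int a, int b) { p[find(a)] = find(b); }
};

// returns a component label (in the complement graph) for every vertex
vector<int> complement_components(int n, const vector<vector<int>>& adj) {
  int s = 0;                           // vertex of minimum degree; deg(s) <= 2m / n
  for (int v = 1; v < n; ++v)
    if (adj[v].size() < adj[s].size()) s = v;
  DSU dsu(n);
  vector<char> mark(n, false);
  // s is adjacent in the complement to every vertex outside N(s)
  for (int v : adj[s]) mark[v] = true;
  for (int v = 0; v < n; ++v)
    if (v != s && !mark[v]) dsu.unite(s, v);
  for (int v : adj[s]) mark[v] = false;
  // brute force over N(s): deg(s) scans of all n vertices, O(m) in total
  for (int u : adj[s]) {
    for (int v : adj[u]) mark[v] = true;
    for (int v = 0; v < n; ++v)
      if (v != u && !mark[v]) dsu.unite(u, v); // (u, v) is a complement edge
    for (int v : adj[u]) mark[v] = false;
  }
  vector<int> label(n);
  for (int v = 0; v < n; ++v) label[v] = dsu.find(v);
  return label;
}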

  • »
    »
    8 months ago, # ^ |

This algorithm is indeed really cool. I remember reading about it in this comment at some point but seem to have forgotten it over time, so thanks for the reminder!

»
10 days ago, # |

What do $$$V$$$ and $$$N$$$ mean in $$$V \cap N(u_1) \cap \cdots \cap N(u_i) \subseteq N(u_i)$$$?

  • »
    »
    8 days ago, # ^ |

    $$$V$$$ is the set of all vertices of the graph, and $$$N(u)$$$ denotes the set of neighbours of a vertex $$$u$$$.

»
8 days ago, # |

In the naive DFS solution, won't there be O(n) skips per vertex, which in the worst case might lead to O(n * n) skips?

In the following graph
1: 2 3 4 5 ..... n-1
.
.
.
n-2: 2 3 4 ... n-4
n-1: 2 3 4 .... n-3
n: 2 3 4 5 .... n-2

When starting dfs from 1, we will skip all the vertices from 2 to n-1 and make a recursive call to dfs(n). dfs(n) will skip all the vertices from 2 to n-2 again and will make a recursive call to dfs(n-1), and so on. This way, won't we make a total of O(n * n) skips?

  • »
    »
    8 days ago, # ^ |

You can get a better bound here. Note that whenever you skip a vertex $$$v$$$ while in the DFS for vertex $$$u$$$, it is because $$$(u, v)$$$ is not an edge in the complement graph. This is the same as saying that $$$(u, v)$$$ is an edge in the input graph. So the total number of skips can be at most twice the number $$$m$$$ of edges in the input graph, i.e., $$$O(m)$$$.

    • »
      »
      »
      8 days ago, # ^ |

The graph I mentioned in the edit should have O(n * n) skips, right? Also, m is of the order O(n * n) in that test case, so we can say there are O(m) skips. But since m is bounded by min(n * (n + 1) / 2, 200000), we won't be getting such a test case for large values of n.

Am I right, or am I missing something?

      • »
        »
        »
        »
        8 days ago, # ^ |

        Yes, it will take $$$O(m)$$$ skips regardless of what your graph is. In this case, $$$m = O(n^2)$$$, so it does not contradict what you said.

        • »
          »
          »
          »
          »
          7 days ago, # ^ |

          Thanks!