#	User	Rating
1	tourist	3880
2	jiangly	3669
3	ecnerwala	3654
4	Benq	3627
5	orzdevinwang	3612
6	Geothermal	3569
6	cnnfls_csy	3569
8	jqdai0815	3532
9	Radewoosh	3522
10	gyh20	3447

#	User	Contrib.
1	awoo	161
2	maomao90	160
3	adamant	156
4	maroonrk	154
5	-is-this-fft-	148
5	SecondThread	148
5	atcoder_official	148
8	Petr	147
9	TheScrasse	145
10	nor	144

Motivation

I remember struggling to understand the Tarjan algorithm for Strongly Connected Components. The algorithm feels intuitive to me now and I no longer feel this struggle, but what I do still remember is feeling the lack of some good online resource to guide me through it. This is my attempt to write the article I wish I had read.

Directed DFS tree

Before we get to the problem, we have to first understand the most important concept: directed DFS trees. Understanding them will allow for the Tarjan algorithm to feel like a natural extension (and this article is actually about them, not about Tarjan).

A directed DFS tree is an explicit representation of a DFS traversal of a directed graph.

When a node is visited for the first time, it is added to the tree, either as a root if it was visited in the beginning of a DFS call, or as the child of the node it got visited from.

The edges that were used to visit a node for the first time will be considered as part of the tree and be called tree-edges.

Every other edge of the graph will be described in relation to this tree:

a forward-edge is an edge that connects a node to a descendant.
a back-edge is an edge that connects a node to an ancestor.
a cross-edge is any other edge.

Here is an example of the construction of a tree, with every kind of edge:

(Technically we are building a forest, since we can have multiple roots, but eh :P)

When we start a DFS call on a node, every node added to the tree before the call returns will be a descendant of that node.

A DFS call only finishes after the recursive call on all the children finishes. So every node on the subtree will be visited. Since we only call the DFS on unvisited nodes, we won't call again on any node on the subtree, so no new descendants will be added later.

Strongly connected components

A strongly connected component (SCC) of a graph is a maximal subset of nodes where every node reaches each other.

Every directed graph can be partitioned into SCCs.

Some properties related to DFS tree:

1. If a node and a descendant belong to the same SCC, all the nodes on the path between them also belong to the same SCC.

Explanation

2. Every SCC induces a connected subgraph of the DFS tree.

Explanation

This introduces the idea of a representative of a SCC. A representative of a SCC is the node of the SCC with lowest depth on the DFS tree; this node will be a common ancestor of all the nodes of the SCC.

Algorithm sketch

We have enough to sketch an algorithm to find the SCCs of a graph.

The idea is to do a DFS traversal of the graph keeping in mind the tree structure, and adding SCCs as we find their representatives.

We know a node isn't a representative if some node on its subtree reaches a node above it. So based on the edges we find, we send some signal indicating "the representative of this node is above you". If a node doesn't receive this signal, it is a representative.

When we find a represetantive, we will identify the nodes of its SCC and claim them as part of it. We know that every other SCC representative on its subtree was already found and claimed its own nodes. So we will claim every unclaimed node on its subtree.

Stack simplification

We could use a second function to claim these nodes, like so:

Code

def claim(x, representative):
    if claimed[x]:
        return
    claimed[x] = True
    scc[representative].add(x)
    
    for y in tree.children[x]:
        claim(y, representative)
        
def find_sccs(x):

    # to be done, we call find_sccs recursively on its neighbors, mark nodes as visited and collect signals
    
    if not signal[x]:
        claim(x, x)
        
for x in graph.nodes:
    if not visited[x]:
        find_sccs(x)

To avoid referring explicitly to the tree, we can use a stack to keep track of the unclaimed nodes still on the subtree:

Code

def find_sccs(x):
    
    STACK.push(x)
    
    # to be done, we call find_sccs recursively on its neighbors, mark nodes as visited and collect signals
    
    if not signal[x]:
        popped_value = None
        while x != popped_value:
            popped_value = STACK.top()
            STACK.pop()
            scc[x].add(popped_value)
            
for x in graph.nodes:
    if not visited[x]:
        find_sccs(x)

Algorithm, first version

For the first version of the algorithm, we will explicitly build the tree.

The signal we will use will be the depth of the representative.

For every edge we find, we can derive some information on the depth of the representative:

forward-edges can be ignored, since every node reachable by a forward-edge can be reached by a sequence of tree-edges.
tree-edges will just be used to propagate the signal. If we know a child of a node is not a representantive, we know they will be on the same SCC, and the minimum depth of a node reachable by the child can be used to update the node's signal.
back-edges: If we have a back-edge from node $$$x$$$ to $$$y$$$, then we know that $$$x$$$ and $$$y$$$ belong to the same SCC, and that the depth of the representative of their SCC has to be at most the depth of $$$y$$$.
cross-edges: If we have a cross-edge connecting node $$$x$$$ to node $$$y$$$, two things can happen:

Node $$$y$$$ was already claimed. If this is the case, we know that the representative of node $$$y$$$ was already found, which means node $$$x$$$ and node $$$y$$$ don't belong to the same SCC.
Node $$$y$$$ is unclaimed. This means that no representative for $$$y$$$ was found, and the recursive call on every ancestor not in common with $$$x$$$ finished, continued on their lowest common ancestor, and after a sequence of recursive calls, reached node $$$x$$$. What is the conclusion? Since the representative of $$$y$$$ is at least as low as the lowest common ancestor, node $$$y$$$ is in the same SCC as the lowest common ancestor; the lowest common ancestor reaches $$$x$$$; $$$x$$$ reaches $$$y$$$. They all belong to the same SCC. We have to signal that the depth of the representative of $$$x$$$ is at most the depth of the lowest common ancestor.

Code

def find_sccs(x):
    
    visited[x] = VISITING
    STACK.push(x)
    representative_depth[x] = depth[x]

    for y in graph.adjacency_list[x]:
        if visited[y] == UNVISITED: # this is a tree-edge
            tree.online_add_child(x, y)
            depth[y] = depth[x] + 1
            find_sccs(y)
            if not claimed[y]:
                representative_depth[x] = min(representative_depth[x], representative_depth[y])
        elif visited[y] == VISITING: # this is a back-edge
            assert not claimed[y]
            representative_depth[x] = min(representative_depth[x], depth[y])
        else: # this is a cross-edge or a forward-edge (for a forward-edge, tree.lca(x, y) == x)
            if not claimed[y]: 
                representative_depth[x] = min(representative_depth[x], depth[tree.lca(x, y)])

    if representative_depth[x] == depth[x]:
        popped_value = None
        while popped_value != x:
            popped_value = STACK.top()
            STACK.pop()
            scc[x].add(popped_value)
            claimed[popped_value] = True
            
    visited[x] = VISITED

for x in graph.nodes:
    if visited[x] == UNVISITED:
        depth[x] = 0
        tree.add_root(x)
        find_sccs(x)

DFS order

We must now introduce the concept of DFS order.

We will call the DFS order of a node the order in which it got visited by the DFS.

We can make some observations relating to the corresponding DFS tree:

1. Ancestors have a lower DFS order than descendants.

Explanation

(Note: this directly implies that back-edges connect nodes to a node with a lower DFS order and that forward-edges connect nodes to a node with greater DFS order.)

2. The DFS order of all the nodes of a subtree form a contiguous range.

Explanation

3. Cross-edges connect nodes to nodes with a lower DFS order.

Explanation

Tarjan, at last

We can now use the concept of DFS order to build the final version of the algorithm.

The signal we will use will be the DFS order of the representative.

The reason why we previously had to explicitly build the tree was to calculate the depth of lowest common ancestors.

But we can do better if our signal is the DFS order: on cross-edges connecting nodes $$$x$$$ to $$$y$$$, all the ancestors of $$$x$$$ that are not ancestors of $$$y$$$ were visited after $$$y$$$, except all their common ancestors, which are the only possibilities for their representatives.

So, in short:

tree-edges and forward-edges are used similarly.
back-edges connecting node $$$x$$$ to node $$$y$$$, we know that the representative wasn't inserted after $$$y$$$.
cross-edges connecting node $$$x$$$ to node $$$y$$$ where $$$y$$$ is unclaimed, we know that the representative was inserted before $$$y$$$.

At last, we have the code:

Code

def find_sccs(x):
    
    visited[x] = VISITING
    STACK.push(x)
    dfs_order[x] = timer
    timer += 1
    representative_dfs_order[x] = dfs_order[x]

    for y in graph.adjacency_list[x]:
        if visited[y] == UNVISITED:
            find_sccs(y)
        if not claimed[y]:
            representative_dfs_order[x] = min(representative_dfs_order[x], representative_dfs_order[y])

    if representative_dfs_order[x] == dfs_order[x]:
        popped_value = None
        while popped_value != x:
            popped_value = STACK.top()
            STACK.pop()
            scc[x].add(popped_value)
            claimed[popped_value] = True
            
    visited[x] = VISITED

for x in graph.nodes:
    if visited[x] == UNVISITED:
        find_sccs(x)

de_sousa's blog

Motivation

Directed DFS tree

Strongly connected components

Algorithm sketch

Stack simplification

Algorithm, first version

DFS order

Tarjan, at last