### pikel_rik's blog

By pikel_rik, history, 5 months ago,

We all know the time complexity of the Union Find data structure, better know as dsu, over a series of $m$ union and find operations is $O((m + n)α(n))$, but I find it hard to convince myself of this fact. I feel like the time complexity is actually linear ($O(m + n)$). This isn't a post meant to prove the time complexity of the Union Find structure to be linear. I'm simply asking where I'm going wrong with the way I'm thinking about this data structure.

This is how I'm thinking about this data structure. Consider the union operation as simply adding an edge between two vertices. If these two vertices are already in the same connected component/set, then we don't bother adding the edge, otherwise we add the edge. The way most of us implement our Union Find structures is such that the edge we add is directed from one connected component to the other (usually from the one with lower rank to the one with higher rank).

Now coming to the find operation which finds the connected component to which a vertex belongs. This essentially traverses some number of directed edges to give the "leader" of the connected component it lies in. But when a directed edge is traversed, it isn't traversed ever again (except when the destination vertex of the edge is a leader). Since we do not add edges between vertices belonging to the same connected component, we eliminate any cycles that may form due to adding such extra edges. This means at any point the set of edges with the set of vertices forms a forest. Since each directed edge is traverse by the find operation at most once, and since we can have $O(n)$ edges, wouldn't the sum of work done by all find operations amount to just $O(n)$ work? Since the merge operation can be reduced to just two find operations, the total amount of work done by the merge operations would also come to a total of $O(n)$ work right?

Can somebody tell me exactly where I'm going wrong with this? It also seems to me that union by rank/size is unnecessary, since if the above logic is right, then path compression on its own guarantees an upperbound of $O(m + n)$ on the time complexity.

• +37

By pikel_rik, history, 6 months ago,

Imagine you're given an array $A$ of length $n$ and you're required to find the largest subset of this array that satisfies a certain condition. Let's assume that the given condition makes the problem $NP-$complete and also allows the following to be true:

1. If there exists a subset of length $k$ that satisfies the given condition, then assuming that $k > 1$, there necessarily exists subsets of length $l$, $1 \le l < k$, that satisfies the condition.

2. If there does not exist a subset of length $k$ that satisfies the given condition, then assuming $k < n$, there necessarily will not exist any subset of length $l$, $k < l \le n$, that satisfies the condition.

In short, the binary search property holds.

Since the problem is $NP-$complete, however, there exists no efficient function that allows us to find whether there exists a subset of length $k$ that satisfies the condition or not. So the only thing we can do is iterate over all subsets of length $k$ and check whether each of them satisfy the given condition or not. The time complexity of this function would then be $O(n * \binom{n}{k})$.

My question is, if we use binary search on this problem armed with the brute force function mentioned above, what would the time complexity of this algorithm be in the worst case? I imagine there would be a significant speed-up since we would be evaluating the function only a logarithmic number of times, as opposed to ordinary brute force search where we would evaluate that function at all possible values of $k$ (by iterating through all the subsets of $A$), but not enough to give us a polynomial time algorithm.

For example, you can consider the set cover or vertex cover problems (in both of these problems we're required to find the smallest subset satisfying the given condition however, not the largest).