We all know that the time complexity of the Union Find data structure, better known as DSU, over a series of $$$m$$$ union and find operations is $$$O((m + n)α(n))$$$, but I find it hard to convince myself of this fact. I feel like the time complexity is actually linear ($$$O(m + n)$$$). This post isn't meant to prove that the time complexity of the Union Find structure is linear. I'm simply asking where I'm going wrong in the way I'm thinking about this data structure.

This is how I'm thinking about this data structure. Consider the union operation as simply adding an edge between two vertices. If the two vertices are already in the same connected component/set, we don't bother adding the edge; otherwise we add it. The way most of us implement our Union Find structures, the edge we add is directed from the leader of one connected component to the leader of the other (usually from the one with lower rank to the one with higher rank).
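To make the picture above concrete, here is a minimal sketch of union as "add one directed edge between leaders", assuming union by rank and a plain leader walk with no path compression. The class name `DSU` and the `edges` counter are my own illustrative choices, not part of any particular library.

```python
class DSU:
    """Union by rank only; each successful union adds exactly one directed edge."""

    def __init__(self, n):
        self.parent = list(range(n))  # parent[v] == v means v is a leader
        self.rank = [0] * n
        self.edges = 0                # directed edges added so far (illustrative counter)

    def find(self, v):
        # Follow directed edges until the leader is reached (no compression here).
        while self.parent[v] != v:
            v = self.parent[v]
        return v

    def union(self, a, b):
        a, b = self.find(a), self.find(b)
        if a == b:
            return False              # same component: the edge is skipped
        if self.rank[a] < self.rank[b]:
            a, b = b, a               # make a the leader with the higher rank
        self.parent[b] = a            # edge directed from lower rank to higher rank
        if self.rank[a] == self.rank[b]:
            self.rank[a] += 1
        self.edges += 1
        return True
```

Since a union that would close a cycle is skipped, `edges` can never exceed $$$n - 1$$$ in this sketch.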

Now for the find operation, which finds the connected component to which a vertex belongs. It essentially traverses some number of directed edges to reach the "leader" of the connected component the vertex lies in. But with path compression, once a directed edge is traversed, it isn't ever traversed again (except when the destination vertex of the edge is a leader). Since we do not add edges between vertices belonging to the same connected component, we eliminate any cycles that could form from adding such extra edges. This means that at any point the set of edges together with the set of vertices forms a forest. Since each directed edge is traversed by the find operation at most once, and since we can have $$$O(n)$$$ edges, wouldn't the sum of the work done by all find operations amount to just $$$O(n)$$$ work? And since the merge operation can be reduced to just two find operations, wouldn't the total work done by the merge operations also come to $$$O(n)$$$?
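One way to probe the counting argument above is to instrument find and count every parent pointer it follows. The sketch below assumes path compression only, with no rank or size heuristic; `CountingDSU` and `steps` are hypothetical names introduced just for this experiment.

```python
class CountingDSU:
    """Path compression only; counts every edge followed by find."""

    def __init__(self, n):
        self.parent = list(range(n))  # parent[v] == v means v is a leader
        self.steps = 0                # total parent pointers followed by all find calls

    def find(self, v):
        # First pass: walk up to the leader, counting each edge traversed.
        root = v
        while self.parent[root] != root:
            root = self.parent[root]
            self.steps += 1
        # Second pass: path compression, point each vertex on the path at the leader.
        while self.parent[v] != root:
            nxt = self.parent[v]
            self.parent[v] = root
            v = nxt
        return root

    def union(self, a, b):
        a, b = self.find(a), self.find(b)
        if a != b:
            self.parent[a] = b        # no rank or size heuristic in this sketch
```

Running different sequences of `union` and `find` calls and comparing `steps` against $$$n$$$ and $$$m$$$ shows how many edge traversals actually occur, including traversals of the edges that compression itself creates.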

Can somebody tell me exactly where I'm going wrong with this? It also seems to me that union by rank/size is unnecessary: if the above logic is right, then path compression on its own guarantees an upper bound of $$$O(m + n)$$$ on the time complexity.