I got this question in one of my interviews. Let me state the problem statement clearly,

"You're given an undirected weighted graph consisting of N nodes and E edges and all the weights are unique ( meaning no weights will have duplicates ). Note the graph may also be disconnected ( there could be more than one component in your graph ). The nodes in the graph are labelled starting from 1 to N."

Now you're given an operation that you could perform which is, cut( u , v ) , where u is the label of one node and v is the label of another node and u < v . what this operation returns as result for the given u and v is that it takes the minimum weighted edge adds its value to result and then removes it from graph and then checks whether u and v are disconnected if they are disconnected it stops and returns the result. If the u and v are still connected by some means then it checks for the next minimum weighted edge in the graph and adds its value to the existing result value and removes the edge from the graph and then again check whether u and v are disconnected, if disconnected it stops and returns result. If not then again it continues the same procedure with the next minimum weighted edge in the graph.

So now the question is calculate the sum of all cut(u,v) for all u and v pairs in the graph where u < v . Like if N = 3 and say E = 5, (1,2), (1,3), (2,3) will be the all possible (u,v) pairs where u < v, where u and v are labels of the node. Then final SUM = cut(1,2)+cut(1,3)+cut(2,3). As the answer might be big MOD the value with 1e9+7.

So this is the question I got in my interview and I didn't have enough time to process this question and it seemed pretty hard too. And when I asked for hint, use disjoint set data structures was the reply. I had no clue on how to proceed it further. Can some body provide ideas and different approaches that could be taken to solve this problem.Thanks.

So the idea for solving this is to maintain a disjoint set of all the nodes and process the edges in decreasing order of weight. When we process an edge, we want to know how many pairs of nodes are currently connected by only that edge and those edges with greater edges. This is because for all such pairs, cutting them will require removing the current edge. Initially there are 0 connected pairs, and when you get to a new edge that connects two separate components of size a and b, then the number of connected pairs is increased by a*b. Then, if after processing an edge of weight w we have x connected pairs, the answer is increased by w*x.

mkirsche can you walk through your procedure over this example, I tried but got stuck in between, say there are 5 Nodes in the graph and edge list is of the form ( node1 , node2 , weight ) which is ( 1,5,3 ) ; ( 2,4,2 ) ; ( 1,2,1 ) ; ( 2,3,5 ) ; ( 1,3,4 ) ?

Initially there are no connected pairs. Process (2, 3, 5). We now have one connected pair since 2 and 3 were both in components of size 1 (so we add 1x1). The components are {1}, {2, 3}, {4}, {5}, and we add to our answer 5 * 1. Process (1, 3, 4). We add two more connected pairs since 1 is in a component of size 1 and 3 is in a component of size 2. Components are {1, 2, 3}, {4}, {5}, and we add to our answer 3*4 = 12 so our answer is now 17. Process (1, 5, 3). We add 3 more connected pairs since 1 is in a component of size 3 and 5 is in a component of size 1. So we have 6 connected pairs and add 6*3 to our answer to bring it to 35. The components are {1, 2, 3, 5} and {4}. Process {2, 4, 2}. Add 4 more connected pairs. Now everything is a single connected component and we have a total of 10 connected pairs. We add 10*2 = 20 to the answer so now it's 55. Process (1, 2, 1). They're already connected so we stay at 10 connected pairs, and add 10*1 to our answer to get a total of 65.

while processing (1, 3, 4) edge, we have 3 pairs till now ({1,3} , {1,2} ,{2,3}). for {1,3} path is 1->3 and hence we will have to delete edge with weight 4. for {1,2} path is 1->3->2 and hence we will delete edge with weight 4 and then there is no path to go from 1 to 2. for {2,3} path is 2->3 and hence we will delete edge with weight 5. and this sums up to 4+4+5=13 but as per your algorithm the answer is 17 ( as you are considering the edge with weight 4 thrice but why?). So where am i wrong in visualizing the above scenario?

Just a doubt, (it might seem trivial, but still), why process in decreasing order?

When you process an edge, you want to know which pairs' cut() operations use that edge. In order for that to happen, cutting that edge and all smaller edges must disconnect that pair. That means that the subgraph consisting of all edges with larger weights must have that pair disconnected. Processing edges in decreasing weight order means that when you start to look at an edge e, your disjoint set will reflect the connectivity of the subgraph consisting of edges with weights larger than that of e.

You can try to submit code for this here.

In my opinion, it is a bit hard for an interview. You can look at the contest dashboard and see how many people were able to solve it in a 24+ hour contest.

Yeah it was really hard and I indeed messed up in the interview. logic_max, thanks a lot for the actual problem link.

Two other problems based on similar concept :- http://codeforces.com/problemset/problem/437/D http://www.spoj.com/problems/KOICOST/

_mkirsche can you please help with the implementation of this idea.. i was trying this problem http://www.spoj.com/problems/KOICOST/ and i understood what you said but for each edge how i have to calculate the total number of pairs while maintaining a disjoint set structure?? and also for this test case: 6 7 1 2 10 2 3 2 4 3 5 6 3 15 3 5 4 4 5 3 2 6 6 output: 256 first (3,6) is in a set and we add 15*1 to answer. then we process (1,2) now for this connected pairs is only 1 but this value 10 will also be added for vertices pair (3,6) so that means we have to add 2*10 to answer.. so if we calculate the previous total connected pairs again then wont it exceed time limit? please help me with the implemenation someone..

edit: i implemented it after a few hours of paperwork.. and got AC! the thing is that we can take a temporary variable t to store the size of components greater having edge weight greater than current edge.. my solution := http://ideone.com/OQtINb

edit:This is the same approach as the first comment. I read it after writing this comment.This is my idea. Please tell me if its right or wrong. I use divide and conquer with dsu. Sort edges in descending order. Pick the next edge in this order. Find the current set of the vertices connected by this edge. If they are same, then this edge won't disconnect the graph when we are finding cut(u,v). Thus, this edge is not the cut(u,v) for any pair of vertices. Else, answer+=|edge(u,v)|*dsu(u).size * dsu(v).size. Now add the edge in out dsu by merging sets, and adding the sizes of the sets. I think it can be correct because we have to go trough the same order of deletion for any pair cut(u,v), and any edge that is not in the component anymore can't be the cut(u,v) for any pair(u,v). Hence, it divides and conquers.

Please tell me if its right or wrong.