As the title says, I want to generate random graphs that are connected. There are some details here: obviously I want the graph to have $$$n$$$ nodes, which is predetermined. Something else that arises in competitive programming is that the graph should have $$$m$$$ edges (also predetermined). Now how does one do that in an intelligent way?

Now I have done a bit of googling, and it seems there is no 'good' way of getting a perfectly 'random' such graph, i.e. picking one uniformly from all connected graphs with $$$n$$$ nodes and $$$m$$$ edges. But that's okay, approximately random is also good.

One way I have considered is to make a spanning tree first, then add extra edges. But this has horrible bias for certain graphs. For example, among graphs with $$$n$$$ nodes and $$$n$$$ edges, the cycle graph $$$(1-2-3-\dots-n-1)$$$ is $$$\frac{n}{3}$$$ times more likely than the path $$$(1-2-3-\dots-n)$$$ plus the edge $$$(1-3)$$$: the cycle has $$$n$$$ spanning trees, while the path plus chord has only $$$3$$$. So what process can be a good choice?
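As a concrete sketch of this spanning-tree-first process (the tree-building rule here, attaching each vertex to a random earlier one, is one arbitrary choice and is itself not uniform over spanning trees; all names are mine):

```python
import random

def spanning_tree_plus_extras(n, m):
    """Spanning-tree-first method: build a random spanning tree, then
    add m - (n - 1) distinct extra edges. Note the bias: graphs with
    many spanning trees are produced more often."""
    assert n - 1 <= m <= n * (n - 1) // 2
    order = list(range(n))
    random.shuffle(order)
    edges = set()
    # One simple (non-uniform) random tree: attach each new vertex
    # to a uniformly random earlier vertex.
    for i in range(1, n):
        u, v = order[i], order[random.randrange(i)]
        edges.add((min(u, v), max(u, v)))
    # Add distinct extra edges until we have m in total.
    while len(edges) < m:
        u, v = random.sample(range(n), 2)
        edges.add((min(u, v), max(u, v)))
    return sorted(edges)
```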

All you need is here.

If I understand it correctly, this library uses the exact method that I talked about in the post, which I am not saying is bad. However, I was hoping for something with less bias towards certain graphs.

Please let us know if you find a better way, I am really interested in the topic.

One way to construct a random small graph with $$$m$$$ edges and $$$n$$$ vertices is to add all possible edges to a vector, shuffle it, and pick the first $$$m$$$. If you want it to be connected, first take the edges $$$1-2,\;2-3,\;\dots,\;(n-1)-n$$$, then add all the remaining edges to a vector, shuffle it, and pick the first $$$m-n+1$$$. In the end, shuffle the vertices, i.e. choose a random permutation $$$p$$$ of the vertices and map each vertex $$$u$$$ to $$$p_u$$$.
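A sketch of this procedure as stated, with the path as the initial backbone (the function name is mine):

```python
import random

def random_graph_path_backbone(n, m):
    """Start from the path 0-1-...-(n-1), add a random
    (m - n + 1)-subset of the remaining edges, then relabel the
    vertices by a random permutation."""
    assert n - 1 <= m <= n * (n - 1) // 2
    path = [(i, i + 1) for i in range(n - 1)]
    # All edges except the path edges.
    rest = [(u, v) for u in range(n) for v in range(u + 1, n)
            if v != u + 1]
    random.shuffle(rest)
    edges = path + rest[:m - (n - 1)]
    p = list(range(n))          # random relabeling of vertices
    random.shuffle(p)
    return sorted((min(p[u], p[v]), max(p[u], p[v])) for u, v in edges)
```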

$$$UPD$$$ I meant to first construct a random spanning tree, not a path, sorry for the mistake.

This will generate a connected $$$m$$$-edge graph that will always have a Hamiltonian path, which I think is worse than the spanning-tree-first method.

You are right, sorry for my mistake :( I meant to construct a random spanning tree, so it is the same idea as you said.

Wouldn't this only generate graphs which have a Hamiltonian path (and therefore clearly not uniformly from the set of all possible graphs with $$$m$$$ edges)?

Yes, sorry for that :( But it still constructs good random graphs if the graph doesn't need to be connected, right?

I have an algorithm, but it works in $$$O(n^4)$$$.

Let's build the complete graph on $$$n$$$ vertices. After that, let's erase random edges one by one such that the graph remains connected. When the number of edges is exactly $$$m$$$, we get our graph. Note that for each connected graph with $$$m$$$ edges, there are $$$(\frac{n \cdot (n-1)}{2}-m)!$$$ deletion orders that produce it. So, all of them have equal probability.
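In Python, this deletion process might look like the following sketch (connectivity is re-checked from scratch for each candidate edge, which is what makes the naive version slow; the names are mine):

```python
import random

def connected(n, edges):
    """BFS connectivity check, rebuilt from scratch each call."""
    adj = {i: [] for i in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen, stack = {0}, [0]
    while stack:
        x = stack.pop()
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return len(seen) == n

def delete_down_to(n, m):
    """Start from the complete graph and delete random edges that keep
    the graph connected, until exactly m edges remain."""
    edges = {(u, v) for u in range(n) for v in range(u + 1, n)}
    assert n - 1 <= m <= len(edges)
    while len(edges) > m:
        e = random.choice(sorted(edges))
        if connected(n, edges - {e}):   # skip bridges
            edges.discard(e)
    return sorted(edges)
```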

This algorithm can be improved. If $$$n \le 10^4$$$ and we build a random graph with $$$8 \cdot n$$$ edges, it will be connected with high probability. So, if $$$m \le 8 \cdot n$$$, we can start from this random graph instead of the complete graph; then the algorithm works in $$$O(64 \cdot n^2)$$$. If $$$m \gt 8 \cdot n$$$, we can just build a random graph with $$$m$$$ edges directly.

Here is my implementation, but it has a problem if $$$m$$$ is very big (then my algorithm will not find a graph with $$$m$$$ edges).

I don't believe this works: although each connected graph gets $$$(\frac{n \cdot (n-1)}{2}-m)!$$$ deletion orders for reaching it, these orders do not have equal probability. This is because at each step you can only remove edges that will not disconnect the working graph, and the number of such non-bridges depends on which edges you have already removed. The smallest counterexample has $$$n = 4$$$, $$$m = 3$$$, where 4 out of the 16 connected labeled graphs are claws, but your algorithm produces a claw with probability $$$4/15$$$ rather than $$$4/16$$$.
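The $$$4/15$$$ figure can be verified exactly by brute force over the deletion process, for instance like this (a small sketch of mine using exact rational arithmetic):

```python
from fractions import Fraction
from functools import lru_cache

N = 4  # vertices 0..3; the complete graph K4 has 6 edges

def connected(edges):
    adj = {i: [] for i in range(N)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen, stack = {0}, [0]
    while stack:
        x = stack.pop()
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return len(seen) == N

def is_claw(edges):
    # A connected 3-edge graph on 4 vertices is a tree; it is a claw
    # (star K_{1,3}) exactly when some vertex has degree 3.
    deg = [0] * N
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return max(deg) == 3

@lru_cache(maxsize=None)
def claw_probability(edges):
    """Probability that deleting uniformly random non-bridge edges,
    down to 3 edges, ends at a claw."""
    if len(edges) == 3:
        return Fraction(int(is_claw(edges)))
    removable = [e for e in edges
                 if connected(tuple(x for x in edges if x != e))]
    return sum(claw_probability(tuple(x for x in edges if x != e))
               for e in removable) / len(removable)

K4 = tuple((u, v) for u in range(N) for v in range(u + 1, N))
print(claw_probability(K4))  # 4/15, versus 4/16 under the uniform distribution
```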

The process in seiko.iwasawa's comment is basically all but the last $$$m - n + 1$$$ steps of the reverse-delete algorithm with random edge weights, and so it can be simulated in $$$O(n \cdot \alpha(n) + m)$$$ expected time using Kruskal's algorithm. For large $$$n$$$, the number of edges that need to be tried before Kruskal's algorithm finds a spanning tree will be of order $$$n \log n$$$ with high probability. However, most of this comes from repeatedly trying internal edges within some large component of size nearly $$$n$$$, and it is possible to sample from the other edges in $$$O(1)$$$ since there can only be one such large component.
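Under that equivalence, a sketch of the simulation (this is the straightforward version over the full edge list, without the sampling refinement described above; the helper names are mine): shuffle the edges so that position in the list plays the role of a random weight, build the maximum spanning tree with union-find, and keep the $$$m - n + 1$$$ heaviest non-tree edges, which are exactly the edges the truncated reverse-delete never removes.

```python
import random

def truncated_reverse_delete(n, m):
    """Simulate reverse-delete with random edge weights, stopped when
    m edges remain: the survivors are the maximum spanning tree of a
    random edge ordering plus the m - n + 1 heaviest non-tree edges."""
    all_edges = [(u, v) for u in range(n) for v in range(u + 1, n)]
    assert n - 1 <= m <= len(all_edges)
    random.shuffle(all_edges)            # index in list acts as the weight

    parent = list(range(n))              # union-find over vertices
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree, extra = [], []
    for u, v in reversed(all_edges):     # heaviest edge first
        ru, rv = find(u), find(v)
        if ru != rv:                     # tree edge: survives to the end
            parent[ru] = rv
            tree.append((u, v))
        elif len(extra) < m - (n - 1):   # heaviest non-tree edges survive too
            extra.append((u, v))
    return sorted(tree + extra)
```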

I've also come up with another idea that, while still not (yet) practical for $$$m$$$ very close to $$$n$$$, seems serviceable for at least part of that realm between "generate a random spanning tree" and "generate a random graph and hope it's connected." It is to allocate more vertices than will be eventually used, to allow a connected component to grow to size $$$n$$$ with fewer than the normal $$$n \log{n}$$$ or so edges. To keep the analysis and proof of correctness as simple as possible, I consider only the connected component containing the vertex $$$0$$$.

Consider adding edges one at a time to an initially empty graph with $$$s$$$ vertices labeled $$$0$$$ through $$$s-1$$$. If at any point the connected component containing the vertex $$$0$$$ has $$$n$$$ vertices and $$$m$$$ edges, we relabel its vertices as $$$0$$$ through $$$n-1$$$ and stop. The process fails if this never happens. Given an arbitrary connected labeled graph $$$G$$$ on those $$$n$$$ vertices with $$$m$$$ edges, its probability of being reached by this process is exactly the following quantity:

$$$\dfrac{\binom{s-1}{n-1}}{\binom{\binom{n}{2} + n(s-n)}{m}}$$$

The numerator is the number of sets $$$K$$$ which can be the connected component of size $$$n$$$ containing $$$0$$$. The denominator comes from the fact that, for any such set $$$K$$$, the graph $$$G$$$ is reached via $$$K$$$ iff the (exactly one) (uniformly random) configuration of the $$$\binom{n}{2} + n(s-n)$$$ edges incident to at least one vertex of $$$K$$$, with $$$m$$$ edges included, is the one that corresponds internally to $$$G$$$ after relabeling and includes no boundary edge. Although I don't have a great estimate of the actual asymptotic behavior of its expected runtime, by choosing $$$s$$$ to maximize a related quantity, this process seems to succeed at $$$n = 2 \cdot 10^5$$$, $$$m = 3 \cdot 10^5$$$ on average within around 100 attempts, whereas this size would be out of reach for the other exact methods I have considered.

Here's a quick-and-dirty implementation of the idea:
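The sketch below is my own reconstruction of the process described above, not the original code (the choice of $$$s$$$ is left to the caller, and no care is taken to pick it optimally):

```python
import random

def sample_component(n, m, s):
    """One attempt: add distinct random edges to an empty graph on s
    vertices; succeed if the component of vertex 0 ever has exactly
    n vertices and m edges. Returns a relabeled edge list or None."""
    parent = list(range(s))
    size = [1] * s      # vertices per component root
    ecnt = [0] * s      # edges per component root

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    present, edges = set(), []
    while True:
        u, v = random.sample(range(s), 2)
        e = (min(u, v), max(u, v))
        if e in present:
            continue
        present.add(e)
        edges.append(e)
        ru, rv = find(u), find(v)
        if ru == rv:
            ecnt[ru] += 1
        else:
            parent[ru] = rv
            size[rv] += size[ru]
            ecnt[rv] += ecnt[ru] + 1
        r0 = find(0)
        if size[r0] == n and ecnt[r0] == m:
            comp = sorted(x for x in range(s) if find(x) == r0)
            label = {x: i for i, x in enumerate(comp)}
            return sorted((min(label[a], label[b]), max(label[a], label[b]))
                          for a, b in edges if find(a) == r0)
        if size[r0] > n or ecnt[r0] > m:
            return None  # overshot; this attempt fails

def generate_connected(n, m, s):
    """Retry attempts until one succeeds."""
    while True:
        g = sample_component(n, m, s)
        if g is not None:
            return g
```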

(And finally I repeat the well-known observation that large random combinatorial objects are often very statistically similar to each other. That can make them easier as objects of study, but can also be a disadvantage: if the goal is to generate strong test data, it is often wise to embrace the different biases of different processes.)