As the title says, I want to generate random graphs that are connected. There are some details here: obviously I want the graph to have $$$n$$$ nodes, which is predetermined. Something else that arises in competitive programming is that the graph should have $$$m$$$ edges (also predetermined). Now how does one do that in an intelligent way?

Now I have done a bit of googling, and it seems there is no 'good' way of getting a perfectly 'random' such graph, i.e. picking one uniformly from all connected graphs with $$$n$$$ nodes and $$$m$$$ edges. But that's okay, approximately random is also good.

One way I have considered is to make a spanning tree first, then add extra edges. But this has horrible bias for certain graphs. For example, among graphs with $$$n$$$ nodes and $$$n$$$ edges, the cycle graph $$$(1-2-3-\dots-n-1)$$$ is $$$\frac{n}{3}$$$ times more likely than the path $$$(1-2-3-\dots-n)$$$ plus the edge $$$(1-3)$$$: the cycle has $$$n$$$ spanning trees, while the path plus chord has only $$$3$$$. So what process can be a good choice?
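As a concrete sketch of this spanning-tree-first process (the tree-building rule here, attaching each vertex to a random earlier one, is one arbitrary choice and is itself not uniform over spanning trees; all names are mine):

```python
import random

def spanning_tree_plus_extras(n, m):
    """Spanning-tree-first method: build a random spanning tree, then
    add m - (n - 1) distinct extra edges. Note the bias: graphs with
    many spanning trees are produced more often."""
    assert n - 1 <= m <= n * (n - 1) // 2
    order = list(range(n))
    random.shuffle(order)
    edges = set()
    # One simple (non-uniform) random tree: attach each new vertex
    # to a uniformly random earlier vertex.
    for i in range(1, n):
        u, v = order[i], order[random.randrange(i)]
        edges.add((min(u, v), max(u, v)))
    # Add distinct extra edges until we have m in total.
    while len(edges) < m:
        u, v = random.sample(range(n), 2)
        edges.add((min(u, v), max(u, v)))
    return sorted(edges)
```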

All you need is here.

If I understand it correctly, this library uses the exact method that I talked about in the post, which I am not saying is bad. However, I was hoping for something with less bias towards certain graphs.

Please let us know if you find a better way, I am really interested in the topic.

One way to construct a random small graph with $$$m$$$ edges and $$$n$$$ vertices is to add all possible edges to a vector, shuffle it, and pick the first $$$m$$$. If you want it to be connected, first take the edges $$$1-2,\;2-3,\;\dots,\;(n-1)-n$$$, then add all the remaining edges to a vector, shuffle it, and pick the first $$$m-n+1$$$. In the end, shuffle the vertices, i.e. choose a random permutation $$$p$$$ of the vertices and map each vertex $$$u$$$ to $$$p_u$$$.
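A sketch of this procedure as stated, with the path as the initial backbone (the function name is mine):

```python
import random

def random_graph_path_backbone(n, m):
    """Start from the path 0-1-...-(n-1), add a random
    (m - n + 1)-subset of the remaining edges, then relabel the
    vertices by a random permutation."""
    assert n - 1 <= m <= n * (n - 1) // 2
    path = [(i, i + 1) for i in range(n - 1)]
    # All edges except the path edges.
    rest = [(u, v) for u in range(n) for v in range(u + 1, n)
            if v != u + 1]
    random.shuffle(rest)
    edges = path + rest[:m - (n - 1)]
    p = list(range(n))          # random relabeling of vertices
    random.shuffle(p)
    return sorted((min(p[u], p[v]), max(p[u], p[v])) for u, v in edges)
```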

$$$UPD$$$ I meant to first construct a random spanning tree, not a path, sorry for the mistake.

This will generate a connected $$$m$$$-edge graph that will always have a Hamiltonian path, which I think is worse than the spanning-tree-first method.

You are right, sorry for my mistake :( I meant to construct a random spanning tree, so it is the same idea as you said.

Wouldn't this only generate graphs which have a Hamiltonian path (and therefore clearly not uniformly from the set of all possible graphs with $$$m$$$ edges)?

Yes, sorry for that :( But it still constructs good random graphs if the graph doesn't need to be connected, right?

I have an algorithm, but it works in $$$O(n^4)$$$.

Let's build the complete graph on $$$n$$$ vertices. After that, let's erase random edges one by one such that the graph remains connected. When the number of edges is exactly $$$m$$$, we get our graph. Note that for each connected graph with $$$m$$$ edges, there are $$$(\frac{n \cdot (n-1)}{2}-m)!$$$ deletion orders that produce it. So, all of them have equal probability.
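In Python, this deletion process might look like the following sketch (connectivity is re-checked from scratch for each candidate edge, which is what makes the naive version slow; the names are mine):

```python
import random

def connected(n, edges):
    """BFS connectivity check, rebuilt from scratch each call."""
    adj = {i: [] for i in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen, stack = {0}, [0]
    while stack:
        x = stack.pop()
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return len(seen) == n

def delete_down_to(n, m):
    """Start from the complete graph and delete random edges that keep
    the graph connected, until exactly m edges remain."""
    edges = {(u, v) for u in range(n) for v in range(u + 1, n)}
    assert n - 1 <= m <= len(edges)
    while len(edges) > m:
        e = random.choice(sorted(edges))
        if connected(n, edges - {e}):   # skip bridges
            edges.discard(e)
    return sorted(edges)
```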

This algorithm can be improved. If $$$n \le 10^4$$$ and we build a random graph with $$$8 \cdot n$$$ edges, it will be connected with high probability. So, if $$$m \le 8 \cdot n$$$, we can start from this random graph instead of the complete graph; then the algorithm works in $$$O(64 \cdot n^2)$$$. If $$$m \gt 8 \cdot n$$$, we can just build a random graph with $$$m$$$ edges directly.

Here is my implementation, but it has a problem if $$$m$$$ is very big (then my algorithm will not find a graph with $$$m$$$ edges).

I don't believe this works: although each connected graph gets $$$(\frac{n \cdot (n-1)}{2}-m)!$$$ deletion orders for reaching it, these orders do not have equal probability. This is because at each step you can only remove edges that will not disconnect the working graph, and the number of such non-bridges depends on which edges you have already removed. The smallest counterexample has $$$n = 4$$$, $$$m = 3$$$, where 4 out of the 16 connected labeled graphs are claws, but your algorithm produces a claw with probability $$$4/15$$$ rather than $$$4/16$$$.
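The $$$4/15$$$ figure can be verified exactly by brute force over the deletion process, for instance like this (a small sketch of mine using exact rational arithmetic):

```python
from fractions import Fraction
from functools import lru_cache

N = 4  # vertices 0..3; the complete graph K4 has 6 edges

def connected(edges):
    adj = {i: [] for i in range(N)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen, stack = {0}, [0]
    while stack:
        x = stack.pop()
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return len(seen) == N

def is_claw(edges):
    # A connected 3-edge graph on 4 vertices is a tree; it is a claw
    # (star K_{1,3}) exactly when some vertex has degree 3.
    deg = [0] * N
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return max(deg) == 3

@lru_cache(maxsize=None)
def claw_probability(edges):
    """Probability that deleting uniformly random non-bridge edges,
    down to 3 edges, ends at a claw."""
    if len(edges) == 3:
        return Fraction(int(is_claw(edges)))
    removable = [e for e in edges
                 if connected(tuple(x for x in edges if x != e))]
    return sum(claw_probability(tuple(x for x in edges if x != e))
               for e in removable) / len(removable)

K4 = tuple((u, v) for u in range(N) for v in range(u + 1, N))
print(claw_probability(K4))  # 4/15, versus 4/16 under the uniform distribution
```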

The process in seiko.iwasawa's comment is basically all but the last $$$m - n + 1$$$ steps of the reverse-delete algorithm with random edge weights, and so it can be simulated in $$$O(n \cdot \alpha(n) + m)$$$ expected time using Kruskal's algorithm. For large $$$n$$$, the number of edges that need to be tried before Kruskal's algorithm finds a spanning tree will be of order $$$n \log n$$$ with high probability. However, most of this comes from repeatedly trying internal edges within some large component of size nearly $$$n$$$, and it is possible to sample from the other edges in $$$O(1)$$$ since there can only be one such large component.
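Under that equivalence, a sketch of the simulation (this is the straightforward version over the full edge list, without the sampling refinement described above; the helper names are mine): shuffle the edges so that position in the list plays the role of a random weight, build the maximum spanning tree with union-find, and keep the $$$m - n + 1$$$ heaviest non-tree edges, which are exactly the edges the truncated reverse-delete never removes.

```python
import random

def truncated_reverse_delete(n, m):
    """Simulate reverse-delete with random edge weights, stopped when
    m edges remain: the survivors are the maximum spanning tree of a
    random edge ordering plus the m - n + 1 heaviest non-tree edges."""
    all_edges = [(u, v) for u in range(n) for v in range(u + 1, n)]
    assert n - 1 <= m <= len(all_edges)
    random.shuffle(all_edges)            # index in list acts as the weight

    parent = list(range(n))              # union-find over vertices
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree, extra = [], []
    for u, v in reversed(all_edges):     # heaviest edge first
        ru, rv = find(u), find(v)
        if ru != rv:                     # tree edge: survives to the end
            parent[ru] = rv
            tree.append((u, v))
        elif len(extra) < m - (n - 1):   # heaviest non-tree edges survive too
            extra.append((u, v))
    return sorted(tree + extra)
```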

I've also come up with another idea that, while still not (yet) practical for $$$m$$$ very close to $$$n$$$, seems serviceable for at least part of that realm between "generate a random spanning tree" and "generate a random graph and hope it's connected." It is to allocate more vertices than will be eventually used, to allow a connected component to grow to size $$$n$$$ with fewer than the normal $$$n \log{n}$$$ or so edges. To keep the analysis and proof of correctness as simple as possible, I consider only the connected component containing the vertex $$$0$$$.

Consider adding edges one at a time to an initially empty graph with $$$s$$$ vertices labeled $$$0$$$ through $$$s-1$$$. If at any point the connected component containing the vertex $$$0$$$ has $$$n$$$ vertices and $$$m$$$ edges, we relabel its vertices as $$$0$$$ through $$$n-1$$$ and stop. The process fails if this never happens. Given an arbitrary connected labeled graph $$$G$$$ on those $$$n$$$ vertices with $$$m$$$ edges, its probability of being reached by this process is exactly the following quantity:

$$$\dfrac{\binom{s-1}{n-1}}{\binom{\binom{n}{2} + n(s-n)}{m}}$$$

The numerator is the number of sets $$$K$$$ which can be the connected component of size $$$n$$$ containing $$$0$$$. The denominator comes from the fact that, for any such set $$$K$$$, the graph $$$G$$$ is reached via $$$K$$$ iff the (exactly one) (uniformly random) configuration of the $$$\binom{n}{2} + n(s-n)$$$ edges incident to at least one vertex of $$$K$$$, with $$$m$$$ edges included, is the one that corresponds internally to $$$G$$$ after relabeling and includes no boundary edge. Although I don't have a great estimate of the actual asymptotic behavior of its expected runtime, by choosing $$$s$$$ to maximize a related quantity, this process seems to succeed at $$$n = 2 \cdot 10^5$$$, $$$m = 3 \cdot 10^5$$$ on average within around 100 attempts, whereas this size would be out of reach for the other exact methods I have considered.

Here's a quick-and-dirty implementation of the idea:
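The sketch below is my own reconstruction of the process described above, not the original code (the choice of $$$s$$$ is left to the caller, and no care is taken to pick it optimally):

```python
import random

def sample_component(n, m, s):
    """One attempt: add distinct random edges to an empty graph on s
    vertices; succeed if the component of vertex 0 ever has exactly
    n vertices and m edges. Returns a relabeled edge list or None."""
    parent = list(range(s))
    size = [1] * s      # vertices per component root
    ecnt = [0] * s      # edges per component root

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    present, edges = set(), []
    while True:
        u, v = random.sample(range(s), 2)
        e = (min(u, v), max(u, v))
        if e in present:
            continue
        present.add(e)
        edges.append(e)
        ru, rv = find(u), find(v)
        if ru == rv:
            ecnt[ru] += 1
        else:
            parent[ru] = rv
            size[rv] += size[ru]
            ecnt[rv] += ecnt[ru] + 1
        r0 = find(0)
        if size[r0] == n and ecnt[r0] == m:
            comp = sorted(x for x in range(s) if find(x) == r0)
            label = {x: i for i, x in enumerate(comp)}
            return sorted((min(label[a], label[b]), max(label[a], label[b]))
                          for a, b in edges if find(a) == r0)
        if size[r0] > n or ecnt[r0] > m:
            return None  # overshot; this attempt fails

def generate_connected(n, m, s):
    """Retry attempts until one succeeds."""
    while True:
        g = sample_component(n, m, s)
        if g is not None:
            return g
```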

(And finally I repeat the well-known observation that large random combinatorial objects are often very statistically similar to each other. That can make them easier as objects of study, but can also be a disadvantage: if the goal is to generate strong test data, it is often wise to embrace the different biases of different processes.)