When submitting a solution in C++, please select either C++14 (GCC 6-32) or C++17 (GCC 7-32) as your compiler. ×

GLAYS's blog

By GLAYS, history, 4 years ago, In English

Hello!

When I need to stress-test a problem on trees, I usually find a random tree like this:

int n = random(1, 100);
for(int i = 2; i <= n; ++i){
    p = random(1,i - 1);
    addEdge(p, i), addEdge(i, p);
}

As in; to find a random tree of $$$N$$$ nodes, I just find a random sequence $$$P$$$(parents) of length $$$N-1$$$ with each $$$1<=Pi<=i-1$$$ for each $$$2<=i<=N$$$.

And it worked well, or so I believed at least.

But I remember watching an Errichto stream(I think it's the AtCoder DP contest one) where he showed how he stress tests and then said that the previous method only prints a specific kind of trees, and that he recommends using Prüfer Sequences. I kept thinking about why that's true but found no answer online.

Of course, Prüfer Sequences are clearly the best and most trusted choice since it's proved that any sequence represents a unique tree and that any tree can be represented by a sequence(more info here). And though it's not that complicated, I still don't want to keep writing it whenever I need to, unless there's an important difference between the two methods.

So if anyone would enlighten me on the difference if it exists, that would be great :D (It would be good if it's Errichto too xd)

Any other advices/tricks on stress-testing graphs or in general would be awesome!

Thankss ^^

  • Vote: I like it
  • +13
  • Vote: I do not like it

»
4 years ago, # |
Rev. 2   Vote: I like it 0 Vote: I do not like it

In addition to random tests, you should specifically generate trees such as lines and stars. Random methods aren't going to produce them for large $$$N$$$, and many issues are exposed only by such trees.

One neat trick is that many solutions compute some property of an unlabeled tree but do so in a way that is influenced by labeling (for example, if you always traverse the tree from root 1). Relabeling the tree in a test case and running it again to make sure you get the same result can be an easy sanity check done with a large generated input for which you don't know the right answer.

  • »
    »
    4 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Thankss! Relabelling does indeed seem like a very effective sanity check that could probably cover most uncovered tests without it. I'll remember to re-root every tree I find and try all other nodes from 2 to N.

»
4 years ago, # |
Rev. 3   Vote: I like it 0 Vote: I do not like it

Edit: wrong

Spoiler
  • »
    »
    4 years ago, # ^ |
    Rev. 2   Vote: I like it 0 Vote: I do not like it

    Thanks!

    Yes I did find that it doesn't find them all of course(even from the fact that that kind of sequence is only a special case of the Prüfer Sequences), but I failed to find something in common between them.

»
4 years ago, # |
  Vote: I like it +3 Vote: I do not like it

Some solutions might assume something about the order of vertices and work for all trees generated by your code, but fail on something else. An example tree that you can't create is 1-3-2. An easy fix is to shuffle the names of vertices before printing.

  • »
    »
    4 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Thanksss! It does make sense.

    I guess constructing a permutation of length $$$N$$$ and permuting the nodes' IDs will do it, it will even count as doing the re-rooting trick from up top.

»
4 years ago, # |
Rev. 3   Vote: I like it +19 Vote: I do not like it

I can't believe this isn't mentioned yet, but here's another flaw: in your method, for any vertex the expected distance from 1 is $$$\mathcal{O}(\log N)$$$. So for example, a stupid $$$\mathcal{O}(N)$$$ LCA algorithm will work perfectly. If you want to check for TLE, you should also try some kind of "longer" trees.

»
4 years ago, # |
  Vote: I like it +16 Vote: I do not like it

Using different distributions for parent vertex p = random(1,i - 1); you may get different types of trees. For example you can choose parent among previous K vertices or generate several values and take maximum to get longer trees.