Generators with testlib.h

#	User	Rating
1	ecnerwala	3650
2	Benq	3582
3	Geothermal	3570
3	orzdevinwang	3570
5	cnnfls_csy	3569
6	tourist	3565
7	maroonrk	3532
8	Radewoosh	3522
9	Um_nik	3483
10	jiangly	3468

#	User	Contrib.
1	maomao90	174
2	awoo	164
3	adamant	163
4	TheScrasse	159
5	nor	158
6	maroonrk	156
7	-is-this-fft-	151
8	SecondThread	147
9	orz	146
10	pajenegod	145

What are generators?

Generators are helper programs that output test. Most programming task usually has a large input (for example, an array of up to $$$2 * 10^5$$$ elements, a tree of up to $$$10^5$$$ vertices), so it's not possible to add all the tests manually. In these cases, generators come to the rescue.

If you are writing a generator in C++, it is recommended to use the testlib.h library.

Types of generators

There are two types of generators:

Single-test generator: output exactly one test in a single run. Usually, to generate several tests, one must run the generator several times with different command line parameters. Such generators output the test to the standard output stream (to the screen).
Multiple-test generator: output many tests in a single run. Such generators output tests to files (one file for each test).

An example single-test generator with testlib.h

The following generator output a pair of integers with each element from $$$1$$$ to $$$n$$$ — where $$$n$$$ is a command line parameter passed to the generator.

#include "testlib.h"
#include <iostream>

using namespace std;

int main(int argc, char* argv[])
{
    registerGen(argc, argv, 1);
    int n = atoi(argv[1]);
    cout << rnd.next(1, n) << " ";
    cout << rnd.next(1, n) << endl;
}

Why testlib.h?

On the surface, it seems that testlib.h is not necessary to write a generator. This is not true. Almost all generators need to generate random values, and it is tempted to use rand(). However, this is a bad practice. A basic principle of writing generators is that: a generator must output the same test when compiled by any compiler on any platform if it is run in the same way (using the same command line parameters). When using rand() or C++11 classes like mt19937/uniform_int_distribution, your program will output different tests when compiled with different compilers.

The random value generator in testlib.h ensures that the same value is generated regardless of the (test) generator and platform. Besides, testlib.h has various conveniences for generating tests, for example, rnd.next("[a-z]{1,10}") will return a random word of length $$$1$$$ to $$$10$$$ from letters a to z.

Translator's note: There are more issues with using rand() aside from the above one. Refer to this blog for a detailed explanation about these issues.

Available method

To initialize a testlib generator, the first line of your generator must be of the form registerGen(argc, argv, 1); (where 1 is the version of the random number generator used). After that, it will be possible to use the rnd object, which will be initialized with a hash from all the command line arguments. Thus, the output of g 100 and g.exe "100" will be the same, while g 100 0 will be different.

rnd is of type random_t. That is, you can create your own generator, but this is not necessary in most of the cases.

rnd has many useful member functions. Here are some examples:

Call	Return value
rnd.next(4)	An equiprobable random integer from $$$0$$$ to $$$3$$$ (inclusive)
rnd.next(4, 100)	An equiprobable random integer from $$$4$$$ to $$$100$$$ (inclusive)
rnd.next(10.0)	An equiprobable random real number in the half-interval $$$[0; 10)$$$
rnd.next("one\|two\|three")	An equiprobable random word out of 'one', 'two' and 'three'
rnd.next("[1-9][0-9]{99}")	An equiprobable random 100-digit number as a string
rnd.wnext(4,t)	`wnext` is a method of obtaining an uneven distribution (with a biased expectation), the parameter `t` denotes the number of calls to the maximum operation for similar next calls. For example `rnd.wnext(3, 1)` is equivalent to `max(rnd.next(3), rnd.next(3))`, and `rnd.wnext(4, 2)` is equivalent to `max(rnd.next(4), max(rnd.next(4), rnd.next(4)))`. If t < 0, then -t will find the minimum. If t = 0, then `wnext` is equivalent to `next`.
rnd.any(container)	A random element of the container `container` (with random access via an iterator), for example, it works for `std::vector` and `std::string`

Also, please do not use std::random_shuffle, use the shuffle from testlib.h instead. It also takes two iterators, but works using rnd.

Translator's note: If my understanding is correct, rnd.wnext is defined as follows:

$$$wnext(i, t) = \left\{\begin{matrix} next(i) & t = 0 \\ \max(next(i), wnext(i, t-1)) & t > 0 \\ \min(next(i), wnext(i, t+1)) & t < 0 \\ \end{matrix}\right.$$$

Example: generating an undirected tree

Below is the code of an undirected tree generator that takes two parameters — the number of vertices and the 'elongation' of the tree. For example:

For $$$n = 10$$$, $$$t = 1000$$$, a path graph (degree of all vertices are at most $$$2$$$) is likely to be generated
For $$$n = 10$$$, $$$t = -1000$$$, a star graph (there's only one non-leaf vertex in the tree) is likely to be generated.

registerGen(argc, argv, 1);

int n = atoi(argv[1]);
int t = atoi(argv[2]);

vector<int> p(n);

/* setup parents for vertices 1..n-1 */
for(int i = 0; i < n; ++i)
    if (i > 0)
        p[i] = rnd.wnext(i, t);

printf("%d\n", n);

/* shuffle vertices 1..n-1 */
vector<int> perm(n);
for(int i = 0; i < n; ++i)
    perm[i] = i;
shuffle(perm.begin() + 1, perm.end());

/* put edges considering shuffled vertices */
vector<pair<int,int> > edges;
for (int i = 1; i < n; i++)
    if (rnd.next(2))
        edges.push_back(make_pair(perm[i], perm[p[i]]));
    else
        edges.push_back(make_pair(perm[p[i]], perm[i]));

/* shuffle edges */
shuffle(edges.begin(), edges.end());

for (int i = 0; i + 1 < n; i++)
    printf("%d %d\n", edges[i].first + 1, edges[i].second + 1);

How to write a multiple-test generator?

A multiple-test generator in one execution can output more than one test. Tests by such a generator are output to files. In the generator on testlib.h it is enough to write startTest(test_index) before the test output. This will re-open (freopen) the standard output stream to a file named test_index.

Please note that if you are working with the Polygon system, in this case, you need to write something like multigen a b c > {4-10} in the script (if it is assumed that starting the multiple-test generator will return tests 4, 5, 6, 7, 8, 9, and 10).

Other notes about generators

Strictly follow the format of the test — spaces and line breaks should be placed correctly. The test should end with a line feed. For example, if the test consists of a single number, then output it as cout << rnd.next (1, n) << endl; — with a line feed at the end.
If the test size is large, it is prefered to use printf instead of cout — this will improve the performance of the generator.
It is better to use cout to output long long, but if you want printf, then use the I64 constant (for example, printf(I64, x);).
Please be awared about various cases of C++ undefined behavior. For example, in the first example generator above, if the two cout commands are combined into one, the order of the rnd.next function calls is not defined.

Translator's note: about the third point, using lld constant with printf to output long long used to be problematic in the past, but is no longer an issue now.

Further examples

Further examples of generators can be found in the release notes or directly at the GitHub repository.

Comments (5)

Write comment?

Kuroni

5 years ago, # |

Finally, after all these years ...

→ Reply

adamant

Oh, you translated this? That's very nice! Can you add the translation on that page? I also may do it if you share source code with me

5 years ago, # ^ |

I merged it into main post, thanks!

dmkz

9 months ago, # |

+11

C++11 classes like mt19937/uniform_int_distribution, your program will output different tests when compiled with different compilers.

Can you, please, share any proof/reference/real story about this point? I saw notes here, that C++11 standard requires 10000th produced value to be equal 4123659995 and 9981545732273789042 for mt19937 and mt19937_64, but, probably, difference exists later or the source of difference is uniform_int_distribution?

Testing of codeforces/ideone/atcoder compilers

#include <iostream>
#include <random>

int main() {
    // codeforces:
    // c++20: 1018032531
    // c++17: 1018032531
    // c++14: 1018032531
    // clang++17: 1018032531
    // clang++20: 1018032531
    // visual c++ 17: 1018032531
    // c++17 64 bit: 1018032531
    // ideone c++14: 1018032531
    // atcoder: 
    // c++ (GCC 9.2.1): 1018032531
    // c++ (Clang 10.0.0): 1018032531
    std::mt19937 gen(1234567);
    std::cout << gen() << std::endl;
}

AbsurdMan

9 months ago, # ^ |

Generators seem to return the same value from one compiler to another, but distributions return different results, as thier behaivior is not specified by the c++ standard. So you indeed should use testlib in order to have same results under different compilers instead of uniform_int_distribution.

Tested on codeforces:

#include <iostream>
#include <random>

using namespace std;

const int SEED = 42;
mt19937 RNG(SEED);

signed main() {
    // Clang++20 Diagnostics:              635934 122646 282793 845627 886883
    // Clang++17 Diagnostics:--------------635934 122646 282793 845627 886883
    // GNU G++15 6.4.0:                    374624 796722 950928 183476 732158
    // GNU G++17 7.3.0:--------------------374624 796722 950928 183476 732158
    // GNU G++20 11.2.0 (64 bit, winlibs): 374540 796543 950715 183434 731994
    // Microsoft Visual C++ 2017           635934 122646 282793 845627 886883
    // GNU G++17 9.2.0 (64 bit, winlibs):  374624 796722 950928 183476 732158
    uniform_int_distribution<int> d(0, 1e6);
    for (int i = 0; i < 5; ++i) {
      cout << d(RNG) << ' ';
    }
    return 0;
}

xuanquang1999's blog