Errichto's blog

By Errichto, 8 years ago, In English

Both a participant and a contest organizer sometimes wants to stress test their solution with the brute force. Sometimes random tests are quite weak and one needs many thousands (and sometimes millions) of them to become sure about the correctness. The thing is that running a program is quite slow itself, what may hurt if the computation part is fast. On my laptop running a C++ program with empty main() one thousand times takes 1.3s, what doesn't satisfy me. How to make it faster?

I recently prepared a problem with a binary grid (SRM 699, div1-hard TwoSquares) and I wanted to be very careful about the correctness. I wrote slow solutions in C++ and the intended one in Java. Only then I realized how slow usual stress testing is. If they all were in one language, I would quite easily get everything into one program with classes (structs) and I would just run it once, without any overheads. But since the languages were different, I had to rewrite one solution, what not only requires time, but also is a possible place for a mistake.

Does running a program on Windows take the similar amount of time? Is it possible to run a program on Ubuntu or Windows faster than in a milisecond? Given two programs in C++, how to automatically merge them into one program and test them (this feature would be awesome in Polygon I think, where stress testing is able to process only thousands of tests too)?

Regarding my last question (automatically merging two programs) some idea is to wrap everything except includes into a struct and creating one global instance of such a struct. It should be automatically zero-ed (all global variables should be 0, as the user intended).

code

The good thing is that the memory doesn't become huge if we process many tests (or does it?). But mnbvmar noticed that it won't work for static variables. Any way to make it work? Any other ideas?

  • Vote: I like it
  • +58
  • Vote: I do not like it

»
8 years ago, # |
  Vote: I like it +10 Vote: I do not like it

for static's you probably may create two cpps, each looks like this:

namespace { //anon
 //program1
  int main() {}
}

int public_main1() {
  return main();
}
  • »
    »
    8 years ago, # ^ |
    Rev. 3   Vote: I like it +3 Vote: I do not like it

    Hmm, the problem I noticed was that static variables inside functions should be reset between the tested's main function calls, as in (pardon this artificial example):

    #include <bits/stdc++.h>
    using namespace std;
    
    struct S {
      int func(int input) {
        static int x = 0;
        return input + (++x);
      }
    } S_global;
    
    int main() {
      for (int test = 0; test < 1000; ++test) {
        S s1 = S_global;
        cout << s1.func(0);   // we want to make it ALWAYS zero.
      }
    }
    

    Thanks for noticing a problem with another static meaning (that is, variables or functions with static linkage). For others: method described in the post fails on the following code:

    int y;
    static int f(x) { return x + 2 * y; }
    
    int main() {
      int x;
      cin >> x >> y;
      cout << f(x) << endl;
    }
    
    • »
      »
      »
      8 years ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      Ah, yeah, I missed the key problems with static

»
8 years ago, # |
  Vote: I like it +62 Vote: I do not like it

I suggest an alternative solution: just use multitest. Your program considers the input as concatenation of several test cases and returns answers to all of them together. Then you check the equality of big outputs together. There is a problem when your checker is harder then diff, but usually it's enough.

  • »
    »
    8 years ago, # ^ |
      Vote: I like it -75 Vote: I do not like it

    I don't know the opinion of others, but I completely hate this WA #2 on all the problems.

    • »
      »
      »
      8 years ago, # ^ |
        Vote: I like it +5 Vote: I do not like it

      Multitests on Codeforces are quite rare. I would imagine -XraY- meant it only as a problem testing device, not for use in contest :)

      • »
        »
        »
        »
        8 years ago, # ^ |
          Vote: I like it +18 Vote: I do not like it

        Actually, you can find it in almost every program my team and I write. It's too slow for me to copy and paste a dozen of test cases every time I find a bug.

        • »
          »
          »
          »
          »
          8 years ago, # ^ |
            Vote: I like it +5 Vote: I do not like it

          Thanks for pointing that out, I stand corrected. Are the problems you've written available in English? I would love to see them.

          • »
            »
            »
            »
            »
            »
            8 years ago, # ^ |
              Vote: I like it +31 Vote: I do not like it

            I think there's a communication problem here. From what I understood, his team writes multitest code even when solving single-test contests like CF (because a multitest program can solve a single case too, and it's easier for him to debug).

            He did not answer question about posing multitest problems because that wasn't even in his mind when he suggested writing multitest code.

            (Personally, I always forget to zero some global variable when I'm writing multitest code, and end up looking for bugs that don't exist.)

            • »
              »
              »
              »
              »
              »
              »
              8 years ago, # ^ |
              Rev. 2   Vote: I like it +15 Vote: I do not like it

              You are absolutely right, I misunderstood him.

              On a side note, I can definitely relate... I've wasted a lot of valuable minutes on initialisation bugs during GCJ.

              Edit: It seems like this whole thread is full of misunderstanding...

  • »
    »
    8 years ago, # ^ |
      Vote: I like it +3 Vote: I do not like it

    That's one hardcore approach. It's harder to write such a code, isn't it? You must care about global variables and arrays. I didn't know anybody does it by default (at least in C++). I would say it's one more possibility to make a bug.

    It's too slow for me to copy and paste a dozen of test cases every time I find a bug.

    I store tests in files in1 in2 in3 ... and I run them with bash loop if there are many of them.

    Still, I appreciate completely new idea.

    • »
      »
      »
      8 years ago, # ^ |
        Vote: I like it +10 Vote: I do not like it

      If you introduce a bug while implementing multitest, stress testing should find it, isn't it? By the way, I never use non-constant global variables in my solutions. If I can't stuff all of them in the main function, I create a struct to hold them.

      If multitest by itself is not enough and your computer have multiple cores, you can try to use multithreading. It is much easier in C++11 than before since we have std::thread now.

      • »
        »
        »
        »
        8 years ago, # ^ |
          Vote: I like it +5 Vote: I do not like it

        -XraY- said he usually does it even in contests, but I'd say one doesn't always stress-test their solution during the contest — this is why I'm surprised he uses multiple test cases by default. During the contest preparation, do you say all testers/authors should write their programs like that?

        I guess it's a good habit not to use non-constant global variables by the way, though I doubt I will switch, because global arrays are quite comfortable.

        I often use many threads to test my solution(s), what makes everything two times faster. It isn't much compared to the issue I described, where computing part is hundreds times faster than opening a program.

        • »
          »
          »
          »
          »
          8 years ago, # ^ |
            Vote: I like it +15 Vote: I do not like it

          I do it to keep good habits.

          If solution needs to be stress-tested, I think authors should write it without global variables. I sometimes do a conversion in Educational Rounds for others solutions. I also run them under asan or msan to find more bugs.

          • »
            »
            »
            »
            »
            »
            8 years ago, # ^ |
              Vote: I like it 0 Vote: I do not like it

            +1 for ASan (especially in combination with appropriately sized vectors instead of global arrays). I'd also add UBSan for overflow- and basic array bounds checking.

        • »
          »
          »
          »
          »
          4 years ago, # ^ |
            Vote: I like it 0 Vote: I do not like it

          Errichto How do you use threads to test the solution and not cause multiple threads to interfere with each other? Can you share your code ?

»
8 years ago, # |
Rev. 2   Vote: I like it -14 Vote: I do not like it

Deleted. Went off topic

  • »
    »
    8 years ago, # ^ |
      Vote: I like it +18 Vote: I do not like it

    It seems that u completely missed the point

  • »
    »
    8 years ago, # ^ |
      Vote: I like it +5 Vote: I do not like it

    A tip:

    every time you compile your code, it takes about 1.3 seconds.

    You know what ? you can use precompiled headers to speed up the compilation. You can learn more about it here and here. In my case, g++ takes around 0.8s to compile with out precompiled headers, but with precompiled headers, it takes 0.2s.

»
8 years ago, # |
  Vote: I like it +28 Vote: I do not like it

You can make the struct templatized (e.g. template <int N> struct { ... }), that way static variables will be different when you use different template parameter. Then, you could write some kind of recursive template which invokes your function and then calls itself with a different value of template parameter. Template depth can't be very large but you can, for example, call it like this 100 times and then re-start the program. It should hopefully reduce the overhead from re-running the binary for each test case.

  • »
    »
    8 years ago, # ^ |
    Rev. 2   Vote: I like it +10 Vote: I do not like it

    You may increase allowed depth using compiler options (if you have access to them).

    And you may run the template in tree-like way run<i> -> run<2i> and run<2i+1>, that way you'll have 2^{allowed_depth} runs in one file (with appropriate break condition)

»
8 years ago, # |
  Vote: I like it +25 Vote: I do not like it

On my laptop running a C++ program with empty main() takes 1.3s, what doesn't satisfy me.

I've re-read the statement several times to make sure I understand it correctly. How is it even possible? What is it doing all these 1.3s? I would start with debugging what exactly is going on — maybe smth is wrong with the system on your laptop.

For comparison (this is desktop, but I don't think it should be 1000 times faster):

$ time ./a.out

real	0m0.001s
user	0m0.000s
sys	0m0.001s
  • »
    »
    8 years ago, # ^ |
      Vote: I like it +10 Vote: I do not like it

    Oh, I meant to write "running it 1000 times takes 1.3s". This is why later I ask "is it possible to run ... faster than in a milisecond". It's strange that nobody noticed it before. Thanks (corrected now).

»
8 years ago, # |
Rev. 2   Vote: I like it +10 Vote: I do not like it

There have been some prior work on this in the context of fuzzing. AFL does a neat thing with changing the program to call fork() at the beginning of main and doing independent runs in forked children, often winning a factor ~10 runtime. See e.g. http://lcamtuf.blogspot.com/2014/10/fuzzing-binaries-without-execve.html.

Edit: in your case this would take the form of something like:

int main() {
  for (;;) {
    // generate the next set of test data, maybe write to a string that you rebind stdin to
    if (fork() == 0) {
      // freopen stdout or otherwise check that the output is reasonable
      solve();
      return 0;
    }
    int status;
    wait(&status);
    assert(WIFEXITED(status));
    assert(WEXITSTATUS(status) == 0);
  }
}
  • »
    »
    8 years ago, # ^ |
      Vote: I like it +5 Vote: I do not like it

    ... and if you want you can even get some free multithreading out of this by leaving out, say, the first 10 wait calls.