mazihang2022's blog

By mazihang2022, history, 3 years ago, In English

Is it possible to hack authors' solutions?

  • Vote: I like it
  • +38
  • Vote: I do not like it

»
3 years ago, # |
  Vote: I like it +54 Vote: I do not like it

Sometimes it is possible to hack the authors' solution, as seen here. However, it is generally rare, as there are many people (the coordinator and the testers) who read the problem and the solution and verify that they are correct.

Another way that authors ensure that the solutions are correct is by writing an exponential brute force, and checking their solutions against thousands of small cases. This is usually enough to catch most errors.

  • »
    »
    3 years ago, # ^ |
      Vote: I like it +4 Vote: I do not like it

    So will the contest be unrated if this happens?

    • »
      »
      »
      3 years ago, # ^ |
        Vote: I like it +17 Vote: I do not like it

      It depends on how much it affects the contest. For example. when issue is fixed quickly enough (by changing solution or problem statement) or when error in solution appears only on systests and correct solution exists, contest will be rated.

»
3 years ago, # |
Rev. 2   Vote: I like it +4 Vote: I do not like it
  1. Multiple solutions: different authors, various approaches (including slow ones).
  2. Stress-testing.
  3. Setter should write down a proof (if any) and somebody else should check it.
  • »
    »
    3 years ago, # ^ |
      Vote: I like it +13 Vote: I do not like it

    Also, reference solutions with added bugs to ensure tests actually catch them.

»
3 years ago, # |
  Vote: I like it +49 Vote: I do not like it

For Codeforces, the problem selection procedure is pretty high quality, so the chance of this happening is very low. What follows below is only from my experience with problemsetting/coordinating for my uni contests (which spans about a dozen contests or so, and about a hundred problems).

Whenever I coordinate contests for my university CP club contests, I ask problemsetters to submit a formal mathematical proof of correctness of the intended solution as well as implementation (line-by-line, if it isn't standard) before setting the problem on Polygon (for instance, proof of claims needed for greedy, dp and binary search problems; structural properties for graph problems and so on). Any problem that I have ever approved using this strategy has had a correct solution.

From my experience, the more formal (and low-level) the proofs are, the less probable it is to make mistakes and miss out corner cases. For instance, while a high-level proof might have a chance of missing the fact that there are no cycles with length $$$\le 2$$$ in an undirected graph but there are such cycles in directed graphs, a low-level proof will definitely not miss out that detail. Maybe this arises out of my mathematical inclination, but I've seen that the more "machine-verifiable" a plausible proof seems to be, the more the chances are that it is correct.

An interesting thing that we've been thinking of implementing is to somewhat formalize the test-generation pipeline (since the judge has the responsibility to minimize false positives and eliminate false negatives), which is overlooked by quite a few new problemsetters. Testing with a brute-force solution also helps detect any incorrect author's solution, so it is imperative that the tests comprise a diverse enough subsets of all permissible inputs which can eliminate false positives and negatives on most sane approaches, even for the author's solution. Some of the things we currently think of while making tests are as follows, but they are just heuristics and not totally formal:

  • TLE tests for both heuristic solutions as well as plain dumb solutions (using stress tests)
  • WA tests for stuff like greedy/dp, for heuristic solutions with the wrong greedy or dp
  • Exhaustive tests for small cases
  • Random tests (with certain structural properties, so not completely random) to account for cases we can't manually find
  • RE tests (out of bounds, overflow)
  • Ensure that if the problem has subtasks, then if a subtask A generalizes subtask B, then tests of subtask A should contain those of subtask B.
  • Look through the proof of the author's solution (as well as the tester's solution and others who have solved the problem), and consult a list of generic mistakes that people make while implementing various algorithms (for instance, for binary search, people sometimes write an infinite loop, make off by one errors, miss out the boundaries and so on), and modify some pre-built tests to fit that problem.

If there are some more test-generation ideas that I missed out on, or some strategies that can "formalize" this procedure to a good extent, please let me know!