I've been involved with testing a lot of Codeforces contests and coordinating contests for my uni, and have seen various kinds of issues crop up. However, I was able to come up with a stable system for my uni contests that helped make the experience as educational and error-free as possible, and I have found the lack of some of those things to be the main reason behind issues faced by Codeforces contests.

Given that it takes months to get a round ready, it is better to be more careful than necessary compared to getting your round unrated or having bad feedback in general.

I'd like to thank Ari, null_awe and prabowo for reviewing the blog and discussing more about problem-setting.

The current Codeforces contest guidelines are here for problem-setters and here for testers, and they have some great sanity checks that will help you make a good contest. But I still think it is missing some things that leads to bad contests at times.

## Issues

Let's list out some issues that plague contests.

- Wrong intended solution or tests.
- Weak tests.
- TL or ML too tight or too loose.
- Some hacks are not accepted.
- Wrong IO format.
- Checker issues — too stringent, too loose, or simply wrong.
- Guessable problems.
- Problems that have appeared before.
- Unbalanced contests — difficulty-wise.
- Unbalanced contests — topic-wise.
- Long queue, system unavailability.

Out of these, the most glaring issue is having a wrong intended solution itself, and leads to making the contest unrated immediately.

## Discussion

I think the fundamental problem behind most of these issues is two-fold:

- Lack of rigor while setting a problem.
- Not adopting a security-based approach to testing (pentests, for example).

Whenever I coordinated contests at my uni, I sent out a couple of guideline documents for setters and testers, that built upon these two ideas. The guiding principle for the first of these reasons (that every author and tester should understand, in my opinion) is consistency (and proofs of it). Not just one type of consistency, but consistency of the intent, constraints, statement, checker, validator and tests — all at the same time.

**NOTE:** Before we go forward, let's clarify what we mean by a formal proof. In this comment, I clarified that we are not talking about scary super-technical proofs (such as ones you might encounter in advanced set theory). A proof that is logically sound, where steps follow from basic logical reasoning and don't have gaps, is good enough. I call it a "formal proof" just because people on Codeforces call anything that looks reasonable a "proof" (which is obviously wrong and misleading) — and that is not what I am talking about. If you're not very familiar with the idea of a mathematical proof, don't worry. It is not meant to be too hard. I would recommend reading the chapter on proofs in this book, to understand what kind of logical statements I am talking about, and the idea behind proofs generally.

Let's understand this by asking ourselves the following questions:

**What's the intent of the problem**(not in terms of ideas, but from a contest perspective)? Usually it is "solve this problem in xyz time complexity and/or abc memory complexity" or "solve this problem in these many queries". Constraints on the input and output, time limit and memory limit arise from this naturally.**What do we want the contestants to solve?**This corresponds to a precise and unambiguous statement of the problem, including the IO format, guarantees on special kinds of tests (if any) and so on. Unfortunately, there is no way to formally check that your Polygon materials are consistent with this problem statement, which is why it becomes necessary to ensure that the statements are such that your**solution**can be derived formally from the problem statement.**How do the contestants prove that they have solved the problem?**In other types of reasoning/math contests, you have to provide a proof of correctness to show that you have solved the problem. This is also the approach taken by academia. However, unless you force participants to submit a solution in proof languages like Coq or Lean, it is impossible (or very hard) to enforce that they have a proof of correctness for their arguments and time complexity of their algorithms. This is where tests come in. "Strong" tests mean that any code that comes from a solution that doesn't satisfy the intent of the problem fails to get accepted on those tests. Since the constraints are bounded, the set of all possible tests is finite, but still huge enough that adding them all is infeasible. So to combat that, a good practice is to add all possible tests for small constraints, some random max tests, and tests that break all* possible heuristic solutions.**How to verify that our testing procedure is correct?**Of course, if you feed in garbage data (i.e., data that doesn't satisfy the preconditions of a program), the output might be garbage. If there are inconsistencies in the tests, it is impossible to verify if the submitted solutions are correct (as in they solve the problem as stated). The same goes for tests that fall outside the constraints.

## Solutions to the issues — the guidelines

We will now list the guidelines that help make contests error-free. Most of these will have both an author and a tester part. Keep in mind that these guidelines will be along the lines of a security perspective.

Note that there are some things that we can completely verify in a mathematical sense by proofs. But there are also things that we can't be sure of. One of the reasons behind it is that the halting problem is undecidable. So, for those kinds of things, we enforce some kinds of limits on the runtime or the memory requirements, and since we also want to check the behavior of submitted solutions on large inputs, we miss out on a lot of tests due to the combinatorial explosion of the number of tests that happens for most problems.

### Provable things

Firstly, let's see what we can **prove** the correctness of. Keep in mind that a proof of consistency is always an "if and only if" argument.

#### Statement and solution

This is the part that comes even before setting the problem on Polygon or any other contest preparation system. Ensure that you have the following things:

- Formal statement
- Formal algorithm
- Formal proof of correctness of algorithm

None of this should have actual numbers in it (keep everything in terms of what is defined in the input, and don't think about the constraints for now). Don't even implement it in a programming language; use constructs in the programming language as well as standard algorithms whose proof of correctness is well-known as a black-box.

Your proof and algorithm should depend only on the statement and logic. Don't implicitly add in any other assumptions or make wrong assumptions in the proof (if you write the proof, it will feel off anyway, so it's quite hard to make a mistake like this if you follow this guideline).

Some notes:

- The algorithm and the proof of correctness are two
**different**things. Saying that "it can be easily seen that this greedy works" or "finding max flow in this graph gives the answer" is not a proof. - Write out everything mathematically, even if it is obvious to you. This is the foundation of the whole problem, and if there are issues in this, is there even a point to preparing the problem, or even thinking about it?
- I recommend reading this article and the links it refers to, to understand what I mean by a proof here. If you don't know how to structure things in the form of a proof, perhaps you are not yet ready for writing a problem for a contest yet. However, it is never too late to learn, and your idea behind the problem might be correct. A formal proof gives you a way to communicate things with others unambiguously, which is important since natural language is imprecise. Writing formal proofs is not hard, and apart from being a tool to communicate, it cements your understanding as well.
- It might be possible that there is a solution faster/better than your intended solution. Don't be afraid to use it as the intended solution (in fact, try to find as many solutions as possible).

Doing this step properly eliminates the possibility of having wrong statements or wrong intended solutions. It also avoids situations like this where the intended solution was "fast enough" on the tests but the problem in general didn't seem to be solvable fast enough. Incidentally, that blog is a good example of what can go wrong in a real-life setting.

#### Constraints and test validation

Setting constraints is the part which requires getting your hands dirty. Let's ignore that for now, and let's say we have some constraints.

How do you ensure that your tests are correct? Think of it as a competitive programming problem again, though a very simple one. This is where a validator comes in. With a validator, you validate the tests based on what the problem states. The reasoning behind proving the correctness of the validator should go like this:

We show that the validator accepts a test if and only if the test is valid according to the problem statement. For the if-direction, [do something]. Now for the only-if-direction, [do something].

I recommend keeping the numbers in this proof generic enough, as is done in most math proofs. We will plug in the numbers later, when we discuss the constraints. Having this proof handy is quite important. Sometimes, when preparing a problem, you end up changing the constraints later on, but forget to update the validator. If you keep this proof as a comment in the validator itself, and inspect all the contest materials before committing changes, you will immediately catch inconsistencies in the statement and the validator.

Also, validators are important when hacks are enabled. If your validator is too loose, you risk some AC solutions getting hacked. And if your validator is too tight, some hacks might be invalid, and there are some hidden conditions in the problem that the hacker might be able to guess, which makes the contest unfair. For example, this contest had such an issue.

For choosing constraints, you need to write some code to verify that the TL and ML are fine and so on. However, this is not something you can prove even in a handwavy sense (unless you know the assembly that your code generates and the time every instruction takes in being executed in that order, on the worst possible test), so we will keep it for later.

#### Checker

This corresponds to checking that the contestant's solution indeed outputs something that is valid, **according to the problem statement**.

Again, as in the validator, using a formal proof, ensure that the output satisfies all the conditions for it to be accepted as a valid output by the statement, and don't add any extra checks. This is important because there have been CF contests where the checker checked something much stronger about the structure of the solution. A very simple example is "print a graph that is connected". This doesn't mention that the graph can't have multiple edges or loops, but many people will just write a checker that will add these conditions anyway. This is another reason why adding links to precise definitions in the problem is a good thing.

Sometimes writing the checker is harder than solving the original problem itself, and there have been instances in certain CF contests where writing the checker was the second subtask of a problem! Keep this in mind while setting problems, since it is useless if you can't write checkers. Sometimes, it might make sense to ask for only a partial proof of solution from a contestant — like a certain aspect that you can compute only if you use some technique, or something that is weaker than asking for the output, but is a reasonable thing to ask for (this changes the problem itself, though, and hence it doesn't show up as an issue on the contestant-facing side of problem-setting).

### Things we have to manage heuristically

As we mentioned earlier, almost everything about setting constraints and creating tests is something that we have to manage heuristically. Both of these things go hand-in-hand, which makes the whole process quite iterative. Some obvious guidelines that help in these cases are given in the next section.

### Condensed list of guidelines for testers and authors

#### Common guidelines for setters and testers

- Read this blog and the links in this post that point to CF guidelines. Also read the links to issues in actual CF contests and think about what went wrong.
- Keep the contest preparation process interactive and discuss as many solutions as possible, as well as potential changes to problems.

#### Guidelines for setters

- Ensure that you know what testers are meant to help you with, and read guidelines meant for them as a starting point.
- Have a formal write-up describing the problem, algorithm and the proof, as described in the blog. The formal editorial for a problem should be ready
**before you prepare the problem**. - Keep the problem statements short, clean and unambiguous. Use lots of space in unavoidable situations. Avoid using examples in statements that unnecessarily obstruct the flow of the reader, and write the statement in a way that doesn't need you to rely on examples in the first place.
- Ensure that your model solution is readable and is provably correct (and it doesn't have any undefined behavior and so on). Preferably write it in a "safe" language, and turn on all debug flags while testing the main solution (this is something I have wanted on Polygon, but there's no support for that unfortunately).
- Write strict validators and checkers, and prove their correctness too (both in terms of semantics as well as formatting). When writing checkers, do
**NOT**assume that the jury solution is correct. Rather, check the correctness of both the jury solution as well as the participant's solution. - Have exhaustive test cases — ideally, use a bruteforce to generate all possible tests for some small sub-constraints to ensure correctness, corner cases, tricky large cases for TLE and random large cases. Ensure that you cover all possible special cases, and look for specific solutions that can give AC but do not work on the general case (like faulty greedy solutions and so on). Keep in mind unordered_map and unordered_set like peculiarities in the languages.
- Ensure that the time limits are not too tight, and they are at least 2-3 times the slowest AC solution you have. Try to keep constraints in a way that the most optimized unintended solution either doesn't pass (by a good margin — like twice the TL), or barely passes. In the latter case, you should be reconsidering whether you want this problem to appear on the contest at all.
- Before relying on the testers, try to test your own problems according to the tester guidelines.
- Maintain something like a spreadsheet for tracking status and feedback of problems, and current progress.

#### Guidelines for testers

- Ensure that you know what setters are meant to do, and read guidelines meant for them as a starting point.
- Ensure that the problem statements are water-tight and self-contained (not a lot of background knowledge necessary, and wherever necessary, links should be present) and provided examples clarify unavoidable ambiguities (which should ideally be non-existent).
- After testing, confirm that the authors' proofs are all correct. This is not limited to just proofs of correctness of the intended solution, but also for the checker, validator and so on. Just because you couldn't solve a problem in the virtual contest testing doesn't mean you mustn't care about the other problems.
- Try to hack the problem: submit heuristic solutions, both in the sense of potential WA as well as potential TLE. If they pass on all cases, try to come up with cases they can’t pass on, or prove that this heuristic is correct/fast enough. Such a situation is quite instructive whenever it arises (from experience, this happens usually when there is a greedy solution involved which doesn’t really need the full strength of the greedy decision).
- Try to write the slowest possible reasonable solutions that a contestant can come up with, with the same complexity as the author’s solution. Also try to come up with highly optimized brute-force solutions (using caching, bitsets, precomputation of answers and so on) and see if they can pass the test cases. Look for test cases that these can’t pass, but the official solution passes. In case the official solution and the optimized brute-force both have similar speeds, discuss what to do next with the coordinator and the author.
- If you come up with an alternative approach with a better or slightly worse complexity, notify the problem setter, and depending on what the intended difficulty of the problem is, discuss whether to make the problem harder/easier.
- Give detailed feedback on the quality of problem statements, problem content, editorials, solutions, test cases, perceived hardness of the contest, and anything else that will help improve the perceived quality of the contest.

### Handling other non-obvious issues.

At this point, there are the following issues we haven't addressed.

- Guessable problems.
- Problems that have appeared before.
- Unbalanced contests — difficulty-wise.
- Unbalanced contests — topic-wise.
- Long queue, system unavailability.

The first four issues are solved only by testing and being consciously aware of the fact that they need to be fixed.

If problems are guessable, it makes sense to ask contestants to construct something that shows that their guess is correct. For example, it might be easy to guess that the size of the answer for a problem is $$$n/2$$$. This is very guessable, and in this case, it is better to ask for the construction of the answer as well.

For avoiding problems that have appeared before, since there is currently no repository of all competitive programming (or math contest) problems that is searchable by context, the only way is to invite a lot of experienced testers with different backgrounds.

For avoiding unbalanced contests (both in terms of difficulty and topics), it is important to have a large variety of testers — people with various amounts of experience as well as various types of topics that they are strong at.

The last type of issue is something unique to online contests that are on a large scale — for example, CF contests. This is the main reason why pretests exist in the first case. To avoid long queues, usually there are at most 2-3 pretests for problems like A and B. To make strong pretests that are also minimal, there are two ways of choosing tests (which can be combined too):

- Choose an exhaustive test (on smaller constraints), and a random max-test.
- Pick all possible tests that fail solutions that you encountered in testing.

This is one of the reasons why the first couple of problems are multi-test in nature.

## Conclusion

If there's anything I missed out, please let me know, since it's been quite some time since I last coordinated a contest. Feel free to discuss on various aspects of problem-setting in this context, and I would request coordinators to also share their to-do list when they start coordinating a contest, right from the very beginning of the review stage.

orz blog

Coincidence? I think not

I think that your blogs produce the opposite effect of what you are trying to achieve (although my skill for understanding people is negative, so take that with a grain of salt).

You make it seem that proving is something hard that is only available to people with special education in math. You insist on using the phrase "formal proof" which seems to suggest writing everything in symbols, maybe even starting proving from axioms. You mention Coq, and as a person who had a course in Coq and remembers how hard it is to prove "obvious" stuff like the commutativity of addition for integers, I can say that it is not good for supporting your position.

From here I will address an imaginary

~~flat-earther~~proof-denier.Mathematical proof is an argument so convincing, that you are prepared to use it to convince other people.Imagine a person who doesn't believe in your claims (imagine me). But they don't want to mess with you, they want to get convinced, they just don't want to trust your word. Can they find a hole in your arguments? If not, it is strict.It's ok to use natural language in proofs. It's ok to reuse known theorems in proofs. It's ok to cut some corners (but only after you got some experience). It's ok to say that something is obvious.

It's not ok to believe in some greedy and justify it by saying "it's clear that this is the worst case and we are doing fine in it". Because there are very few situations when you can actually understand what is the worst case for your greedy.

Here is the example of "your average

it's the worst caselogic": A decreasing array is clearly the hardest to sort in increasing order, it even has the maximum number of inversions. A decreasing array is sorted by reversing the array. Therefore, reversing the array sorts the array in $$$O(n)$$$.Ridiculous? But how is it different from this?

There shouldn't be any assumptions in your proof, apart from the ones given in the statement. There are some quasi-exceptions to this, like if the problem has random input, it's ok to say that some property will hold because of randomness and then just check that it actually holds by running the code a lot of times. Or something similar. Running an exhaustive search is a valid method of proving. Running your greedy on some tests you believe to be the worst is not.

Read Everule's blog and comment section. Try to come up with proofs yourself. You will get better with practice.

Thanks for the comment! Your reply to an imaginary proof-denier is pretty much what I believe in as well, and it's good practice to always prove things (which was one of the points I tried to make in this blog by the way) and have a certain amount of rigor in how one approaches problem-solving, even as early as the intuiting part of the problem-solving process.

Regarding the point about using "formal proofs" — I am painfully aware of the kinds of images it conjures up. I mentioned Coq and Lean as proof software that could be used to verify proofs, but did not claim that a proof

musthave that level of rigor. Thanks for pointing that out, and I will try to clear up any misunderstanding that my use of the word "formal" might have brought about.I used the word "formal" here so extensively just because people on Codeforces seem to call anything they find reasonable a "proof" (an example being the comment you linked to). I wanted to stress the fact that it should be logically sound — in contrast with arguments that people don't even realize have loopholes (like the proof in the linked comment). In fact, even in many editorials while explaining some greedy ideas, people just end up saying "hey, this thing works, and intuition checks out", and people relatively new to competitive programming end up thinking "oh, so this is how a proof works — better think this way the next time I solve a problem".

It's just because of the unfortunate naming used by people that I had to resort to using the word "formal", which is otherwise exclusively used for actual formal proofs that can potentially be unreadable for the uninitiated.

I did link to this webpage, which has some references to how a proof should look like. In the last book they refer to, I believe the concept of a proof is built from the ground up, and it has a simple enough idea of a proof. Again, as I said in the blog too, coming up with a proof once you have basic (correct) intuition pinned down is

not hard, and writing proofs gives you unshakeable confidence that what you thought was indeed correct. There is no such thing as a "wrong" "proof" in my opinion — if it is wrong, it is not a proof at all, and logical inconsistencies will eventually show up once you have enough practice. In the way I learned math, I didn't even consider making a wrong argument (of course, there were exceptions, I am not perfect). Maybe I wasn't being creative enough, but I found myself constantly arguing with myself, trying to convince myself that my argument is correct, and completely ignoring the arguments that didn't make sense. In doing so, I found myself going back to a very small set of reasoning methods, which are sort of the foundation of how I reason now. I'm aware that unlearning the pseudo-proof methods that people use right now might be hard for them, and it will take some time to digest the changes (especially because proofs sometimes change how you reason itself, which is a pretty huge change for someone if you ask me).I also think people should read Everule's blog that you linked. I had a long discussion with him regarding the contents of the blog, and I'd say it mirrors my thoughts on the matter as well.

Yes, that's why I wrote the first part of the comment. I understand that you are working for a good cause, but I feel like you are presenting your position in an unflattering light, thus harming the position in eyes of the reading public. So I wanted to nudge you into revising it :)

Yes, the language is flawed. The things people call proof sometimes drive me crazy. But we need to break the wall of supposed superiority and scary-sounding words.

I actually don't like the link to AoPS you provided, because the text on the page is reinforcing the image that proofs should be maximally pedantic and they are for mathematicians only:

Then they mention proof by construction and proof by contradiction, which are much better, but they don't say that the two-column format is unnecessarily rigid and don't expand on two actually nice proof types.

Tangential personal storySomewhere around 7-8th year in school (11th year is senior year, for context) I read a book called "Geometry for dummies" or smth, which introduced two-column proofs to me. I started writing my proofs on geometry lessons in the very strict two-column style. Simple problems on 3-4 lines took me a couple of pages. After a week the teacher asked me to stop.

Even more, in the next paragraph they say

directly comparing informal proofs to two-column proofs for some reason. Then

Which reads as "we,

realmathematicians, do not stoop to your informal level, it is for children". And since directly opposing informal proofs we have two-column proofs, it implies that two-column proofs are the right way to go.They don't say anything negative about two-column proofs. You need to use hyperlinks

twiceto get here and seeThen there are links to articles and a book you can buy on Amazon (thanks?). I opened the links, but the articles are pretty long, even I haven't read them yet (although they are still open in my browser and I want to read them later), the person who doesn't believe in proofs will just close it right away even if they decided to click, which is already much more effort than we should expect.

Regarding your first point: I sometimes intentionally post slightly controversial (but honest) opinions just to get the ball rolling, and I agree that the main issue is people being scared of proofs, so that wasn't really the right approach here. Maybe I'll make up for it by adding a link to the blog you mentioned, and some of the things on proof writing that I myself read (and not second-hand advice, like the AoPS page).

I should have been more alert in pursuing an explanation of what proofs are, that's my bad. I just linked the page because (I thought) I saw an explanation of how usual proofs are different from formal proofs, and the reference had a nice book.

The book mentioned is available for free online (on a course webpage, so I think it should be fine), and I highly recommend going through at least the chapter on proofs, if you don't want to learn about everything else.

I tend to find myself trying to wrap my head around how some people can so confidently assert some claims that are completely false, and how they continue to believe in flawed reasoning even after it is pointed out to them. Maybe it's because I have a math Olympiad background, which had a very strong focus on proofs, compared to competitive programming, where people think programming is more important. I tried finding resources on proofs that I used, but unfortunately, I couldn't find any. Perhaps practice with mathematical arguments is the only thing that teaches you to reason confidently.

Very nice proof that sorting is $$$O(n)$$$

Your blog has "why problemsetting trends need to change" in the title but I don't understand what your blog has to do with this. As far as I can tell, your blog is mostly about "things that you should do when preparing a problem for a contest" but doesn't discuss what current trends are, why current trends are bad, or where you think they should go.

Thanks for pointing this out. Honestly, I forgot about that for a bit, because a lot of what I suggested was the opposite of the current trends, and the examples I linked to (that are scattered throughout the text) tell why they need to change.

For instance, there's little rigor in how problems are discussed during the testing phase as well as the proposing phase (especially greedy problems) — people are expected to write only very short proofs, and some negligence creeps in automatically due to the lack of rigor; and editorials are seen as something secondary. The blog claims that the issue lies partly in that and partly in not seeing testing from a security-based perspective, and it is only by making changes to this perspective on problem-setting that mistakes can be avoided.

More specifically:

How do we really "prove" whether a heuristic works correctly and fast enough? If it could be proven to be correct, then should it be called a heuristic, rather than an "algorithm"? Say, someone has a heuristic improvement to a problem. Maybe, someone claims that using the infamous "dancing links" technique on a sudoku solver can improve the asymptotic time complexity over the usual backtracking solution. Or probably, they claim that the Simplex algorithm is faster than Interior-point methods in practice, but the worst-case analysis tells them otherwise. How will we prove/disprove these kind of claims?

One example is this: in a game theory problem in a contest I coordinated, there was a DP to find winning and losing states. The intended constraints were such that a linear/linearithmic solution passes. However, it turned out that you could tweak the order of computing some things in a brute-force solution to get an $$$O(n \log n)$$$ algorithm, by just doing $$$O(n / i)$$$ operations for the $$$i$$$-th step.

Tweaking just the order is what I call a heuristic here (it's quite simple to mindlessly guess that it will work), and often there are situations where these kinds of heuristics come up and have provably good properties.

While it is not proven, it is a heuristic. When it is proven, it is an algorithm.

Similar to how a proven hypothesis is a theorem.