MikeMirzayanov's blog

By MikeMirzayanov, 6 years ago, In English

Hello, friends!

I want to separately discuss the negative feedback on round 505.

I understand that many participants are disappointed by solutions that failed on system tests. In fact, the pretests in problems B, D and E turned out to be weak. Problem C also didn't go too smoothly, but I don't see anything critical there.

Certainly, this was a flaw in the work of the author and coordinator. Unfortunately, such situations sometimes arise, and it is difficult to avoid them entirely. For these problems, the number and type of pretests did not look too weak.

Problem B: 9 pretests, with small and large answers, two tests with answer -1, pretests with n = 1 and n = 2, and four pretests with n = 150000.

Problem D: 14 pretests, among them manual tests and tests from four different generators, a few pretests with n = 700; the majority of answers are ‘Yes’, but there are ‘No’ answers as well. In my opinion, there were too few pretests with ‘No’.

Problem E: 14 pretests. Yes, at the VK Cup finals this problem had 10 pretests and caused many system-test failures for participants. I added 4 more pretests, taken from the tests that caused system-test failures for the onsite participants. I was very surprised to see that there were still so many failures after system tests.

Summing up, the pretests turned out to be incomplete, but it is hard to say that this was an obvious defect by the author or coordinator. Probably it is a combined effect of the problems' specifics and cdkrot's lack of experience as a coordinator.

I haven't examined all the problems thoroughly, but the round still seemed interesting to me. There were no serious failures in the statements, and no bugs in the tests or solutions. The system was also working smoothly, without a large queue.

Please explain your negative feedback about the round. It will be very valuable to read a reasoned opinion from experienced participants.

Thanks, MikeMirzayanov

  • Vote: I like it
  • +330
  • Vote: I do not like it

»
6 years ago, # |
Rev. 2   Vote: I like it +71 Vote: I do not like it

I think that for most problems, coming up with a correct algorithm is 80% of the work and, in my opinion, is what differentiates skilled coders from not-so-skilled ones. So coming up with the right algorithm but forgetting a special case and then failing systests is pretty infuriating, because you get the same score as someone who doesn't know how to solve the problem at all (0 points) and you end up losing rating over it.

That being said, of course it is acceptable that some problems have weak pretests, otherwise hacking wouldn't be a thing. But people react strongly when the pretests are so weak that they allow obviously flawed algorithms to pass, or when too many problems have weak pretests, because then you see contestants' scores plummet from 3000 to 300 in systests, or contestants massively rising in rating because they got 10 successful hacks (and since hacking opportunities depend on the room you're placed in, the score you get from hacks depends heavily on luck).

So I think the biggest issue is not the weak pretests per se, but the scale at which it happened in this round. A possible solution might be to always include a few problems without tricky cases, so that not all problems are affected. That being said, I found the problems in this round pretty good and the difficulty well balanced, so it's not all bad.

  • »
    »
    6 years ago, # ^ |
      Vote: I like it +9 Vote: I do not like it

    That being said, of course it is acceptable that some problems have weak pretests, otherwise hacking wouldn't be a thing.

    Well, are hacks even a thing? I see them just as a "happy" solution to the problem highlighted in kostka's comment: the impossibility of giving full feedback because of the high number of contestants.

    The thing is that the majority of other well-established contests (ICPC, IOI) don't care too much if you fail because of a trivial border case; they just add some penalty.

    • »
      »
      »
      6 years ago, # ^ |
        Vote: I like it +19 Vote: I do not like it

      That's a very good point that I hadn't considered while writing my reply. In any case, the message I was trying to convey is that it's not a big deal if only one problem has weak pretests and it only affects a handful of people.

      In any case, what sets well-established contests apart from Codeforces is that you participate maybe twice in your lifetime in IOI or ICPC, whereas on Codeforces you can participate hundreds of times. So if a few rounds have weak pretests, your rating won't be affected that much over time, whereas a round of IOI or ICPC with bad test cases will certainly mess up the results.

      In essence, Codeforces can afford some inaccuracy that ICPC or IOI cannot. That, of course, doesn't mean it is acceptable to have a disaster like #505 or others in the past, but one bad problem every once in a while isn't the end of the world.

      • »
        »
        »
        »
        6 years ago, # ^ |
        Rev. 2   Vote: I like it +62 Vote: I do not like it

        What you don't understand is that not giving full feedback during a contest is a feature, not a bug. It is what sets Codeforces apart from other "well established contests" such as IOI or ICPC and what makes it so interesting.

        Therefore a round with weak pretests should be considered not a "disaster" but a typical round; conversely, problems with very strong pretests are not suitable for Codeforces and should rather be saved for other "well established" contests.

»
6 years ago, # |
Rev. 2   Vote: I like it 0 Vote: I do not like it

I think the worst part of the pretests was B missing this case:

1
1000000007 1000000007

Probably nearly everyone who failed B on systests failed on something like this. And since this is even sort of a special case in the intended solution, I think it's odd that it was missing.

  • »
    »
    6 years ago, # ^ |
    Rev. 2   Vote: I like it +51 Vote: I do not like it

    I do not see how this is a special case.

    • »
      »
      »
      6 years ago, # ^ |
        Vote: I like it +6 Vote: I do not like it

      A lot of solutions compute (1e9 + 7) * (1e9 + 7) (the gcd of the products of all pairs).

      For me, the factorization loop ran while (i*i < val), not while (i*i <= val), so my solution would have output NO.

  • »
    »
    6 years ago, # ^ |
      Vote: I like it +7 Vote: I do not like it

    This issue is easy to avoid if you don't multiply the two numbers to begin with.

»
6 years ago, # |
  Vote: I like it +57 Vote: I do not like it

The issue with the weak pretests in this round: generally, weak pretests just leave out an edge case, whereas in this contest they failed to catch TLE solutions in B and even plainly wrong logic in D.

For instance, my solution for D did not even check that the tree was binary. While implementing, I only verified whether some tree could be constructed, and it still cleared the pretests.

The scale of the weak pretests in this contest led to a lot of hacking too (with many missed opportunities), and while hacking is an integral part of the contest system, the scale at which it happened makes the contest frustrating.

I have enjoyed taking part in almost every CodeForces round, but contests like this one take away the fun.

»
6 years ago, # |
  Vote: I like it +4 Vote: I do not like it

We should all define what pretests are; otherwise this is a meaningless controversy. What's the point of system testing if every case is covered by pretests?

Also, maybe it is better to notify people about the expected quality of pretests in advance, i.e. state in the round announcement whether the pretests cover only basic cases, some corner cases too, etc.

  • »
    »
    6 years ago, # ^ |
    Rev. 9   Vote: I like it +39 Vote: I do not like it

    I agree that we should redefine pretests, but announcing the "quality of pretests" sounds a bit ridiculous. I hope that problemsetters try their best to have strong (pre)tests.

    OK, so why do we even have pretests? People are not really used to lacking full feedback nowadays. In ICPC-like contests you get full feedback almost immediately; same for IOI-like contests (maybe without a ranking). I think the biggest problem is the number of participants in a regular Codeforces round: there are simply too many submissions during the contest to give everyone full feedback. On the other hand, people don't have time to test their solutions, since a Codeforces round is really short compared to other contests. Generally, I used to trust pretests. I hoped that only a small percentage of submissions would fail on system tests after passing pretests.

    The problem, or rather the cause of the outrage, is the format of Codeforces rounds. I am not saying that we should change it right away, but maybe we should rethink it. The biggest problem (in my opinion) is that after failing on system tests you cannot resubmit your solution. Of course, there's a possibility that someone hacks your solution, but it is quite rare. If you fail while being only one stupid corner case away from the correct solution, you feel quite bad: you get the same score as someone who didn't even submit a solution or think about it.

    I don't have a good solution, but maybe someone will.

    • »
      »
      »
      6 years ago, # ^ |
        Vote: I like it +29 Vote: I do not like it

      There are simply too many submissions during the contest to give everyone full feedback.

      I would agree with you, but in fact, during Codeforces rounds the system also runs shadow system tests, so that the contest moderators (authors, testers, etc.) can see each submission's verdict on the system tests. So I'm wondering: is there really a need for pretests if system tests are already being run on the fly?

      It could be that I misunderstood (and if so, I would like a clarification), and those system tests are only run when, for instance, the queue is short. But still, I think simply doubling the number of pretests would work out just as fast as performing system tests during the round (and would also give more freedom to put more countertests into the pretests).

      • »
        »
        »
        »
        6 years ago, # ^ |
        Rev. 2   Vote: I like it +13 Vote: I do not like it

        It was my understanding (though I might be mistaken) that these shadow system tests only run on a small percentage of total submissions. I would be interested to know more about them, however: if they only cover 1-3% of "pretests passed" submissions, the reduced load is understandable, but then we do not know what proportion of submissions gets judged fully.

        I do remember a couple of months ago, when ksun48 took part in a post-contest stream for a round he authored, he implied that the round coordinators already knew the final standings (at the top of the leaderboard, at least). Do the shadow tests favor the contestants who did best? Or does every single submission get run on the shadow tests? Bottom line: I agree that the details of how these shadow system tests work should be clarified.

        • »
          »
          »
          »
          »
          6 years ago, # ^ |
            Vote: I like it +29 Vote: I do not like it

          I was a problemsetter, so I can tell you about shadow testing. It tests all submissions, but there can be a situation (often at the beginning of the contest) where only a small percentage of the submissions that passed pretests have been run on the full tests, simply because submissions come in so often. By the end of the contest people submit less frequently, and coverage by shadow testing can reach 100%. So the authors can know the results before system testing.

        • »
          »
          »
          »
          »
          6 years ago, # ^ |
            Vote: I like it +23 Vote: I do not like it

          It is not exactly true that the round authors know the exact standings before the end of the contest.

          But shadow testing does exist (running more in chronological order than favoring the top); for example, it sometimes allows the authors to spot issues such as bad solutions passing and take countermeasures before the results are final.

          Also, it was not me who was on a stream (this round was coauthored with 300iq, and he was the one on the stream).

          • »
            »
            »
            »
            »
            »
            6 years ago, # ^ |
              Vote: I like it +10 Vote: I do not like it

            Regarding the Twitch stream, I was indeed mistaken. I was thinking of Round #492, which was co-authored by ksun48 and after which he appeared on stream to discuss the contest. Apologies for the unwarranted tag.

»
6 years ago, # |
  Vote: I like it +47 Vote: I do not like it

E: "Weak" pretests are a feature simply because there's a variety of solutions and there's no way to make pretests that are good for all of them. Also, since the grid is very sparsely populated for all but the most trivial inputs, it's very easy to find a good solution on any single test by coincidence and rather easy to make a solution that fails for a specific type of tests. You'd need like 50 pretests (or choose 15 very very carefully) to claim they're strong. Knowing which problems are easy to make a hard-to-spot (e.g. by submitting) mistake in is an important skill and applicable here.

D: Pretests were weak. There's no question about it. I checked how fast my solution runs on max. N, random sequence, and got over 500 ms; pretests gave max. 77 ms, which is very misleading. I'd definitely include that random test if I were making pretests, if only because it's the easiest, and that's why I'm also surprised it didn't occur in pretests.

B: There's some freedom in what to print, which means good tests are harder to make — similar to E — but while I didn't experience any issues (in part because I had plenty of time to think while cursing my Internet disconnection), this... doesn't seem like a problem where a lot of wrong solutions should pass pretests — unlike E. If people complain, it's probably true that there wasn't enough work put into what should be in pretests. 9 isn't much, especially if it includes samples.

Note that yes/no problems are bad because weak pretests and weak systests are a possibility. There really should have been something extra in D, e.g. counting possible roots.

»
6 years ago, # |
  Vote: I like it +52 Vote: I do not like it

I don't see the weak pretests as the main problem; I'm OK with them, as they sometimes help you be more accurate and work hard to test your solution locally before submitting. My main concern is with people who get high ratings just from hacks. There has been a lot of talk about hacks recently, and I think this should change. My suggestion is to use dynamic scoring for hacks: it isn't fair for a problem with lots of hacks to give you the same as a problem with fewer than 10 hacks.

»
6 years ago, # |
  Vote: I like it +35 Vote: I do not like it

The pretests sure were weak, but I think the round received unreasonably harsh negative feedback. I myself got hacked on B, but I won't blame the authors for that, as the pretests were checking the correctness of the algorithm (and not the worst-case complexity, which the user should be aware of before submitting).

The problems were well balanced as well. Thank you for this Coding Weekend!

»
6 years ago, # |
Rev. 3   Vote: I like it +64 Vote: I do not like it

Actually, the problems are really good (thanks a lot!) and the pretests are still acceptable, although they could have been a bit stronger. The issue that is bugging me (and maybe others) is that the constraint on Problem D is misleading and too strict.

N = 700 actually suggests a solution in O(N^2), O(N^2 log N), or lower; of course this can easily be argued. But setting it for an O(N^3) solution is a critical flaw. 700^3 is barely within the time limit, as one cannot deduce for sure whether a solution will get AC or TLE without submitting it. The situation gets worse when multiple solutions of the same complexity get different verdicts (ACs and TLEs).

Therefore, I'd like to suggest that constraints be made clear and not overly strict.

  • If the problem setter insists on making N = 700 work for an O(N^3) solution, test it with a pure O(N^3) solution without any optimization. Check its time, and it's a good option to set the time limit to twice that. Additionally, check it with a slower language like Java.

  • If not, change N. N = 300 is a reasonable choice, as O(N^4) will still be killed. Setting a higher N with a higher time limit is also an option.

  • Also, for a problem with strict constraints like this, it's advisable to include a large testcase among the pretests so that participants aren't misled.

  • »
    »
    6 years ago, # ^ |
    Rev. 2   Vote: I like it +16 Vote: I do not like it

    O(N^3) doesn't imply N^3 operations on plain integers. In this problem the inner work is dp calculations, so it seems quite reasonable for such a solution to run in time.

    What is a "pure O(N^3)" solution, then? One that performs extra operations that don't make any sense?

    • »
      »
      »
      6 years ago, # ^ |
      Rev. 2   Vote: I like it +5 Vote: I do not like it

      Well, not literally N^3, but as you can see, N^3 / 6 is indeed a reasonable solution to test. Unfortunately, my solution got TLE at that complexity.

      By pure O(N^3) I mean without silly optimizations, like using bitsets, or && vs & vs min-max (I recall someone got AC just by changing & to &&), or some trick with break, etc. Just do it as it is.

      • »
        »
        »
        »
        6 years ago, # ^ |
          Vote: I like it +18 Vote: I do not like it

        OK, overall I agree that the TL is too tight; for example, my solution 41841784, which I supposed would be fast, runs in almost half the TL. Btw, I used bitsets, but only for code convenience; otherwise I would have to store the dp for the last two segment lengths, which is less enjoyable.

    • »
      »
      »
      6 years ago, # ^ |
      Rev. 2   Vote: I like it +21 Vote: I do not like it

      I tried to optimise my in-contest solution for cache and stuff. Now it does the following in assembly (compiled with --std=c++17 -O2):

      _Z9solve_lupiiPA702_KcPA702_cS1_:
      .LFB45:
      	.cfi_startproc
      	leal	1(%rdi), %r9d
      	cmpl	%esi, %r9d
      	jge	.L7
      	movslq	%esi, %r10
      	movslq	%edi, %rax
      	subl	$2, %esi
      	imulq	$702, %r10, %r11
      	subl	%edi, %esi
      	movslq	%r9d, %rdi
      	imulq	$702, %rax, %rax
      	addq	$1, %rsi
      	leaq	(%r11,%rdi), %r9
      	addq	%rax, %rcx
      	xorl	%eax, %eax
      	addq	%r9, %rdx
      	addq	%rcx, %rdi
      	addq	%r9, %r8
      	.p2align 4,,10
      	.p2align 3
      .L10:
      	cmpb	$0, (%rdx,%rax)
      	je	.L9
      	cmpb	$0, (%rdi,%rax)
      	je	.L9
      	cmpb	$0, (%r8,%rax)
      	jne	.L12
      .L9:
      	addq	$1, %rax
      	cmpq	%rsi, %rax
      	jne	.L10
      .L7:
      	rep ret
      	.p2align 4,,10
      	.p2align 3
      .L12:
      	movb	$1, (%rcx,%r10)
      	ret
      	.cfi_endproc
      
      

      That's one of the two functions that form the main logic of the program. Only the .L9 and .L10 labels contain what's executed in the loop, so that's 2x 9 quite cheap instructions in a loop N^3/6 times. It's also cache-efficient. Still, at worst one million, realistically maybe half a million assembly instructions in total and it runs 300ms on a test with a random sequence.

      If that's the best I can do without bitsets, using fairly advanced knowledge for algorithmic competitions (cache) and it still takes at least 1/3 of the TL, that's pretty damn tight. Doable, of course, but pretty damn tight. My actual in-contest submission is 2x slower and it's easy to be 3x slower and get TLE.

      • »
        »
        »
        »
        6 years ago, # ^ |
          Vote: I like it +33 Vote: I do not like it

        Not having any "advanced knowledge", I wrote a simple recursive solution and it runs in 78ms: 41910650. Did you consider this approach before claiming that the TL is "pretty damn tight"?

        • »
          »
          »
          »
          »
          6 years ago, # ^ |
            Vote: I like it +3 Vote: I do not like it

          What's the maximum number of instructions (assembly) executed for the main (deepest) loop in total?

        • »
          »
          »
          »
          »
          6 years ago, # ^ |
            Vote: I like it +5 Vote: I do not like it

          Wow, really? You have a break inside! You can't assume that this break will trigger often when a solution really exists, so let's be honest and remove it, OK? You are not so fast now: click

      • »
        »
        »
        »
        6 years ago, # ^ |
        Rev. 2   Vote: I like it +20 Vote: I do not like it

        I agree that the TL was too tight for a CF contest (it could be OK for ICPC). But here is how O(N^3) without recursion or breaks can run in 200ms on the max test. It's actually my submission from the contest, just without the template and prettified a bit. I haven't analyzed the assembly, but the main feature, I guess, is that I calculate both dps in a single loop and store them in a single array dp[N][N][2].

»
6 years ago, # |
  Vote: I like it +31 Vote: I do not like it

I enjoyed the tasks. They seemed to be interesting, balanced, and there were no big issues with the English problem statements. The pretests were on the weak side, but I do not see this as a problem. Honestly, my only frustration was that N<=700 was a strange bound for problem D. I don't think this is a big problem, since an experienced coder could tell it would probably run in time with good optimisation.

The problem seems to be expectation. Generally speaking, most other rounds have had stronger pretests. Competitors have therefore come to expect stronger pretests. I believe the negative feedback comes from this expectation not being met.

My opinion is that better defined guidelines about pretests could help. Competitors will then know what to expect, and problem setters can follow these guidelines to ensure their round meets these expectations. I don't know if stronger or weaker pretests are better, I leave that discussion to more knowledgeable competitors.

»
6 years ago, # |
  Vote: I like it +3 Vote: I do not like it

For B, there was also weakness in testing the logic of the solutions; my first solution's logic (hacked) was so far from the correct logic. See it and laugh XD. 41842360

Anyway, thanks for your efforts in this and all contests.

»
6 years ago, # |
Rev. 3   Vote: I like it -10 Vote: I do not like it

Problem D: I don't understand why the answer to this test is "No". Can anyone explain it to me? Thanks.

Test 54: 4 23 247 299 703

I think it can be: 23 | 299 | 247 | 703

  • »
    »
    6 years ago, # ^ |
      Vote: I like it +3 Vote: I do not like it

    I don't really understand your notation, but I assume that 299 is the right child of 23, 247 is the left child of 299, and 703 is the right child of 247. That won't be a binary search tree, because 703 ends up in 299's left subtree when it should be in the right one.

»
6 years ago, # |
Rev. 2   Vote: I like it -12 Vote: I do not like it

I think having pretests miss some corner case is OK, but pretests should not be so weak that they fail to catch TLE solutions. In problems B and (surprisingly) C, there were solutions that failed due to TLE in systests. I think there should be a few strong pretests that check for TLE solutions.

»
6 years ago, # |
  Vote: I like it 0 Vote: I do not like it

Just a side point (not sure it's relevant here, but it seems related): on some problems (like B), it was not possible to submit a hack satisfying the input constraints, because hack inputs are limited to 256 KB.

One example is test 83, whose input is more than 10 times bigger than what can be submitted as a hack. I believe putting the same constraints on the tests and on the hacks would let participants strengthen the pretests themselves through more thorough hacking. It would also seem fairer to people attempting hacks.

Besides this, I believe that, philosophically, having weak pretests is another facet of competitive programming, because programming is as much about "creating algorithms" as it is about "implementing algorithms without errors". The beauty of some problems is in the details, and it is really fun to have contests like yesterday's.

»
6 years ago, # |
  Vote: I like it -29 Vote: I do not like it

My code in B was hacked by this test:

1

1000000007 1000000007

The pretests for B were not good enough. However, in my opinion, the pretests for A were good: including pretest 4 (length = 1) among the pretests avoided many hacks.

  • »
    »
    6 years ago, # ^ |
      Vote: I like it +21 Vote: I do not like it

    I think that case is simple enough for you to check yourself, no?

»
6 years ago, # |
  Vote: I like it +10 Vote: I do not like it

"It's your fault for making mistakes while coding; don't ever blame anything else."

I always think about that whenever I fail.

»
6 years ago, # |
  Vote: I like it 0 Vote: I do not like it

The pretests in problem C were very weak... I submitted an O(n^2) solution and it passed the pretests, and other wrong-answer solutions passed the pretests as well.