eduardische's blog

By eduardische, history, 3 years ago, In English,

During my competition history I've collected a vast amount of different fail stories. And those usually come in two flavours. You either get an aftertaste of a personal failure or you get a feeling that you were helped by organisers in your adventure. And it's always helpful to try and learn something from these failures. The first ones are kind of personal, and only affect you or your team, so they are not that interesting to talk about. The second ones however are more important. First of all, this is a direct feedback to the organisers, in hopes that it might make future contest better. And then you usually try to use that feedback yourself when you move on the organisers side in an attempt to make sure that you avoid setting all those pitfalls that you were annoyed about when competing. So here I'm going to try and describe two types of pitfalls that in my opinion the organisers can deliberately or accidentally set that helps teams to embark on a wonderful journey to fail-land a lot easier. This was originally intended to be a comment about Bubble Cup problem set, but since I've realised that the problemset contained great examples for both of these pitfalls, I've decided to post it separately to enable discussion.

"We weren't even bothered to try and submit something as stupid as this". Once in a while you get in a situation where everyone is solving one problem but you have no ideas on how to proceed. This usually happens if you manage to overcomplicate this task. However, sometimes these happen when you attempt to solve the task rather than just getting AC. Some cases are reasonably sensible – if you want to prove every single bit of your program before submitting it, you might be doing it wrong – sometimes it is sensible to try and assume something which looks sensible. However, sometimes this goes too far. Imagine you have a greedy or random solution, which certainly doesn't look like it should work and you sure as hell don't have any proof for it, but it actually works, either because there is a complicated proof or just because tests were weak/it turned out to be difficult to construct a test breaking the solution. And this is a type of tasks which I personally highly dislike. Why? You often get penalised for solving the task. If you just happen to try something "maybe this somehow will get AC" early you get a huge advantage in time and manhours for the rest of the contest even if you end up solving it.

So, let's go back to Bubble Cup. Take a look at the problem H. This had a much bigger influence on the onsite contest than on the mirror as it was the fourth problem after C+D+E. From a personal experience, with about 1.5 hours to go about a third of teams had this. At this point we looked at the scoreboard, and seen that a couple of strong teams we've identified before the competition didn't have it either. At this stage we stopped solving this problem and decided to do some random-greedy approach, which felt like shouldn't work but let's try it out and maybe put some more heuristics later. Of course it worked on the first try after fixing the bugs.

So here is the takeaway. I'm sure the solution to this problem is nice. But if a lot of things passes without any thought involved whatsoever, then it is not a good task in my opinion as it enables a lottery of whoever was lucky enough to submit something to see if this would get AC. Of course you can better at this lottery with more experience as a contestant, but I don't believe this should be an excuse for doing this sort of task. And of course, sometimes there are some solutions not imagined by the authors, but in this particular problem I got the feeling that anything more or less sensible passes.

"Entering the door through the cat flap". Imagine you are drunk. So you get home and try to enter the house. Sure, inserting the key and opening a door might be a challenge, but eventually you get in. However, instead you can try to get in through the cat flap. Sure, sometimes you will actually get in but most of the time you'll spend a lot of time before failing and realising that you're doing something wrong. Something similar can happen in a programming contest. You would try to do some different approach (generally worse asymptotically) and then spend some time to think about how to actually fit it inside the limits. Sometimes you'll succeed but that's not always the case. Often it is obvious that some solutions would ultimately fail, but this is not always the case. Sometimes you get an impression that you're along the right way but you just need a bit of optimisation to make it across the line and this in my opinion isn't a great situation.

Again, back to Bubble Cup. Let's take a look at the problem B. You could take a quadratic dynamic programming with linear transition and optimise it to amortised constant transition. 100M operations should pass in 1 second, right? However, we need linear memory in this approach, so it doesn't really fits. However, we use previous values in a specific pattern, so there are a lot of things to optimise, including trading memory for time, etc. So that's exactly what we were doing during the contest and at least one other team onsite after the contest had a question of how to fit this in the memory. So I was really surprised to learn that the intended solution was actually O(log N). So why was N set to only 100M? According to the authors it was to avoid giving out a hint. Sure, I could imagine that this is sometimes an issue, so fair enough. However, this left a non-obscure way of approaching this task on the boundary "this looks like it might pass but actually is unlikely", which is what I believe should be attempted to be eliminated by the organisers. The smallest accepted to attempts ratio in the mirror in my problem hints that this might have been an issue in the mirror as well. More often than not this happens when the organisers miss one particular approach. For example, last NWERC there was a problem with O(N log N) solution but if you go the other way about optimising it with segment trees you'd get O(N log^2 N), which looked like could pass with some optimisations, but actually did not. After the competitions and speaking with the organisers I got the impression that they would be fine with accepting this approach but they did not though about it.

So here is the takeaway. Try not to leave the cat flaps. Sometimes it's hard and you will miss some, but I believe an attempt should be made to think about alternative approaches and then make sure that you accept them or that it is reasonably obvious that you will not accept them, so that your contestants don't get stuck in the cat flap.

So, now I guess I'm open to discussions. These two are my personal points which I try to make sure does not occur whenever I am involved in organising a programming competition, but perhaps you might have some more or you think that some of the problems I've identified might not be actually problems.

And in the end, I'd like to express my thanks to the Bubble Cup organisers. The whole onsite event was wonderful and I hope that this feedback can be used for positive purposes. In general it saddens me when it is obvious that a huge amount of effort is put into the organisation and even a single problem can leave a lot of teams with some negative experience due to the issues mentioned above, so I believe it's important to address those.

 
 
 
 
  • Vote: I like it  
  • +202
  • Vote: I do not like it  

»
3 years ago, # |
  Vote: I like it +12 Vote: I do not like it

Good post. It's nice to hear feedback. I was an organizer of some previous Bubble Cups (but I wasn't really involved in this one) and other competitions, so here's my view on the two points: 1. Test cases should be good. They should be designed so that weak solutions do not pass. Everyone should agree on this. On the other hand, I understand this task is not typical. I personally enjoy those kind of tasks. They require relying on some intuition and they are different than the standard 'identify the type of problem and apply some modification of the known idea to solve it' tasks. Yes, having a lot of experience and knowledge may be a disadvantage here actually, because you are less inclined to 'think outside of the box' of standard programming contest tasks. But you come up with these kind of problems in real world programming a lot. As I mentioned, I was not involved in this Bubble Cup, so I am not sure what's the case wih problem H. If the test cases are weak, then that is definitely not good and should not happen. 2. Organizers should definitely think about all approaches and solutions for the problem. But why do you think cat flaps are bad? Not trying to go through one is a also a skill a good competitor should posses, not just good algorihms and coding knowledge. Besides, competitors should not completely rely on constraints when thinking about solutions. When solving a real world problem, you do not have these hard constraints often. You try to come up with the best solution you can with a lot of unknowns. It's fine if you think cat flaps are bad and avoid them in your competitions. I just don't think that they are generally bad and that they should always be avoided.

  • »
    »
    3 years ago, # ^ |
      Vote: I like it -8 Vote: I do not like it

    Thanks for your feedback. For the first one I'm not sure either. Perhaps this problem has so many approaches that it's impossible to construct bad cases for them all. As for the second one, I guess cat flaps can be fine as long as you realize you are going through a cat flap. Like you are understanding that you do not have an optimal solution and are trying to pass something suboptimal. For example when N is 20K and you are fairly sure there should be N log N solution but you are deliberately trying to do some optimizations in N^2. This is commonly occurred and contestants understand the risks of doing this. The problem is in my opinion when contestants are going through the flap and do not realize that. Like we thought that we had the optimal linear solution and we just need to fit it into memory, so it never occurred to us that we are going through the flap – we thought that all we have left to do is to do something clever to fit ourselves to memory. I hope I managed to explain the difference between two cases – if a team is making a conscious choice of investing time in going through the cat flap, this is fine; but if it happens without contestants' knowledge, then this is annoying. But of course, I definitely agree that with experience you should get better at avoiding them, and that is a very useful skill. The reason that I prefer that there are no cat flaps is probably because I believe in a pure algorithmic contests – that is if I have an algorithm I know whether it should work or fail (unless I have a mistake in algorithm) and then implementing it is just a method of writing your solution. Similarly you could've explained your solution written down on a paper or talking to a judge directly. So, when from constraints you think you have a solution but after several hours you simply cannot fit it in, it's annoying for me as in ideal world someone would've immediately judged my algorithmical solution as TLE.

    It would be interesting to see a contest where you don't have any constraints and you actually have to come up with the best you can. But it probably should be IOI rather than ICPC style where time doesn't matter.

»
3 years ago, # |
Rev. 2   Vote: I like it -48 Vote: I do not like it

H) "Anything more or less sensible passes" — is that bad? Hear me please because I told you that several times already. This is original design of the problem to be solved with randomizing approach — if you dont like this kind of tasks — it is your problem not ours.

B) About this problem. Here I dont understand you at all. I personally think this is a great problem, the best in the contest actually. If you cant come up with solution that passes constraints, again it is your problem not tasks's. If you think that constraints should give you a hint about what solution should be there, I am sorry, you are just wrong then

UPD: Whatever, if whole community thinks I am wrong that means I am wrong here, my apologies. Ok, next time we will avoid this kind of problems and make constraints fit solutions better.

  • »
    »
    3 years ago, # ^ |
      Vote: I like it -8 Vote: I do not like it

    Unfortunately, I don't think you understand the point I'm making about H. I don't have a problem with a randomized approaches. The problem I'm having is that "anything more or less sensible passes" and you don't need to solve the problem. My team didn't solve that problem. We just wrote some code, shipped it and got AC. Some teams did that a lot earlier than us. This is a really good on-paper problem, if you had to solve it and present a proof of why this should work. A problem that you don't need to solve to get AC is not a good problem in my opinion.

    For the problem B I have never said that I disliked this problem. It is a really cool problem. But a constraints is a way of judging. Contestants come up with algorithns and based on the constraints decide whether it's good or not. So if I cannot tell for sure whether the algorithm I came up with is good or not, then this is partially organizers responsibility because you pick the constraints. My only problem with this task is the memory limit – I believe it should have been clear whether O(N) memory would be accepted (bigger ML) or rejected (smaller ML).

    On your third point, constraints should by design give hints to what are the acceptable asymptotics of the solution. But if we make these constraints tight we can sometimes give out actual hints, and this is often avoided by weakening constraints. You're absolutely right, constraints shouldn't give me hints. I still don't see how smaller ML is a hint in this problem.

    • »
      »
      »
      3 years ago, # ^ |
        Vote: I like it +2 Vote: I do not like it

      Another problem with cat flaps is that they very often are actually crossable, which means you're allowing solutions with higher complexity than you want to pass.

      In this specific case, run-length encoding on the dp array (kind of ironic to use compression in a task about compression!) lets the linear solution pass: 20542125

      • »
        »
        »
        »
        3 years ago, # ^ |
          Vote: I like it +18 Vote: I do not like it

        I never said that linear solution shouldn't pass. Yes intended solution is O(logN), but if you manage to get AC with O(N), consider yourself solved the problem.

      • »
        »
        »
        »
        3 years ago, # ^ |
          Vote: I like it -10 Vote: I do not like it

        Oh, nice trick! We saw that there were patches of either same number or sequences of increasing by 1 numbers but did not realize that we should try to consider the differences and then it's actually highly compressable. Thanks for that!

      • »
        »
        »
        »
        3 years ago, # ^ |
          Vote: I like it +3 Vote: I do not like it

        It passes only because tests for this problem are weak: try running it on "100000000 2 100000000". Btw, this test or "100000000 1 100000000" breaks about 50% of accepted solutions for this problem.

        • »
          »
          »
          »
          »
          3 years ago, # ^ |
            Vote: I like it 0 Vote: I do not like it

          Not that serious of a memory problem, given that the compressed vector seems to peak at around 42400 elements (if we use something like 100000000 2001 100000000).

          I only did the "small array" thing because my vector version (20542046) did not pass TL (probably due to time fluctuations; when you get to 982 ms it's pretty much random whether you pass or not).

        • »
          »
          »
          »
          »
          3 years ago, # ^ |
            Vote: I like it 0 Vote: I do not like it

          This not an excuse at all, but I just have ran all solutions that got AC on tests

          • 10^8 1 10^8
          • 10^8 2 10^8
          • (10^8)-1 2 (10^8)-1

          Only 1 of 8 failed. Again I didn't check which of them are O(sqrt(N)) or O(N) or O(log^3(N)) or anything else. Still 1 more was pretty close to TL

  • »
    »
    3 years ago, # ^ |
      Vote: I like it +23 Vote: I do not like it

    Constraints should give a hint about what solution shouldn't be. I saw problems with n ≤ 109 with O(n) solutions passed. Why would I think that with n ≤ 108 O(n) shouldn't pass?

    • »
      »
      »
      3 years ago, # ^ |
        Vote: I like it -8 Vote: I do not like it

      Why did you mention the time constraints? I only thought that it wasn't clear whether the linear memory could be expected to pass. I believe that it was pretty clear that linear solutions in time are safe, considering that the onsite winner, unless I'm mistaken, solved it with an O(N log N) timewise and O(log N) memorywise solution, which in practice is faster than the asymptotics (but I would still imagine not faster than the linear). Did you had any issues with passing a linear solution timewise on the mirror?

      • »
        »
        »
        »
        3 years ago, # ^ |
          Vote: I like it +8 Vote: I do not like it

        I'm guessing you didn't check the time my linear solution took to run :)

        Locally it took 350 ms, but it's much slower on the CF servers for some reason.

      • »
        »
        »
        »
        3 years ago, # ^ |
          Vote: I like it +8 Vote: I do not like it

        It is just an example. The idea is 'Looks like I should write an O(n) solution ' which is not true. Can be applied to memory constraints too.

        My solution is with memory.

      • »
        »
        »
        »
        3 years ago, # ^ |
          Vote: I like it +21 Vote: I do not like it

        I believe that I can prove that my solution is actually O(n) but it requires some formulas. During the contest I just check that it works fast enough.

  • »
    »
    3 years ago, # ^ |
      Vote: I like it +39 Vote: I do not like it

    I am appalled by such an unprofessional response as "this is your problem not ours". This not only tarnishes your name, but also, maybe more importantly, the competition's reputation. Why should I take part in a contest whose organizers don't give a crap about what I think?

    • »
      »
      »
      3 years ago, # ^ |
        Vote: I like it +29 Vote: I do not like it

      It is not answer from organizers. It is just an opinion from me personally.

      • »
        »
        »
        »
        3 years ago, # ^ |
          Vote: I like it +16 Vote: I do not like it

        You are the organiser.

        • »
          »
          »
          »
          »
          3 years ago, # ^ |
            Vote: I like it +5 Vote: I do not like it

          I would say that this post is not aimed at the general public but is rather aimed at me directly and is a continuation of a private conversation between me and ibra so this should indeed be a personal opinion. I feel that I cannot get what I'm trying to say across to him and he believes that all that I'm saying is "randomized algorithms are shit and problem B is shit as well because I couldn't solve it" and all I'm doing is just creating hate posts, which explains the tone of the original comment. This situation saddens me deeply because that's the first time I'm seeing such a reaction from an organizer at an attempt to give a feedback, which is usually quite welcome.

  • »
    »
    3 years ago, # ^ |
      Vote: I like it +26 Vote: I do not like it

    I like problem H and I think should be more such problems.

    • »
      »
      »
      3 years ago, # ^ |
      Rev. 2   Vote: I like it +2 Vote: I do not like it

      In terms of coming up with a solution, I agree that this is an interesting problem. In terms of competition problem I don't believe it is good to have a problem where we just need to generate random numbers until everything is OK (when the intended solution is harder). Again, I look at your solution to this and this one and I do not consider this problem to be fair at all. That is my whole complaint about this problem.

      • »
        »
        »
        »
        3 years ago, # ^ |
        Rev. 4   Vote: I like it +11 Vote: I do not like it

        Do you antitest for those short solution? Seems that I know. So thay had bad tests, not bad problem

        P.S. I checked it and seems that I don't know antitest =) But it does't matter. Somebody had good heuristic, it passed tests, you should just congratulate for his luck and good intuition.

        P.P.S. I found test.

      • »
        »
        »
        »
        3 years ago, # ^ |
        Rev. 3   Vote: I like it +27 Vote: I do not like it

        Found antitest (n/2 pairs of coaches having same set of preferred teams):

        http://pastebin.com/eNFPWLRk

        • »
          »
          »
          »
          »
          3 years ago, # ^ |
            Vote: I like it 0 Vote: I do not like it

          Thanks for this! As I mentioned previously, I am not sure at this moment whether this was just weak tests or this problem unfortunately cannot be tested well to prevent non-solutions to pass. My original intuition was that there are so many different heuristics to try and get this problem without solving it that it's not possible to catch 'em all. Perhaps it was wrong, but in any case whatever the reason, the issue remains the same. Yeah, of course, sometimes someone finds a good heuristics, gets AC and on a one-off scale this is nothing crucial. But since random passed here, pretty much anything passed as well, and this situation seems more like a problem to me.

          For example, our submission was doing random assignment of teams and then assigned teams to conferences by picking teams one by one, and then put them in the conferences where they have smaller amount of edges between already assigned teams. I do not consider this a solution because we have no idea why it works. The only thing we managed to prove was that by assigning teams into conferences like that because we're always picking the biggest half and every edge gets considered once, we will actually pick at least half of the edges. But the problem is that assigning people to teams can remove some edges, so we feel that there should be a case breaking our solution (don't think yours does although I might be wrong). Perhaps there actually are some probability guarantees of our solution. But in either case I don't believe we've solved the problem and deserved an AC for that. Perhaps you might disagree with me on this one. But I believe it is a good rule of thumb that if you just try to submit something and get AC, if you succeed, you should have some kind of intuition for why it did get AC. All three of us did not expect this version to pass and we had more heuristics coming up. But yeah, I'm actually really interested if it's possible to generate a good test set for this problem so that you'd be required to solve this problem properly.

»
3 years ago, # |
Rev. 2   Vote: I like it +17 Vote: I do not like it

Here are my $0.02, being the person that coded up the AC solution of H for our (eduardische, VladGavrila and mine) team at the onsite.

I think that coming up with the idea that randomisation could work at some point of the algorithm is not hard to come by here (the very small degree of the person:team bipartite graph really hints at it, and such weirdly small constraints often hint at the fact rand() could be plugged in somewhere).

But if the probabilistic proof of the fact this works is not easy to come by, then I would expect at least some part of the construction to require somethig sensible. But from what I gathered, even trying fully randomised assignments would've worked here? In this case it really amounts to a hunch.

This does not mean that the problem is "broken" in any way. It just points at the question of what the problem is trying to reward. And here it seems that the reward comes from tenacity (i.e. the "move fast and maybe things won't break approach") rather than anything else. In fact I think another team's member told me that they got the determination to try the rand approach when they saw how another team reacted to getting AC, which in my opinion totally bypasses any educational value of a problem like this.

This was the only thing which made me dislike this problem ultimately. And I normally really like encountering stochastic problems like this (good example is NWERC 2014 Problem F: http://2014.nwerc.eu/nwerc2014problems.pdf). Otherwise I thought that the BC9 problemset was awesome and organisation was great (no hiccups at all on contest day despite several issues during training session). Great work, and I hope to go again next year.

  • »
    »
    3 years ago, # ^ |
      Vote: I like it +42 Vote: I do not like it

    I don't remember the whole problem and solution, but my teammates reduced the problem to the following: you have N subsets with sizes from 16 to 20, each subset is colored in either 1 or 2. You have to color all the elements of array in such a way that for each subset there were at least one element in this subset colored in its color.

    It's easy to prove that assignment of random colors give us a reasonably large success probability. It's about (1 - 2 - 16)n which is as I remember about 0.2 or sth like that. Repeating this solution several times decreases the error to a very small value

    • »
      »
      »
      3 years ago, # ^ |
      Rev. 2   Vote: I like it +39 Vote: I do not like it

      Note that although the probability of one person get good coloring is at least 1 - 2 - 16, the events are not independent. Union bound works instead to get 1 - 2 - 16n.

    • »
      »
      »
      3 years ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      Thanks! Good to know that such a proof can be made to work during the contest.

      N.B. I still think that there were quite a few teams who played it completely by hunch (basically us included, but only after exhausting any other possible option).

  • »
    »
    3 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    I had to look it up to be sure, but yes, here is a 100% randomized AC submission.

»
3 years ago, # |
  Vote: I like it +10 Vote: I do not like it

eduardische: I got stuck on problems B and H, so they are shit. But thanks to organizers.

  • »
    »
    3 years ago, # ^ |
    Rev. 2   Vote: I like it +38 Vote: I do not like it

    I'd just like to point out that at no point did I claim that B is a shit problem. I think I'll just leave the rest without any comments.

»
3 years ago, # |
Rev. 2   Vote: I like it 0 Vote: I do not like it

"Entering the door through the cat flap" — based on a real experience? :D

I agree that problems with a non-obvious stupid solution that's too easy to just guess are bad for programming contests. If I was picking contest problems, I'd tend to reject anything where it's hard to distinguish the intended solution from a stupid one (or make the stupid one intended and decrease constraints accordingly) and I'm pretty sure it happened to me as a problemsetter, too.

In the case where a random solution passes, a better analogy would be entering the door to your house by doing a random walk. Maybe you'll enter through the cat flap or a window and maybe someone just stole your door.

100M operations should pass in 1 second, right? However, we need linear memory in this approach, so it doesn't really fits. However, we use previous values in a specific pattern, so there are a lot of things to optimise, including trading memory for time, etc.

Last year at Slovak national OI, there was an intended hard problem where normal DP didn't have enough memory and it was fairly slow — with something like 4e8 operations. The intended solution was to not worry about time and optimise memory by bitsetting. I didn't even think about it, made the basic DP and then added some heuristics to skip states; when holding the non-skipped states in a map<>, it added a log-factor to the complexity and in exchange, limited the number of states to something like 1 million. It was probably harder to write than the intended solution and probably could fail, but it was much, much more efficient and seemed about as likely to work a priori (although I didn't expect it to just get full score).