farmersrice's blog

By farmersrice, history, 2 months ago, In English

It costs 4.6 USD an hour to rent 48 cores / 96 threads on AWS. At this rate it will cost roughly 100 USD an hour to rent 1000 cores.

Assume we rent 1000 cores for 2 hours. Given that a very large contest has 20k users, this works out to 1000 * 2 * 60 / 20000 = 6 entire minutes of computation per user. 6 minutes is plenty of time to judge all a user's submissions.
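A quick sanity check of the numbers so far, as a rough sketch. The function names are made up for illustration, and the price and instance size are just the figures quoted in this post, not a current AWS quote. It assumes cores can only be rented as whole 48-core instances.

```python
import math

# Figures quoted in the post (assumptions, not a live price list).
INSTANCE_COST_USD_PER_HOUR = 4.6
CORES_PER_INSTANCE = 48


def hourly_cost_usd(cores: int) -> float:
    """Hourly cost of at least `cores` cores, rounded up to whole instances."""
    instances = math.ceil(cores / CORES_PER_INSTANCE)
    return instances * INSTANCE_COST_USD_PER_HOUR


def core_minutes_per_user(cores: int, hours: float, users: int) -> float:
    """Core-minutes of compute available to each user over the whole rental."""
    return cores * hours * 60 / users


if __name__ == "__main__":
    print(hourly_cost_usd(1000))                   # roughly 100 USD per hour
    print(core_minutes_per_user(1000, 2, 20_000))  # minutes of compute per user
```

Rounding up to whole instances gives 21 instances for 1000 cores, which is where the "roughly 100 USD an hour" comes from.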

If you are worried about heavy load at the beginning but not at the end, you can of course alter the distribution of cores for the same cost (e.g. 1200 cores in the first hour, 800 cores in the second hour).

CF received 100k USD in the last donation round. This is enough to pay for the server costs of 500 rounds at 200 USD/round. Even assuming that only a tiny portion of the funds is allocated to server costs, plenty of rounds can still be paid for without congestion.

It is also likely that these servers will be much more stable (I have heard a rumor that an ITMO professor unplugged the power to the computers to spite some students and CF went down during that time. What if that happens during a contest?)

It seems today's issues were not caused by server load but by some other bug, but of course there are plenty of times when the server gets overloaded and the round is ruined.

Conclusion: 200 USD per contest for speedy speedy

EDIT: you could probably also mine monero or something while the servers are low-load to recoup some of the 200 USD.

 
 
 
 

»
2 months ago, # |
  Vote: I like it +39 Vote: I do not like it

farmersrice for admin!

»
2 months ago, # |
  Vote: I like it -39 Vote: I do not like it

I think the backend is written in PHP, and that's the reason it sucks.

»
2 months ago, # |
  Vote: I like it +58 Vote: I do not like it

farmersrice for admin!

»
2 months ago, # |
  Vote: I like it +3 Vote: I do not like it

6 minutes is plenty of time to judge all a user's submissions.

Can you explain this? I guess 1 second per pretest times 30 pretests times 12 submissions is 6 minutes, but I imagine there's a lot of overhead for things like compiling.
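To make that estimate concrete, here is a small sketch. The function name and the 5-second compile overhead in the usage example are placeholders I made up, not measured values.

```python
def worst_case_judging_seconds(
    submissions: int,
    pretests: int,
    seconds_per_test: float,
    compile_seconds: float = 0.0,
) -> float:
    """Upper bound on judging time if every submission runs every pretest at full cost."""
    return submissions * (compile_seconds + pretests * seconds_per_test)


if __name__ == "__main__":
    # 12 submissions, 30 pretests, 1 second each, no compile overhead.
    print(worst_case_judging_seconds(12, 30, 1.0) / 60)
    # Same, with a hypothetical 5 seconds of compile time per submission.
    print(worst_case_judging_seconds(12, 30, 1.0, compile_seconds=5.0) / 60)
```

Without overhead this is exactly the 6-minute budget; any per-submission overhead pushes the worst case over it.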

  • »
    »
    2 months ago, # ^ |
      Vote: I like it +16 Vote: I do not like it

    It is a very rare situation where a submission uses one second for each test. In most cases, either it is too slow and will get TLE on the first large test (and other tests can be skipped), or then it is fast and will process all the tests quickly.

  • »
    »
    2 months ago, # ^ |
    Rev. 2   Vote: I like it -7 Vote: I do not like it

    I think the vast majority of users only really submit A/B (and maybe C). Those almost always have multiple tests per file. Also, most TLEs will be like TLE test 6 or something, and there are many more WAs on early tests than there are TLEs on very late pretests.

    The bigger issue, I think, is that the submission demand is very much not equally distributed. From minute 5 until minute 25 you have a ton of demand; after that it is much lower. You can see this by looking at where the queue is when 50% of submissions are judged (usually it is quite early in the contest).

    EDIT: Like you said, if you redistribute the cores across the contest that would fix this nicely though.

»
2 months ago, # |
  Vote: I like it +18 Vote: I do not like it

very nice TLDR

»
2 months ago, # |
  Vote: I like it +21 Vote: I do not like it

This is an interesting calculation and worth considering. However, it may be difficult to accurately measure code running time on a cloud server (can we assume that each server is identical, that there are no other processes running, and so on?).

On the other hand, it is easy to distribute code submission evaluation (you can just buy more servers), but the actual bottleneck may be the web server and database. It can be much more difficult to try to fix that.

  • »
    »
    2 months ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    CodeChef: Scaling up – new cloud-based checkers

    "The execution times on the new cloud-based checkers differ from our old checkers by around 10%. The exact difference is subject to the problem and the code being evaluated."
    You can ask them what benchmarks they did.

    • »
      »
      »
      2 months ago, # ^ |
        Vote: I like it +8 Vote: I do not like it

      I think the later paragraph is more relevant:

      "The other factor is the execution time fluctuations. No two machines are exactly the same, and even in the same machine, the environment during two different runs is also not exactly the same. -- We expect the execution times to be within +/- 13% difference. That is, if a code runs in 1 second, on running it many times, the different execution times might go up to 1.13 seconds sometimes, and similarly lower."

      In my opinion, 13% is too much, I wouldn't accept that.

      • »
        »
        »
        »
        2 months ago, # ^ |
        Rev. 2   Vote: I like it +13 Vote: I do not like it

        Even so, if a problem needs run times to be accurate to better than 13%, then I wouldn't think it's a good problem in the first place. At best, those are special cases. And I think offloading these problems (or test cases) to more consistent local servers is probably a good idea if you really need consistency.

        • »
          »
          »
          »
          »
          2 months ago, # ^ |
            Vote: I like it +24 Vote: I do not like it

          If you submit the same code several times, the running time should be the same. Otherwise you always have to think: "maybe I had bad luck, let's submit the code again".

          Of course the model solution should be much faster and not close to the time limit, but there are often many possible approaches and some of them may require more time. It is not fair if such a solution is sometimes accepted and sometimes not.

      • »
        »
        »
        »
        2 months ago, # ^ |
          Vote: I like it 0 Vote: I do not like it

        Currently we rerun TLE submissions. We can just rerun, on our own servers, the submissions that got TLE on the cloud or that passed there with a run time within 15% of the time limit.

        15% is just a number. It can be decided based on experiments.
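One way to read this rerun rule, as a sketch only: the verdict strings, the function name, and the reading of "passed by just 15%" as "run time within 15% of the time limit" are my assumptions, not how CF actually works.

```python
def needs_local_rerun(
    verdict: str,
    run_time_s: float,
    time_limit_s: float,
    margin: float = 0.15,  # placeholder threshold, to be tuned by experiment
) -> bool:
    """Decide whether a cloud verdict should be rechecked on the reference servers."""
    if verdict == "TLE":
        # Cloud timing may have been unlucky, so always recheck.
        return True
    if verdict == "OK":
        # Accepted, but too close to the limit to trust cloud timing.
        return run_time_s >= time_limit_s * (1 - margin)
    # Wrong answers, runtime errors, etc. do not depend on timing.
    return False


if __name__ == "__main__":
    print(needs_local_rerun("OK", 0.90, 1.0))
    print(needs_local_rerun("OK", 0.50, 1.0))
```

Only the borderline submissions go to the local machines, so most of the load can stay in the cloud.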

»
2 months ago, # |
  Vote: I like it +11 Vote: I do not like it

I think a hybrid approach would work best, where the judging system is designed so that additional judge "servers" can function independently as consumers. Basically, the primary servers would stay in their HQ, and some cloud instances could be created on the fly. With cloud servers, by only turning on instances on demand, you only pay for the hours used, unlike the monthly costs of a VPS. I don't think it's necessary to move everything to the cloud.
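A toy sketch of that consumer idea, using threads in one process to stand in for separate machines. The judge names and the placeholder verdict are invented for illustration, and a real setup would use a networked queue rather than an in-memory one.

```python
import queue
import threading


def run_judges(submissions, judge_names):
    """Let every judge pull work from one shared queue until it is empty."""
    pending = queue.Queue()
    for submission in submissions:
        pending.put(submission)

    results = {}
    results_lock = threading.Lock()

    def worker(name):
        while True:
            try:
                submission = pending.get_nowait()
            except queue.Empty:
                return  # nothing left to judge
            verdict = f"judged by {name}"  # placeholder for compiling and testing
            with results_lock:
                results[submission] = verdict

    threads = [threading.Thread(target=worker, args=(name,)) for name in judge_names]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    verdicts = run_judges(range(10), ["hq-1", "hq-2", "cloud-1"])
    print(len(verdicts))
```

Because the judges only share the queue, adding a cloud instance is just one more name in the list, and the HQ judges do not need to know about it.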

  • »
    »
    2 months ago, # ^ |
      Vote: I like it +3 Vote: I do not like it

    But hardware would be different.

    • »
      »
      »
      2 months ago, # ^ |
        Vote: I like it +3 Vote: I do not like it

      Fair point.

    • »
      »
      »
      2 months ago, # ^ |
        Vote: I like it +4 Vote: I do not like it

      I was pretty sure I glanced over an article some time ago by HackerRank on how they implemented parallel execution of test cases using AWS. And I finally found it. There is no mention of how they manage hardware differences. I'm guessing they simply used virtualized specs such as vCPU.

»
2 months ago, # |
  Vote: I like it +34 Vote: I do not like it

In the recent Global Round 9, which had over 7k participants, the servers worked perfectly. I don't think the issue is related to that.

  • »
    »
    2 months ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Well, as far as the number of participants goes, there have been rounds with 20k+ participants and no issues... (I don't remember anything wrong in this round, for example)

    But yeah. The point's valid.

»
2 months ago, # |
  Vote: I like it +23 Vote: I do not like it

If you're worried about server utilization and wasting money due to it, I think an alternative you can consider is to use AWS Fargate (Fargate can manage Docker containers for you) as opposed to using EC2 instances. When you use Fargate, you only pay for the amount of resources you need to run your containers. Fargate does all the management for you (like finding the necessary servers to run your containers), so you don't have to pay extra.

»
2 months ago, # |
  Vote: I like it +29 Vote: I do not like it

I do not think that plenty of CPU cores would fix the problem. Today's outage was clearly not a CPU overload.

»
2 months ago, # |
  Vote: I like it 0 Vote: I do not like it

While I think a move to the cloud is a big plus, there are some problematic parts that you need to include in your price calculation. For example, you need to either migrate the DB (and pay for it) or somehow set up a connection to your on-premises DB (maybe AWS Direct Connect, but I have no experience with it), prepare and test good scaling policies, pay for load balancers, ...

»
2 months ago, # |
  Vote: I like it +35 Vote: I do not like it

I sincerely don't understand why people are so much into discussing a likely non-problem here.

Sheer computing power sounds like the least of the problems. Even if it was, adding more invokers physically is a minor task, orders of magnitude simpler than moving the whole thing to the cloud.

A testing system is not like "yeah everything else just works seamlessly, but how do we add more invokers?". Far from it in fact.

  • »
    »
    2 months ago, # ^ |
      Vote: I like it +8 Vote: I do not like it

    Indeed this discussion is probably not related to yesterday, but "long queue" has also been a problem recently and adding more judge servers can solve the problem.

»
2 months ago, # |
Rev. 4   Vote: I like it -68 Vote: I do not like it

-

  • »
    »
    2 months ago, # ^ |
      Vote: I like it +3 Vote: I do not like it

    How do you know he's doing NOTHING? Please don't assume things. He must be much more worried about this than us

  • »
    »
    2 months ago, # ^ |
      Vote: I like it +64 Vote: I do not like it

    Come on, man, give Mike some credit. He's running by far the biggest programming competitions in the world, which involves encountering problems that no one else has had to figure out yet (at least in this market). He's definitely working hard, it is just hard to get stuff at this scale working perfectly without a few hiccups.

  • »
    »
    2 months ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Let's make this the most negatively rated comment on CF