farmersrice's blog

By farmersrice, history, 4 years ago, In English

It costs 4.6 USD an hour to rent 48 cores / 96 threads on AWS. At this rate it will cost roughly 100 USD an hour to rent 1000 cores.

Assume we rent 1000 cores for 2 hours. Given that a very large contest has 20k users, this works out to 1000 * 2 * 60 / 20000 = 6 entire minutes of computation per user. 6 minutes is plenty of time to judge all a user's submissions.
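The arithmetic above can be sketched out directly (the 4.6 USD/hour figure and the 20k-user count are the post's assumptions, not quoted AWS pricing):

```python
# Back-of-envelope math from the post; the 4.6 USD/hour rate for 48 cores
# is the post's assumption, not an official AWS quote.
hourly_rate_48_cores = 4.6
cores, hours, users = 1000, 2, 20_000

rate_per_core_hour = hourly_rate_48_cores / 48        # ~0.096 USD per core-hour
total_cost = rate_per_core_hour * cores * hours       # ~192 USD, roughly "200 USD/round"
minutes_per_user = cores * hours * 60 / users         # core-minutes of judging per user

print(round(total_cost), minutes_per_user)            # 192 6.0
```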

If you are afraid of heavy load at the beginning rather than at the end, you can of course alter the distribution of cores at the same cost (e.g. 1200 cores in the first hour, 800 cores in the second hour).

CF received 100k USD in the last donation round. This is enough to pay the server costs of 500 rounds at 200 USD/round. Even assuming that only a tiny portion of the funds is allocated to server costs, plenty of rounds can still be paid for without congestion.

It is also likely that these servers will be much more stable. (I have heard a rumor that an ITMO professor unplugged the power to the computers to spite some students and CF went down during that time; what if that happens during a contest?)

It seems today's issues were caused not by server load but by some other bug, but of course there are plenty of times when the server gets overloaded and the round is ruined.

Conclusion: 200 USD per contest for speedy speedy

EDIT: you could probably also mine Monero or something while the servers are under low load to recoup some of the 200 USD.

  • Vote: I like it
  • +305
  • Vote: I do not like it

»
4 years ago, # |
  Vote: I like it +39 Vote: I do not like it

farmersrice for admin!

»
4 years ago, # |
  Vote: I like it -39 Vote: I do not like it

I think the backend is made in PHP; that's the reason it sucks

»
4 years ago, # |
  Vote: I like it +58 Vote: I do not like it

farmersrice for admin!

»
4 years ago, # |
  Vote: I like it +3 Vote: I do not like it

6 minutes is plenty of time to judge all a user's submissions.

Can you explain this? I guess 1 second per pretest times 30 pretests times 12 submissions is 6 minutes, but I imagine there's a lot of overhead for things like compiling.

  • »
    »
    4 years ago, # ^ |
      Vote: I like it +16 Vote: I do not like it

    It is a very rare situation where a submission uses one second for each test. In most cases, either it is too slow and will get TLE on the first large test (and the other tests can be skipped), or it is fast and will process all the tests quickly.

  • »
    »
    4 years ago, # ^ |
    Rev. 2   Vote: I like it -7 Vote: I do not like it

    I think the vast majority of users only really submit A/B (and maybe C). Those almost always have multiple tests per file. Also, most TLEs will be like TLE test 6 or something, and there are many more WAs on early tests than there are TLEs on very late pretests.

    The bigger issue, I think, is that the submission demand is very much not equally distributed. From minute 5 until minute 25 you have a ton of demand; after that it is much lower. You can see this by looking at how far into the contest you are when 50% of submissions have been judged (usually it is quite early in the contest).

    EDIT: Like you said, if you redistribute the cores across the contest that would fix this nicely though.

»
4 years ago, # |
  Vote: I like it +18 Vote: I do not like it

very nice TLDR

»
4 years ago, # |
  Vote: I like it +21 Vote: I do not like it

This is an interesting calculation and worth considering. However, it may be difficult to accurately measure code running time on a cloud server (can we assume that each server is identical, that there are no other processes running, and so on?).

On the other hand, it is easy to distribute code submission evaluation (you can just buy more servers), but the actual bottleneck may be the web server and database. It can be much more difficult to try to fix that.

  • »
    »
    4 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    CodeChef: Scaling up – new cloud-based checkers

    "The execution times on the new cloud-based checkers differ from our old checkers by around 10%. The exact difference is subject to the problem and the code being evaluated."
    You can ask them what benchmarks they did.

    • »
      »
      »
      4 years ago, # ^ |
        Vote: I like it +8 Vote: I do not like it

      I think the later paragraph is more relevant:

      "The other factor is the execution time fluctuations. No two machines are exactly the same, and even in the same machine, the environment during two different runs is also not exactly the same. -- We expect the execution times to be within +/- 13% difference. That is, if a code runs in 1 second, on running it many times, the different execution times might go up to 1.13 seconds sometimes, and similarly lower."

      In my opinion, 13% is too much, I wouldn't accept that.
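To put the quoted figure in perspective: a ±13% fluctuation means a correct solution that runs close to the time limit can randomly TLE. A minimal sketch (the 2.0 s limit and 1.8 s run time are illustrative numbers, not from the article):

```python
# Illustrative only: how much headroom a +13% timing fluctuation eats.
FLUCTUATION = 0.13
time_limit = 2.0          # seconds (hypothetical limit)
intended_runtime = 1.8    # a correct solution running near the limit

worst_case = intended_runtime * (1 + FLUCTUATION)   # 2.034 s
print(worst_case > time_limit)                      # True: can randomly TLE
```

So with 13% noise, any correct solution using more than about 88% of the limit is at risk, which is exactly the "submit again and hope" situation described below.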

      • »
        »
        »
        »
        4 years ago, # ^ |
        Rev. 2   Vote: I like it +13 Vote: I do not like it

        Even so, if a problem requires run-time accuracy better than 13%, I wouldn't consider it a good problem in the first place. At best, such problems are special cases. And if you really need consistency, offloading those problems (or test cases) to more consistent local servers is probably a good idea.

        • »
          »
          »
          »
          »
          4 years ago, # ^ |
            Vote: I like it +24 Vote: I do not like it

          If you submit the same code several times, the running time should be the same. Otherwise you always have to think: "maybe I had bad luck, let's submit the code again".

    Of course the model solution should be much faster and not close to the time limit, but there are often many possible approaches and some of them may require more time. It is not fair if such a solution is sometimes accepted and sometimes not.

      • »
        »
        »
        »
        4 years ago, # ^ |
          Vote: I like it 0 Vote: I do not like it

        Currently, TLE submissions are rerun. We could simply rerun on our own servers any submission that got TLE on the cloud, or that passed with less than a 15% margin below the time limit.

        15% is just a number; the exact threshold can be decided based on experiments.
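A hypothetical sketch of that rerun policy: submissions that TLE on the cloud judges, or that pass with less than a 15% margin below the limit, get re-judged locally. The function name and the 15% threshold are illustrative, not Codeforces' actual policy.

```python
# Hypothetical rerun policy: re-judge on local servers anything the cloud
# judges either timed out on, or passed too close to the time limit.
MARGIN = 0.15  # illustrative threshold; would be tuned experimentally

def needs_local_rerun(verdict: str, run_time: float, time_limit: float) -> bool:
    if verdict == "TLE":
        return True
    # Passed, but within 15% of the limit: cloud timing noise could flip it.
    return verdict == "OK" and run_time >= time_limit * (1 - MARGIN)

print(needs_local_rerun("TLE", 2.1, 2.0))   # True
print(needs_local_rerun("OK", 1.9, 2.0))    # True  (within 15% of the limit)
print(needs_local_rerun("OK", 1.0, 2.0))    # False
```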

»
4 years ago, # |
  Vote: I like it +11 Vote: I do not like it

I think a hybrid approach would work best, where the judging system is designed so that additional judge "servers" can function independently as consumers. Basically, the main servers would stay in their HQ, and cloud instances could be created on the fly. With cloud servers, by only turning on instances on demand, you only pay for the hours used, unlike the monthly cost of a VPS. I don't think everything necessarily needs to move to the cloud.

  • »
    »
    4 years ago, # ^ |
      Vote: I like it +3 Vote: I do not like it

    But hardware would be different.

    • »
      »
      »
      4 years ago, # ^ |
        Vote: I like it +3 Vote: I do not like it

      Fair point.

    • »
      »
      »
      4 years ago, # ^ |
        Vote: I like it +4 Vote: I do not like it

      I was pretty sure I had glanced over an article some time ago by HackerRank on how they implemented parallel execution of test cases using AWS, and I finally found it. There is no mention of how they manage hardware differences; I'm guessing they simply used virtualized specs such as vCPUs.

»
4 years ago, # |
  Vote: I like it +34 Vote: I do not like it

In the recent Global Round 9, which had over 7k participants, the servers worked perfectly. I don't think the issue is related to that.

  • »
    »
    4 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Well, as far as the number of participants goes, there have been rounds with 20k+ participants and no issues... (I don't remember anything wrong in this round, for example)

    But yeah. The point's valid.

»
4 years ago, # |
  Vote: I like it +23 Vote: I do not like it

If you're worried about server utilization and wasting money due to it, I think an alternative you can consider is to use AWS Fargate (Fargate can manage Docker containers for you) as opposed to using EC2 instances. When you use Fargate, you only pay for the amount of resources you need to run your containers. Fargate does all the management for you (like finding the necessary servers to run your containers), so you don't have to pay extra.

»
4 years ago, # |
  Vote: I like it +29 Vote: I do not like it

I do not think that plenty of CPU cores would fix the problem. Today's outage was clearly not a CPU overload.

»
4 years ago, # |
  Vote: I like it 0 Vote: I do not like it

While I think a move to the cloud would be a big plus, there are some problematic parts that you need to include in your price calculation. For example, you need to either migrate the DB (and pay for it) or somehow set up a connection to your on-premises DB (maybe AWS Direct Connect, but I have no experience with it), prepare and test good scaling policies, pay for load balancers, ...

»
4 years ago, # |
  Vote: I like it +35 Vote: I do not like it

I sincerely don't understand why people are so much into discussing a likely non-problem here.

Sheer computing power sounds like the least of the problems. Even if it was, adding more invokers physically is a minor task, orders of magnitude simpler than moving the whole thing to the cloud.

A testing system is not like "yeah everything else just works seamlessly, but how do we add more invokers?". Far from it in fact.

  • »
    »
    4 years ago, # ^ |
      Vote: I like it +8 Vote: I do not like it

    Indeed, this discussion is probably not related to yesterday, but "long queue" has also been a problem recently, and adding more judge servers can solve that problem.

»
4 years ago, # |
Rev. 4   Vote: I like it -68 Vote: I do not like it

-

  • »
    »
    4 years ago, # ^ |
      Vote: I like it +3 Vote: I do not like it

    How do you know he's doing NOTHING? Please don't assume things. He must be much more worried about this than we are.

  • »
    »
    4 years ago, # ^ |
      Vote: I like it +64 Vote: I do not like it

    Come on, man, give Mike some credit. He's running by far the biggest programming competitions in the world, which involves encountering problems that no one else has had to figure out yet (at least in this market). He's definitely working hard, it is just hard to get stuff at this scale working perfectly without a few hiccups.

  • »
    »
    4 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Let's make this the most negatively rated comment on CF