О раунде 505 - Codeforces

№	Пользователь	Рейтинг
1	ecnerwala	3650
2	Benq	3582
3	Geothermal	3570
3	orzdevinwang	3570
5	cnnfls_csy	3569
6	tourist	3565
7	maroonrk	3532
8	Radewoosh	3522
9	Um_nik	3483
10	jiangly	3468

№	Пользователь	Вклад
1	maomao90	174
2	awoo	164
3	adamant	163
4	TheScrasse	159
5	nor	158
6	maroonrk	156
7	-is-this-fft-	151
8	SecondThread	147
9	orz	146
10	pajenegod	145

Друзья!

Я бы хотел отдельно обсудить ситуацию вокруг негатива касательно раунда 505.

Я понимаю, что большинство участников расстроены падением задач на систестах. В самом деле, претесты в задачах B, D и E оказались слабыми. С задачей C тоже не всё гладко, но ничего страшного я не вижу.

Разумеется, это совместная недоработка автора задач и координатора. К сожалению, такие ситуации иногда возникают и избежать их бывает трудно. По этим задачам количество претестов и их типаж не казались такими уж слабыми.

Задача B: 9 претестов, есть и большие и маленькие ответы, два претеста на -1, есть претесты с n = 1 и n = 2, есть четыре претеста для n = 150000.

Задача D: 14 претестов, среди них и ручные и 4 разных генератора, несколько претестов для n = 700, большинство ответов Yes, но есть и No. Моё мнение, что здесь мало было тестов на No.

Задача E: 14 претестов. Да, эта задача на финале VK Cup содержала 10 претестов и сильно упала у участников. Я добавил в претесты еще 4 теста из тех, на которых падали участники Финала. То, что даже после этого она упала у стольких из вас лично для меня — сюрприз.

Итого, как результат претесты оказались недостаточными, но сказать, что эта очевидная недоработка авторов или координатора — сложно. Видимо, наложился типаж задач и недостаточный опыт cdkrot как координатора.

Я не вникал во все задачи, но мне показался раунд интересным. Жестких фейлов по условиям, багов в тестах или решениях замечено не было. И система работала вполне прилично, очереди не было.

Пожалуйста, поясните свой негатив к раунду. Особенно ценно будет прочитать аргументированное мнение от опытных участников.

Спасибо.

_Z9solve_lupiiPA702_KcPA702_cS1_: .LFB45: .cfi_startproc leal 1(%rdi), %r9d cmpl %esi, %r9d jge .L7 movslq %esi, %r10 movslq %edi, %rax subl $2, %esi imulq $702, %r10, %r11 subl %edi, %esi movslq %r9d, %rdi imulq $702, %rax, %rax addq $1, %rsi leaq (%r11,%rdi), %r9 addq %rax, %rcx xorl %eax, %eax addq %r9, %rdx addq %rcx, %rdi addq %r9, %r8 .p2align 4,,10 .p2align 3 .L10: cmpb $0, (%rdx,%rax) je .L9 cmpb $0, (%rdi,%rax) je .L9 cmpb $0, (%r8,%rax) jne .L12 .L9: addq $1, %rax cmpq %rsi, %rax jne .L10 .L7: rep ret .p2align 4,,10 .p2align 3 .L12: movb $1, (%rcx,%r10) ret .cfi_endproc

Комментарии (56)

Написать комментарий?

DoomzGay

6 лет назад, # |

-65

Как я уже написал, проблема не в том, что такая ситуация возникла. Проблема в том, что такая ситуация повторяется из раза в раз с данным автором.

Особенно ценно будет прочитать аргументированное мнение от опытных участников.

А вот это выглядело не очень красиво

→ Ответить

K_Y_M_bl_C

6 лет назад, # ^ |

-70

Мне кажется, Майку Мирзаянову стоит извиниться за то, что он по факту оскорбил более половины человек на моем любимом сайте кодефорцез

reeWorld

Мне кажется, или мы начали забывать, что: а) Если бы не Майк, "любимый сайт кодефорцез" просто бы не существовал в нынешнем виде. б) Щас бы обижаться на факты, тем самым демонстрируя жывотное поведение... Сколько коментариев пересмотрел на кодефе, у гроссмейстеров(красных) практически всегда коментарии конструктивнее, нежели у других. Уж такая средняя статистика. в) Все в достаточно равных условиях...

MikeMirzayanov

+29

Да, мне тоже не нравится, что ситуация с автором повторяется. Отчасти поэтому я до старта вник в некоторые из задач и никакого криминала по качеству задач не увидел. И пост этот написал, чтобы разобраться получше. В соседнем блоге про автора и этот контест есть только цитата, аппелирующая к рейтингу поста, без конкретики (считай, эмоции).

Прошу прощения, если кого-то задел — я не хотел.

wrg0ababd

+35

Зачем делать ТЛ в D 1 сек при n = 700 и авторском решении за куб, при существовании таких языков как Java?

gepardo

+31

Насколько я знаю, коды на Java без активного выделения памяти под объекты (чего не было в этой задаче), часто работают примерно с такой же скоростью, как на C++.

Еще для авторов задач вроде есть правило "Time Limit не менее 2x от авторского решения на java".

cdkrot

← Rev. 2 →

-40

Да, у нас не было решения на java, потому что казалось бы что в java что в c++ должен быть один статичный массив и несколько циклов for.

Возможно всё же стоило написать авторское на java, и действительно стоило поставить TL побольше.

Сейчас переписал на джаву свое решение с контеста, получил 700 мс:

решение

Поэтому, n ≤ 500 или ТЛь в 1.5с здесь был бы предпочтителен.

Но все же сдать эту задачу на java реально.

komendart

-8

Time Limit не менее 2x от авторского решения на java
авторского решения на java

Лол

dalex

+102

А почему слабые претесты это плохо? Если взломы не вида if n == 1, а содержательные, то, кажется, все в порядке.

+18

Кстати, случай с n = 1 как раз был учтен в претестах в задаче A :)

OneStone

+71

I think that for most problems, coming up with a correct algorithm is 80% of the work and, in my opinion, is what differentiates skilled coders from not-so-skilled ones. So obviously coming up with the right algorithm but then forgetting a special case, and then failing systests pretty infuriating, because you get the same score as someone who doesn't know how to solve the problem at all (0 points) and you'll end up losing rating over it.

That being said, of course it is acceptable that some problems have weak pretests, otherwise hacking wouldn't be a thing. But people react strongly when the pretests are so weak that they allow obviously flawed algorithms from passing, or when too many problems have weak pretests, because then you might see contestants watch their score plummet from 3000 to 300 in systests or contestants massively rising in rating because they get 10 successful hacks (and since it depends on the room you're placed, the score you get from hacks depends heavily on luck).

So what I think is the biggest issue is not the weak pretests per se, but the scale in which it has happened in this round. A possible solution might be to always include a few problems that do not have tricky cases to avoid all problems being affected. That being said, it is true that I find the problems in this round pretty good and the difficulty was well-balanced, so it's not all bad.

mredigonda

That being said, of course it is acceptable that some problems have weak pretests, otherwise hacking wouldn't be a thing.

Well, are hacks even a thing? I see them just as a "happy" solution to the problem highlighted in kostka's comment: not being possible to give full feedback because of the high number of contestants.

The thing is that the majority of other well established contests (ICPC, IOI) don't care too much if you fail because of a trivial border case, they just add some penalty.

+19

That's a very good point you make I hadn't considered while writing my reply. In any case, the message I was trying to convey is that it's not a big deal if only one problem has weak pretests and it only affects a handful of people.

In any case, what sets apart well established contests from Codeforces is that you only participate maybe twice in your lifetime in IOI or ICPC whereas in Codeforces you can participate hundreds of times, so if a few rounds have weak pretests your rating won't be affected that much over time, whereas a round of IOI or ICPC with bad testcases will mess up the results for certain.

In essence, Codeforces can afford some inaccuracy that ICPC or IOI cannot. That, of course, doesn't mean it is acceptable to have a disaster like #505 or others in the past, but one bad problem every once in a while isn't the end of the world.

caustique

+62

What you don't understand is that not giving full feedback during a contest is a feature, not a bug. It is what sets Codeforces apart from other "well established contests" such as IOI or ICPC and what makes it so interesting.

Therefore round with weak pretests should be considered not a "disaster" but rather a typical round — vice versa, problems with very strong pretests are not suitable for Codeforces but rather should be saved for other "well established" contests.

mango_lassi

I think the worst part of the pretests was B missing this case:

1
1000000007 1000000007

Probably nearly everyone who failed B to systests failed to something like this. And since this is even kind of a special case in the intended solution, I think it's odd that it was missing.

Egor

+51

I do not see how this is special case

A lot of codes output (1e9 + 7) * (1e9 + 7) (gcd of products of all pairs)

For me, the factorization loop ran while (i*i < val), not while(i*i <= val), so my solution would have outputted NO.

peach

This issue is easy to avoid if you don't multiply the two numbers to begin with.

GreenGrape

+70

Дам небольшой комментарий по поводу претестов, которые всем так не нравятся.

Я открываю случайную страницу с упавшими решениями в B и вижу:

wa 46
wa 43
wa 73
wa 53
wa 14
wa 47
wa 81
wa 82
wa 48
wa 80
wa 92
wa 60
wa 68
tl 83
tl 79

Это 15 разных тестов (повторюсь, со случайной страницы), что уже на шесть тестов больше, чем в претестах.

Тесты делали два человека: я и cdkrot, и, при всем желании, у нас нет возможности выявить все баги, которые могут допустить тысячи участников раунда. Также мы не можем угадать, на каких тестах участники массово обвалятся, а какие пройдут без особых проблем.

Я хочу искренне извиниться перед теми, кого задели слабые претесты для задач B и D. Надеюсь, вы не держите на меня зла.

Ashishgup

+57

The issue with weak pretests in this round was that generally the pretests leave out an edge case, whereas in today's contest, it failed to check TLE solutions in B or even the logic in case of D.

For instance, my solution for D did not even check for the tree being binary. While implementing, I just verified whether a tree could be constructed and it still cleared the pretests.

The scale of weak pretests in this contest lead to a lot of hacking too (with many missed opportunities), and while hacking is an integral part of the contest system, the scale at which hacking happened makes the contest frustrating.

I have enjoyed taking part in almost every CodeForces round, but contests like this one take away the fun.

Denton

We all should define what pre-tests are. Otherwise it's a meaningless controversy. What's the point of system testing if every case will be covered by pre-tests?

Also, maybe it is better to notify people about potential quallity of pre-tests in advance. i.e. in round announcement state that tests cover basic cases only or some corner cases too, etc.

kostka

← Rev. 9 →

+39

I agree that we should redefine pretests, but adding "quality of pretssts" sounds a bit ridiculous. I hope that problemsettera try their best to have strong (pre)tests.

Ok, so why do we even have pretests? People are not really used to not having full feedback nowadays. On the ICPC-like contests you have full feedback almost immediately. Same for IOI-like contests (maybe without ranking). I think that the biggest problem is the amount of participants in the regular Codeforces round. There are simply too many submissions during the contest to give everyone full feedback. On the other hand, people don't have time to test their solution, the Codeforces round is really short, compared to the other contests. Generally, I used to trust pretests. I hoped that there is a small percent of submissions that would fail on system tests, after passing pretests.

The problem(?) or the cause of the outrage is the format of Codeforces rounds. I am not saying that we should change it right away, but maybe we should rethink it. The biggest problem (in my opinion) is that after failing on system tests you cannot resubmit your solution. Of course there's a possibility that someone might hack your solution, but it is quite rare. If you fail, and you were only one stupid corner case away from the correct solution, you feel quite bad. You got the same score as someone who didn't even submit this solution or thought about it.

I don't have a good solution, but maybe someone will.

Noam527

There are simply too many submissions during the contest to give everyone full feedback.

I would agree with you, but in fact, during codeforces rounds the system also runs shadow system tests, so that the moderators of the contest (authors, testers, etc.) can view for each submission its verdict when judged on the system tests. So I'm wondering, is there really a need in pretests if system tests are already being done on the fly?

It could be that I misunderstood (and if so, then I want a clarification), and that those system tests are only run when, for instance, the queue is short. But still, I think simply doubling the number of pretests will work out just as fast as performing system tests during the round (and will also give more freedom to put more countertests in pretests).

KieranHorgan

+13

It was my understanding (though I might be mistaken) that these shadow system tests only take place on a small percentage of total submissions. I would be interested to know more about these tests however, because if they only occur on 1-3% of "pretest passed" submissions then the reduced load is understandable but we do not know what proportion of submissions get judged fully.

I do remember a couple months ago when ksun48 took part in a post-contest stream for the round which he authored, he implied that the round coordinators already knew the final standings (at the top of the leader board at least). Do the shadow tests favor those contestants that did the best? Or does every single submission get run on the shadow tests? Bottom line, I agree that the details of how these shadow system tests should be clarified.

I was a problemsetter and so I can tell about shadow testing. It tests all the submissions. But there can be the situation (often at the beginning of the contest) that small percentage of submissions that passed pretests were tested on the full tests — just because submissions come so often. By the end of the contest people submit more rarely and the coverage by shadow testing can reach 100%. So, the authors can know the results before system testing.

+23

It is not exactly true, that the round authors know the exact standings before the end of contest.

But the shadow testing exists (more in chronological order rather than for top), for example sometimes this allows authors spot some issues like bad solutions passing and perform counter-measures before the results are final.

Also, it was not me who was on a stream (this round was coauthored with 300iq, and he was the one on the stream).

+10

Regarding the twitch stream, I was indeed mistaken. I was thinking of Round #492 which was co-authored by ksun48 in which he appeared on stream afterwards to discuss to contest. Apologies for the unwarranted tag

Xellos

+47

E: "Weak" pretests are a feature simply because there's a variety of solutions and there's no way to make pretests that are good for all of them. Also, since the grid is very sparsely populated for all but the most trivial inputs, it's very easy to find a good solution on any single test by coincidence and rather easy to make a solution that fails for a specific type of tests. You'd need like 50 pretests (or choose 15 very very carefully) to claim they're strong. Knowing which problems are easy to make a hard-to-spot (e.g. by submitting) mistake in is an important skill and applicable here.

D: Pretests were weak. There's no question about it. I checked how fast my solution runs on max. N, random sequence, and got over 500 ms; pretests gave max. 77 ms, which is very misleading. I'd definitely include that random test if I were making pretests, if only because it's the easiest, and that's why I'm also surprised it didn't occur in pretests.

B: There's some freedom in what to print, which means good tests are harder to make — similar to E — but while I didn't experience any issues (in part because I had plenty of time to think while cursing my Internet disconnection), this... doesn't seem like a problem where a lot of wrong solutions should pass pretests — unlike E. If people complain, it's probably true that there wasn't enough work put into what should be in pretests. 9 isn't much, especially if it includes samples.

Note that yes/no problems are bad because weak pretests and weak systests are a possibility. There really should have been something extra in D, e.g. counting possible roots.

C137

+52

I don't see the weak pretests as the main problem, am OK with it as it sometimes helps you be more accurate, and work hard to test your solution locally before submitting, But my main concern is with people who get high ratings just by hacks, there has been a lot of talks about hacks recently, and I think this should change, My suggestion is to use Dynamic scoring for hacks, It isn't fair for a problem with lots of hacks to give you same as a problem with < 10 hacks.

MetB

Что может быть лучше согревающего -78 рейтинга этими холодными российскими вечерами?

Задачи были довольно интересными. К сожалению, я успел добраться только до 4 задач и надеюсь, что мой отзыв учтут даже при том, что участник я неопытный.

Задача А была хорошей, самая обыкновенная задача А, в которой всё же пришлось думать. Задача В была на делители, так что, по-моему, нуждалась в особо продуманных претестах на граничные случаи. Чтобы вы понимали, моё решение раскладывало числа на множители пустым вектором простых делителей. И всё же, это решение прошло претесты. Задача С показалась мне довольно скучной и некрасивой. Мне всегда казалось, что задачи Div2C должны быть красивыми. Собственно, я пропустил знак "равно", но это не то, что всегда ловит претест. Претесты на задачу D меня вообще убили. Я отправил абсолютно неправильное решение (Вы всегда можете посмотреть его у меня в профиле, если захотите. Я решал потоками. Причём неправильно их написал.) Однако претесты пройдены. Мне показалось, что мне повезло и я случайно написал правильное решение, но это просто были возмутительно слабые претесты. "If your solution passed pretests, then It's quite reasonable", — писал в своём блоге Михаил Мирзаянов 8 лет назад. Однако претесты со своей задачей в этот раз не справились. Желаю удачи автору, всем кто поднял или уронил рейтинг. Надеюсь, мы все вынесли отсюда урок.

Ezio07

The pretests were sure weak but I think the round received unreasonably high negative feedback. I myself got hacked on B but I won`t blame the authors for that matter, as pretest were checking correctness of the algorithm (and not the worst case complexity, which user should be aware of, before submitting).

The problems were well balanced as well. Thank you for this Coding Weekend!

athin

← Rev. 3 →

+64

Actually, the problems are really good (thanks a lot!) and pretests are still acceptable although they can be better by make them a bit stronger. The issue that is bugging me (and maybe other) is that the constraint -- on Problem D -- is misleading and too strict.

N = 700 actually suggests for a solution with O(N²), O(N²logN), or lower; and of course this can be easily argued. But setting it for a O(N³) solution is a critical flaw. 700³ is barely on the time limit constraint as someone cannot deduct for sure whether his solution got AC or TLE without submitting it. The situation gets worse if multiple solutions on the same complexity got different verdicts (ACs and TLEs).

Therefore, I'd like to suggest that the constraint should be made clear and not strict.

If the problem setter insists to make N = 700 works for O(N³) solution, test it with a pure O(N³) solution without any optimization. Check the time, and it's good option to set the time limit twice of it. Additionally, can check it too with a slow language like Java.
If not, change the N. N = 300 is a reasonable choice as O(N⁴) will still be killed. Setting higher N but with higher time limit is also another option.
Also, for a problem with strict constraints like this, it's suggested to include large testcase as a pretest so it doesn't really mislead the participants.

Kostroma

+16

O(N³) doesn't imply N³ operations with integers. In this problem there are $\text{[math]}$ calculations of dp, which seems quite reasonable for this solution to work in time.

What is pure O(N³) solution then? The one that makes other $\text{[math]}$ operations which don't make any sense?

Well, not literally N³ actually, but as you can see indeed N³ / 6 is a reasonable solution to test. Unfortunately, my solution got TLE on that complexity.

By pure O(N³) I mean without a silly optimization, like using bitsets, or && vs & vs min-max (I recalled someone got AC by just changing & to &&), or some trick with break, etc. Just do as it is.

OK, in overall I agree that TL is too tight, for example my solution 41841784 which I supposed wuold be fast runs in almost half TL. Btw I used bitsets, but only for convenient code — otherwise I would have to store dp for last two segments lengths, which is less enjoyable.

+21

I tried to optimise my in-contest solution for cache and stuff. Now it does the following in assembly (compiled with --std=c++17 -O2):

That's one of the two functions that form the main logic of the program. Only the .L9 and .L10 labels contain what's executed in the loop, so that's 2x 9 quite cheap instructions in a loop N^3/6 times. It's also cache-efficient. Still, at worst one million, realistically maybe half a million assembly instructions in total and it runs 300ms on a test with a random sequence.

If that's the best I can do without bitsets, using fairly advanced knowledge for algorithmic competitions (cache) and it still takes at least 1/3 of the TL, that's pretty damn tight. Doable, of course, but pretty damn tight. My actual in-contest submission is 2x slower and it's easy to be 3x slower and get TLE.

+33

Not having any "advanced knowledge", I wrote simple recursive solution and it works 78ms — 41910650. Did you consider this approach before claiming that the TL is "pretty damn tight"?

What's the maximum number of instructions (assembly) executed for the main (deepest) loop in total?

kakgucci

Wow, really ? You have the break inside! You can't assume, that this break will catch too often, when the solution really exists, so let's be honest and remove it, ok ? You are not so fast now : click

MrDindows

+20

I agree that the TL was too tight for CF contest (it could be OK for ICPC). But here is how O(N³) without recursion and breaks could run 200ms on max test. That's actually my submission from the contest, but without template and prettified a bit. I haven't analyzed assembly, but the main feature, I guess, is that I calculate both dp in a single loop and they are stored in single array dp[N][N][2].

AlwaysMerlin

I enjoyed the tasks. They seemed to be interesting, balanced, and there were no big issues with the English problem statements. The pretests were on the weak side, but I do not see this as a problem. Honestly, my only frustration was that N<=700 was a strange bound for problem D. I don't think this is a big problem, since an experienced coder could tell it would probably run in time with good optimisation.

The problem seems to be expectation. Generally speaking, most other rounds have had stronger pretests. Competitors have therefore come to expect stronger pretests. I believe the negative feedback comes from this expectation not being met.

My opinion is that better defined guidelines about pretests could help. Competitors will then know what to expect, and problem setters can follow these guidelines to ensure their round meets these expectations. I don't know if stronger or weaker pretests are better, I leave that discussion to more knowledgeable competitors.

aboAdnan

For B, there was weakness in testing the logic of the solution also, my first solution's logic (haked) was so far from the correct logic, see it and laugh XD. 41842360

Anyway, thanks for your efforts in this and all contests.

SeehtEntity

-10

Problem D: I don't understand why the answer of this test is "No". Can anyone explain for me? Thanks.

Test 54: 4 23 247 299 703

I think it can be: 23 | 299 | 247 | 703

bazsi700

I don't really understand your notation, but I assume that 299 is the right child of 23, 247 is the left child of 299 and 703 is the right child of 247. It won't be a binary search tree because 703 is in 299's left subtree but it should be in the right one.

Thanks.

a_kk

-12

I think having pretests with some missing corner case is ok but pretests should not be so weak that it fails to check TLE solutions. In Problem B and Problem C(surprisingly) there were solutions which failed due to TLE in systest. I think there should be few strong pretest that checks for TLE solutions.

rotavirus

-45

700 — очень странное ограничение. Почему не триста? наверное, чтобы не сосать у тракториста.

WNG

Just a sidepoint (not sure it's relevant here, but it seems related) : on some problems (like the B), it was not possible to submit any hack that satisfied the input constraint, because of the limitation in the input size to 256Kb.

One example is test 83, whose input it more than 10 times bigger than what can be submitted as a hack. I believe putting the same constraints on the tests and on the hack could increase the possibilities for participants to strengthen the pretests themselves through more thorough hacking. It would also seem more fair for people attempting to hack.

Besides this, I believe philosophically, having weak pretests is another form of competitive programming, cause programming is as much "creating algorithm" than "implementing algorithm without errors". The beauty of some problems are in the details, and it is really fun to have contests like yesterday's.

+30

you can use generator

mmaxio

А что насчет разбора 504-го раунда?

-21

Семьсот — очень странное ограничение. Почему не триста? наверное, чтобы не сосать у тракториста.

tdkd

-29

My code in B was hacked by this test:

The pretests in B were not good enough. However, in my opinion, the pretests in A were good with pretest 4 (length = 1) in pretests that avoid so many hacks.

Redux

I think that case is simple enough for you to check yourself, no?

ContestDestroyer

"It's your fault to make mistakes while coding, don't ever blame anything else"

I always think about it whenever I failed.

Bakry

Pretests in C problem were very weak....I submitted an O(n ^ 2) solution and it passed the pretests and Another Wrong Answer Solutions passed Pretests

Блог пользователя MikeMirzayanov