Are problem ratings busted?

neal's blog

By neal, 4 years ago, in English

Let's look at two problems from the last round, Round 657 (Div. 2):

Problem D 1379D - New Passenger Trams currently has 367 solvers and had 77 solvers in the official contest (among rated participants).

Problem E 1379E - Inverse Genealogy currently has 44 solvers and had 0 solvers in the official contest.

Meanwhile both problems have the same difficulty rating of 2400. How does that make any sense?

problem, ratings

+186

Comments (21)

neal

4 years ago, # |

+61

On top of that, problem F1 1379F1 - Chess Strikes Back (easy version) currently has 97 solvers and had 10 solvers in the official contest, so it is still clearly easier than problem E. But it has a difficulty rating of 2700.

acraider

4 years ago, # |

+55

Another problem that is certainly not worthy of a 2900 rating: 1372E - Omkar and Last Floor

alon276

4 years ago, # |

-47

I also have examples of the opposite case. Problems https://codeforces.com/contest/1385/problem/E and https://codeforces.com/contest/1385/problem/F are rated 2000 and 2300, which doesn't make sense. E is a small variation on a classic problem, and F is a little hard but really not that hard. They should be more like 1700-1800 and 2000-2100.

Edit:

Wow I'm sorry if I offended anybody, I didn't mean to diminish anyone's achievement!

AnandOza

4 years ago, # |

+18

Good news, 1397D and E are now 2300 and 2800 per your suggestion.

MikeMirzayanov

4 years ago, # |

+70

Thanks. In rare cases some of the heuristics don't work well. In such cases, I need to change the ratings manually.

ritesh1340

4 years ago, # ^ |

Could you also fix this one: https://codeforces.com/problemset/problem/86/D

Its rating is 2900, and like... that should not be true, right?

CoderAnshu

4 years ago, # ^ |

Yes, of course it should be around 2000-2100.

aryanc403

4 years ago, # ^ |

IMHO, this explains very well why a 2700+ rating is correct for 86D.

CoderAnshu

4 years ago, # ^ |

But bro, in the current situation I don't think that it is a non-standard thing.

aryanc403

4 years ago, # ^ |

In that case, you should ask Mike to take upsolved submissions into account as well, along with the ratings people had when they upsolved it. click click2

The problem rating is correct as far as the spirit of the rating formula is concerned, and it shouldn't be changed just for the sake of it.

AnandOza

4 years ago, # ^ |

+39

Can you explain what the heuristics are, or what the overall system is? I used to think problem difficulties were calculated by a single formula with a simple interpretation.

Thanks!

dantrag

4 years ago, # ^ |

I guess the mystery is somewhere here, in UPD2 "coefficients".

AnandOza

4 years ago, # ^ |

Yeah, the original blog post announcing problem difficulties (https://codeforces.com/blog/entry/62865) said they are calibrated so that if your rating is $$$R$$$ and the problem rating is $$$r$$$, your probability of solving it during an official round is $$$f(R - r)$$$ for some function $$$f$$$ with $$$f(0) = 0.5$$$, as in Elo and similar two-player rating systems.

But I don't really understand where they come from, like... is it just based on fitting to the ratings of the participants during the official contest? And if so, is it pre-contest ratings, post-contest ratings, or per-contest "performance ratings"?

Plus, as you said, it seems from Mike's comments like there are probably some ad-hoc heuristics/hacks on top of this basic formula, but we don't know what they are.
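
For a concrete picture of the basic calibration described above, here is a minimal Python sketch. It assumes $$$f$$$ is the standard Elo logistic curve with a 400-point scale; the announcement only says that $$$f(0) = 0.5$$$, so the exact curve and the name solve_probability are assumptions made for illustration.

```python
def solve_probability(participant_rating: float, problem_rating: float) -> float:
    """Elo-style chance that a participant solves a problem in an official round.

    Assumption: f is the usual Elo logistic curve with a 400-point scale.
    The only property stated in the thread is f(0) = 0.5.
    """
    diff = participant_rating - problem_rating
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


if __name__ == "__main__":
    # Chance of solving a 2400-rated problem at three participant ratings.
    for rating in (2000, 2400, 2800):
        print(rating, round(solve_probability(rating, 2400), 3))
    # prints:
    # 2000 0.091
    # 2400 0.5
    # 2800 0.909
```

Under this assumed curve, a participant rated 400 points below the problem solves it about 9% of the time, and one rated 400 points above about 91% of the time.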

MikeMirzayanov

4 years ago, # ^ |

+54

I can't easily explain all the details. In a perfect world, a problem's rating is the rating of an opponent such that your probability of beating them equals your probability of solving the problem. But in the real world, the data is dirty: consider tourist trying a Div. 3 round where A is too boring for him. Statistically he didn't solve it, and that gives a big boost to the problem's rating. I tried counting only official submissions, but for hard Div. 3 problems, for example, official submissions give less information than unofficial ones. So my current way of calculating problem ratings is full of weights, coefficients and heuristics. You can try it yourself using the API, but I don't think there is a silver bullet for calculating ratings much better. I think about 98% of ratings are quite good now, and the rest can be tuned manually.
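
To make the "perfect world" definition and the dirty-data problem concrete, here is a rough Python sketch. It is not the actual Codeforces procedure, which uses weights, coefficients and heuristics that are not public. It assumes an Elo logistic curve with a 400-point scale and fits the problem rating by bisection so that the expected number of solvers matches the observed number. The function names and the sample field are made up; real inputs could come from the Codeforces API (for example the contest.standings and contest.ratingChanges methods).

```python
from typing import Iterable, Tuple


def solve_probability(participant_rating: float, problem_rating: float) -> float:
    # Assumed Elo logistic curve (400-point scale), not a confirmed formula.
    return 1.0 / (1.0 + 10.0 ** ((problem_rating - participant_rating) / 400.0))


def fit_problem_rating(results: Iterable[Tuple[float, bool]],
                       lo: float = 0.0, hi: float = 5000.0) -> float:
    """Find the rating r at which expected solvers equal observed solvers.

    results holds (participant_rating, solved) pairs. The answer is clamped
    to [lo, hi], so a problem nobody solved ends up at hi.
    """
    results = list(results)
    solved = sum(1 for _, ok in results if ok)
    for _ in range(60):
        mid = (lo + hi) / 2.0
        expected = sum(solve_probability(rating, mid) for rating, _ in results)
        if expected > solved:
            lo = mid  # too many expected solvers, so the problem is harder than mid
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    # Hypothetical field for an easy problem.
    field = [(1000, False), (1200, True), (1400, True), (1600, True)]
    clean = fit_problem_rating(field)
    # The same field plus one very strong participant who skipped the problem.
    dirty = fit_problem_rating(field + [(3500, False)])
    print(round(clean), round(dirty))
```

Adding the single skipped attempt raises the fitted rating, which is the effect described above for a strong participant ignoring a boring problem A.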

AnandOza

4 years ago, # ^ |

+16

Thanks! I was mostly curious for my own understanding, rather than trying to suggest I could come up with a better system. It would be nice to know the underlying system that originated these ratings while looking through the problemset.

Is it common for highly-rated participants to skip easy problems in low divisions? Anecdotally, when I look at Div3 results, I see GMs and IGMs at the top of the rankings, usually having done the problems in order. I guess I don't see the GMs/IGMs who do the problems out of order though, since they aren't at the top of the rankings. :P

I was also curious if the "rating" of a participant in a contest is considered as their pre-contest, post-contest, or per-contest-performance rating?

aryanc403

4 years ago, # ^ |

+11

Can you just open-source the formula (just like you did with the rating formula) and allow other interested people to do a PhD on it?

NRK7

4 years ago, # ^ |

1384B2 - Koa and the Beach (Hard Version) had 307 solves, 1384D - GameGame had 144, and 1384B1 - Koa and the Beach (Easy Version) had 846 during the contest in Div. 2. But currently they have difficulties of 2200, 1900 and 1900. Please fix them.

saarang

4 years ago, # |

+22

What about these: 1183E (easy version) is rated 2000 while 1183H (hard version) is rated 1900?

bhushanbaby

4 years ago, # |

+59

not ratism

AnandOza

4 years ago, # ^ |

+24

I think there's a big difference between "unrated" and "rated", more so than between "rated low" and "rated high".

Unrated accounts typically belong to either:

- people who don't participate in contests, so they may not have enough experience with Codeforces to meaningfully talk about contest ratings or logistics, or
- people who make a second account to post things. If even the author doesn't think their post is good enough to want it associated with their primary account, how likely is it that the post is actually good?

LilyWhite

4 years ago, # ^ |

Also, probably the new account was made so that whatever it posts cannot be traced to the original one, for example, when it was used to post "Reveal how xxx cheats" stuff.
