Problem Ratings are Recalculated [May, 2020]

4 years ago, # ^ |

← Rev. 2 →

+117

What do you think about this quick fix: put unrated problems at the end of the list (after the rated ones)?

UPD: I just removed unrated problems from the sorted list of problems.

→ Reply

lperovskaya

4 years ago, # ^ |

+126

kinda works, but I'm sure some of the more successful contestants are using descending order of difficulty to warm up XD

→ Reply

arthurconmy

4 years ago, # |

+29

This is a good update, particularly since problem recommendation bots, e.g. https://github.com/cheran-senthil/TLE, use these problem ratings.

Hope everyone can ;gitgud faster now

→ Reply

dejavucoder

4 years ago, # ^ |

+11

Have you used this? I rely on https://recommender.codedrills.io/.

→ Reply

Clix

4 years ago, # |

+21

Oh, I think taking users' opinions could help. For example, someone who solved the problem the hard way would give a rating to their solution. I think this may stabilize the ratings :)

→ Reply

ouuan

4 years ago, # ^ |

+65

If you mean let the users give ratings to problems, and calculate the rating of a problem by something like the average value of these ratings from the users, then it's probably not a good idea.

It sounds good, but it's hard to judge the difficulty manually, and it could be abused. It is especially inaccurate when a problem is solved by only a few people, and it is hard to correct the difficulty of a problem solved (and voted on) by lots of people. People often follow others when voting, just as they do when upvoting/downvoting a comment.

In the Chinese online judge Luogu, the difficulties of problems were judged by users' votes, and they were far more inaccurate than those on Codeforces. Luogu has since dropped voting for problem difficulties; the difficulties are now set manually by the problem uploaders and the admins, though users can send feedback on the difficulties and hopefully they will be corrected.

However, it could be a good idea if the votes are used to improve the rating calculation algorithm instead of directly changing the problem ratings.

→ Reply

sh_maestro

4 years ago, # ^ |

+18

For difficulty, besides taking in the rating of people who solved it during a contest, perhaps it should also take into account the time taken to solve the problem and the number of attempts.

Perhaps there should be a separate "popularity" concept (e.g. on LeetCode, you can see the likes and dislikes for any problem). "Popularity" is different from "Difficulty", but may be useful as a second dimension. So if I'm looking for a binary search problem at the 1800 difficulty level, I would likely start off with something that was rated highly by a majority of people. More specifically, perhaps rated highly by people who were 1800 or higher at the time they solved it.

While there is always room for abuse in any situation, I wonder why anyone would be interested in abusing a problem's rating (maybe if they were incredibly frustrated with the problem, or had a vendetta against the problem writer and decided to downvote via multiple accounts? :) )

→ Reply

ouuan

4 years ago, # ^ |

← Rev. 4 →

+19

I agree that a separate vote for "popularity" is good.

"I wonder why anyone would be interested in abusing a problem's rating"

For example, one may vote 3800 for every problem they solve, so that their profile (in something like cfviz) shows more hard problems. What's more, it's hard to predict how people would abuse it.

If you are interested in this, there are some relevant discussions on Luogu (in Chinese):

One of the reasons for the increase in abusive votes for problem difficulties on Luogu is that Luogu added a feature that displays the difficulties of the problems a user has solved on their profile page.

→ Reply

sh_maestro

4 years ago, # ^ |

Thanks, I see what you were saying. It's interesting to see how they detected the cheaters.

→ Reply

4 years ago, # ^ |

How about only accepting votes on difficulty when the reasoning for this difficulty is explained and only from accounts with more than X contests to prevent abuse (prevent getting Amouranth Mod Copypasta as the explanation)?

→ Reply

Um_nik

4 years ago, # ^ |

+44

You mean someone should read these explanations? How cruel.

→ Reply

ouuan

4 years ago, # ^ |

+20

I think it's OK if someone "could" read them instead of "should" read them.

→ Reply

4 years ago, # ^ |

+10

Eh, it's a Chinese OJ, they're used to such work.

I don't mean checking if they're correct explanations, just a brief look that they seem like something written seriously. And if nobody wants to waste time on that... well, that's just the normal situation with no user feedback, nothing of significant value was lost.

→ Reply

andryusha_na_knopke

4 years ago, # |

← Rev. 2 →

+16

It wasn't in the comments exactly — https://codeforces.com/blog/entry/76129

→ Reply

PRESS_F_2_PAY_RESPECT

4 years ago, # |

+26

Just curious, did most problems lose rating? Because from what I can see on 1st and 2nd pages of problemset, I think all E and F problems of Div3 lost some rating.

→ Reply

4 years ago, # ^ |

+62

Most ratings did not change, but among those that did, more problems lost points than gained.

→ Reply

Jamaisvu

4 years ago, # |

← Rev. 2 →

+54

Both versions of this problem (hard and easy) have the same rating, which seems a bit weird.

1304F2 Animal Observation (hard version) 2300 x679
1304F1 Animal Observation (easy version) 2300 x895

→ Reply

4 years ago, # ^ |

I tuned some formulas and recalculated the ratings again. Now they are F1=2300 and F2=2400.

→ Reply

fivedemands

4 years ago, # |

← Rev. 3 →

+165

In Round 1305, 1305G - Kuroni and Antihype only had 1 solve from Um_nik while 1305H - Kuroni the Private Tutor had 2 solves from tourist and maroonrk. However, 1305G - Kuroni and Antihype had a rating of 3400 (3300 before) while 1305H - Kuroni the Private Tutor had a rating of 3600 (3600 before). I am very curious about what other parameters are used to calculate this problem difficulty.

600F - Edge coloring of bipartite graph had a rating of -1 lmao

→ Reply

4 years ago, # ^ |

← Rev. 2 →

+131

It seems I broke the formulas in the case of no accepted submissions on the problem. Fixing it now, thanks for pointing it out.

UPD: Fixed the -1 issue.

→ Reply

venkat_46

4 years ago, # ^ |

-59

i have 6 demands who should i approach

→ Reply

hackinazi

4 years ago, # ^ |

+10

Trump

→ Reply

4 years ago, # ^ |

+19

Now both of them are 3500. Actually, I've implemented a cutoff like rating = min(max(800, rating), 3500). I don't think ratings are very reliable for extreme cases like too easy or too hard problems.
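For readers who want to see the cutoff in isolation, here is a minimal sketch of the clamp from the expression above. The function name `clamp_rating` and the sample inputs are hypothetical; only the bounds 800 and 3500 come from the comment, and this is not the actual Codeforces code.

```python
def clamp_rating(raw_rating: int, low: int = 800, high: int = 3500) -> int:
    """Clamp a raw difficulty estimate into the range [low, high].

    Mirrors the expression min(max(800, rating), 3500) quoted in the comment;
    the function name and the default arguments are illustrative only.
    """
    return min(max(low, raw_rating), high)


# Hypothetical raw estimates, including the broken -1 case reported earlier.
for raw in (-1, 800, 2350, 3600):
    print(raw, "->", clamp_rating(raw))
```

With these inputs, -1 is raised to 800, 3600 is lowered to 3500, and values already inside the range pass through unchanged.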

→ Reply

Honey_Badger

4 years ago, # |

← Rev. 2 →

Now I can see the rating of any problem that I haven't solved yet, even though I didn't enable "Show tags for unsolved problems" in my settings. As far as I remember, I couldn't see it before the update.

→ Reply

david_varela

4 years ago, # |

+21

Thanks for the update.

There is a small bug in the problemset now. The table with the problems contains a header with 4 cells: # (id), name, lightning (rating), and green check (solved). If none of the problems in the table have a rating, the lightning icon disappears from the header. This happens, for example, on the first page when it's sorted by rating in descending order. When it happens, it's not possible to click on the icon to sort by rating in ascending order.

→ Reply

4 years ago, # ^ |

← Rev. 2 →

+29

Thanks, I'll fix it.

UPD: Fixed it.

→ Reply

Agnimandur

4 years ago, # |

+21

...And there go my dreams of ever solving a 2000-rated problem in a contest....

→ Reply

Sixpathsguy

4 years ago, # ^ |

+53

Bro, your rating itself is around 2000, meaning on average you do solve 2000 rated problems in contest.

→ Reply

lrvideckis

4 years ago, # ^ |

+19

He's just being modest

→ Reply

I_love_Tanya_Romanova

4 years ago, # |

+222

Could you maybe share the code/formulas as well?

Unlike things like cheating detection algorithms, it doesn't seem to be something that could be abused.

I guess it will be beneficial both for answering all kinds of "how does it work?" questions and for getting input from participants about potential improvements.

→ Reply

manish_joshi

4 years ago, # ^ |

+18

MikeMirzayanov I am also quite curious about the problem rating formulas, and it would be great if they were public. It would allow high rated people from the community to improve them. Lots of problem recommendation bots/apps are using problem difficulties so it will be really beneficial if these formulas were the best possible.

Also, as somebody suggested below, It would also be a nice idea to formalize the rating calculation problem.

→ Reply

MagentaCobra

4 years ago, # |

+420

This is quite funny

→ Reply

meetpr8

4 years ago, # ^ |

+94

same problem!!! Div 3 round #527.

→ Reply

ashkANOn

4 years ago, # ^ |

← Rev. 2 →

Accepted submissions for the hard version should count for the easy version too when the only difference between the two versions is the constraints, because someone who solved the hard version will never solve the easy version in practice mode. This can lead to more submissions for the hard version than for the easy version.

→ Reply

meetpr8

4 years ago, # ^ |

Issue is with the ratings of the problems, not with the number of submissions!

→ Reply

ashkANOn

4 years ago, # ^ |

I think the number of solves is one of the factors used to calculate the rating of a problem.

→ Reply

4 years ago, # ^ |

Now both of them are 2200.

→ Reply

falconlover

4 years ago, # ^ |

+24

This probably means that Mike has taken the order of solving into account as well, as some of the users solve F2 first and then submit the same solution for F1.
The same is the case with 1256F - Equalizing Two Strings and 1256E - Yet Another Division Into Teams, as some people solved F earlier than E during the contest.

→ Reply

4 years ago, # ^ |

Now it looks better: F1=2100 and F2=2300.

→ Reply

ShafinKhadem

4 years ago, # |

+14

According to official standings, 821E - Okabe and El Psy Kongroo had 100 solves and 821D - Okabe and City had 34 solves. But 821E - Okabe and El Psy Kongroo has 2100 difficulty and 821D - Okabe and City has 2000 difficulty.

→ Reply

4 years ago, # ^ |

+10

Now D=2200, E=2100.

→ Reply

blobugh

4 years ago, # |

+30

It's good, but I think the ratings of some easy problems have become overestimated. For example, in this contest, the rating of problem B was 1100 before. Now it is 2000. It's quite weird to see the problem with the most solvers having the highest rating.

→ Reply

.-._._-_.-_.-._-..

4 years ago, # |

I don't know where people usually report bugs (maybe, make a blog or something?) but I guess I've found one and I'll report it here.

You exceeded your quota of 2 distinct recipients per hour

I get this message when I try to send a message in "Talks". I first sent a message to DeadlyCritic and then to pritishn. I didn't try messaging any third person apart from these two; I was only replying to their replies to my initial message. In this case, why would it not let me send a message? Does the error message not mean what it says? Or is it a possible bug in the CF codebase that doesn't let you respond to two different people within the same hour? I never faced this before when I was messaging only one person within an hour (not that I knew about this limit anyway).

→ Reply

Ihave4fish

4 years ago, # |

-29

65A - Harry Potter and Three Spells

It's not that hard, but the rating is 1800.

→ Reply

gagannagpal68

4 years ago, # |

+34

Mike, just a suggestion: what if you formalise the problem of rating assignment and let the Codeforces community solve it for you?

→ Reply

orange_trie

4 years ago, # ^ |

+14

Or there could be two ratings, one automated and one from the community.

→ Reply

hhoppitree

4 years ago, # |

Oh, I think that will be good. That sounds great. I hope that day is coming soon.

→ Reply

suncongbo

4 years ago, # |

← Rev. 5 →

-48

1326F1 - Wise Men (Easy Version): I think its difficulty is far below 2600 and far below 1326E (2400).

During the contest, I wrote an $$$O(3^n n^2)$$$ brute force bitmask DP without any thinking. It passed the system tests outrageously. I think the difficulty of this method is at most 2000 (because it needs no thinking or observation). Maybe the low AC ratio in the contest is because many contestants didn't believe it could pass, so they didn't write it at all.

→ Reply

Doritos4

4 years ago, # |

+23

20C

This is vanilla Dijkstra, should be < 1600
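For anyone unsure what "vanilla Dijkstra" means here, below is a minimal sketch with a binary heap and path recovery. It assumes an undirected graph with positive edge weights and 1-indexed vertices; the function names and the small example graph are illustrative and not taken from the problem's tests or from any accepted submission.

```python
import heapq


def dijkstra(n, edges, source):
    """Shortest distances from source in an undirected graph with vertices 1..n."""
    adj = [[] for _ in range(n + 1)]
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))

    dist = [float("inf")] * (n + 1)
    parent = [0] * (n + 1)  # 0 means "no predecessor recorded"
    dist[source] = 0
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry, a shorter path to u was already found
        for v, w in adj[u]:
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                parent[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, parent


def restore_path(parent, source, target):
    """Walk predecessors back from target; return [] if target is unreachable."""
    path = [target]
    while path[-1] != source:
        prev = parent[path[-1]]
        if prev == 0:
            return []
        path.append(prev)
    return path[::-1]


# Illustrative graph: (u, v, weight) triples.
edges = [(1, 2, 2), (2, 5, 5), (2, 3, 4), (1, 4, 1), (4, 3, 3), (3, 5, 1)]
dist, parent = dijkstra(5, edges, 1)
print(dist[5], restore_path(parent, 1, 5))  # 5 [1, 4, 3, 5]
```

The "stale entry" check replaces a decrease-key operation, which Python's `heapq` does not provide.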

→ Reply

alireza_kaviani

4 years ago, # ^ |

+21

Also, 757G - Can Bash Save the Day? is just (see spoiler). About 130 users solved it, so it should be < 2800, but it is 3700 now.

Spoiler

→ Reply

-is-this-fft-

4 years ago, # ^ |

+35

But from the scoreboard we can see that no one solved it during the contest (ignoring the gray guys at the top of the scoreboard who "solved" it virtually). Probably there was not enough time to solve a tree query problem after also solving the previous 6 problems.

For any algorithm that looks only at contest-time submissions, all such problems are indistinguishable, so it makes sense to give them 3700. And I'd hate the idea of looking at post-contest submissions because they have a lot more problems. If a problem has a simple but hard-to-come-up-with solution, then people will copy it from the editorial more. And some problems have just become famous, featured in some popular tutorials, etc.

→ Reply

4 years ago, # ^ |

You have to factor in both in-contest and post-contest submissions or precisely this happens — you get a 3700-rated problem that definitely isn't that hard.

→ Reply

-is-this-fft-

4 years ago, # ^ |

+48

I'd prefer getting a 3700-rated problem that definitely isn't that hard. If you factor in post-contest, you will likely get some other rating that doesn't make sense.

→ Reply

4 years ago, # ^ |

There will always be some ratings that don't make sense, but why would it get worse by adding post-contest results? Look at the examples above, there are some serious nonsense ratings now.

→ Reply

alireza_kaviani

4 years ago, # ^ |

+28

In some problems, many contest participants fail system testing even though their solution is essentially correct and the problem is really easy; they just fail on some small case (n = 1). Solving such a problem in practice is easy, but its rating will be high.

→ Reply

https://codeforces.com/problemset/problem/887/F

4 years ago, # ^ |

Yeah, that shouldn't be part of problem difficulty rating. Neither should difficulty compared to other problems in a contest because they can be solved as just single problems in a huge problemset.

→ Reply

SPyofgame

4 years ago, # |

← Rev. 3 →

I came up with this idea:

How about determining the rating of a problem from these ratio factors

→ Reply

-is-this-fft-

4 years ago, # ^ |

← Rev. 2 →

Determined how?

Also, how do you count attempts? If someone tries to solve a problem and it's too hard, they won't submit anything. Meanwhile, an easy fake solution (one that tricks people into submitting), a common trap, or a tight TL brings the number of attempts up, but none of those things changes the actual difficulty of the problem. (Actually, in my opinion, the number of attempts does not show much at all.)

→ Reply

SPyofgame

4 years ago, # ^ |

Yes, I think with my system there will also be people who use fake clone accounts or won't try to submit. But I think the more participants who try to solve the problem, the more accurate the system will be.

"they won't submit anything."

My "attempt value" means the number of coders who tried to solve the problem, whether or not it was accepted.

→ Reply

dreamoon_love_AA

4 years ago, # |

+40

I think the rating of this problem should be a lot higher than 1600.

→ Reply

UncleGrandpa

4 years ago, # ^ |

Agree. I was floored when I saw this problem's rating was 1600. Originally it was 2800. It was a pretty nice problem and I thought 2600 would be just perfect, but 1600 LOL

→ Reply

https://codeforces.com/problemset/problem/158/A

4 years ago, # ^ |

2500 now.

→ Reply

McDic

4 years ago, # |

+10

I guess age of problems and ratings of upsolvers are also used in rating calculation. That's probably why we can find some weird difficulty rating distribution on old problems.

→ Reply

valeriu

4 years ago, # |

1800???

→ Reply

blobugh

4 years ago, # ^ |

That problem had rating 1800 originally.

→ Reply

ShafinKhadem

4 years ago, # ^ |

How/Where did you find the previous rating of the problem?

→ Reply

blobugh

4 years ago, # ^ |

I didn't find that — just remembered that.

→ Reply

i think this problem does not deserve 1700 difficulty.

4 years ago, # ^ |

Now 158A=800.

→ Reply

Agnimandur

4 years ago, # |

+11

I believe that all the problem ratings should be manually set by the problem writer.

This solution should remove the problem of Div 3 problems being overrated, and simultaneously of some problems being underrated.

Additionally, since the problem writer won't get credit for solving the problem in contest, he gains nothing by "boosting" the rating.

Additionally, Mike and others at HQ can always double check these ratings to verify their legitimacy.

→ Reply

manish_joshi

4 years ago, # ^ |

+11

No, in my opinion that's a very bad idea; the stats are the best indicator of problem difficulty. Authors can overestimate/underestimate problem difficulties because they can't see other approaches. However bad the formulas may be, thanks to real data from the contest, they will always be better than what you suggest.

→ Reply

https://codeforces.com/contest/1307/problem/F

4 years ago, # ^ |

-10

Not always, e.g. if a lot of people try a problem because it has a lot of successful solutions, there's a runaway error effect that can easily exceed a reasonable author's estimate. It's just hard to say if this ever happens.

→ Reply

yashviradiya

4 years ago, # |

+16

Also, about ratings: if we choose to hide tags for unsolved problems, the rating of the problem should still be shown. It would be better if just the problem tags were hidden and not the rating.

→ Reply

manish_joshi

4 years ago, # ^ |

It is already that way in the problemset page.

→ Reply

Fanurie

4 years ago, # |

This F problem from a div 2 contest has a rating of 1600. But I think the difficulty of the problem is greater than 2000.

not a pleasant surprise when you call the command ;gitgud -300 on discord :P

→ Reply

Lexi_kun

4 years ago, # |

Can you add a minimize/maximize button for problem tags so that we only see them if we wish to?

→ Reply

ahshafi

4 years ago, # |

-54

Am I the only one who wants problem ratings to be abolished?

→ Reply

ShafinKhadem

4 years ago, # |

+21

If anyone wants to view previous rating of problems, I had saved all problems' difficulty till 2020/1/5: link

→ Reply

NaimSS

4 years ago, # |

Does this problem really have a 3200 rating? Seems like too much.

→ Reply

chef_spam

4 years ago, # ^ |

Now 3300 :)

→ Reply

Rudro25

4 years ago, # |

← Rev. 2 →

[solved]

→ Reply

Exeedo

4 years ago, # |

+10

Have you considered a Machine Learning model to do the ratings? Obviously, you have lots of problems to train it.

→ Reply

4 years ago, # ^ |

What would you use as ground truth for training? Asking people to estimate the difficulty?

→ Reply

Exeedo

4 years ago, # ^ |

He is asking people to spot the bad ratings anyway. Might as well use them to train the model.

→ Reply

4 years ago, # ^ |

Might as well use that to set ratings manually. After all, the main limit is how much you're able to trust contestants' input, not the amount of this input — if you're putting full trust into that, let people vote for ratings. Any reasons why not to do that are also reasons to not use it as ground truth for training.

Neural networks are good if you want to generalise from, say, 100k user labels in a huge input space to (in the long run) even billions of automatic labels. Not so good if you want to generalise from, say, 100k labels for 10k different inputs to 200k labels for 20k different inputs.

→ Reply

s1dsq

4 years ago, # |

Easy version with 2100

Hard version with 1900

→ Reply

Pankin

4 years ago, # |

+10

Bruh according to the new ratings this problem is easier than this one from the same round, which is totally bogus.

→ Reply

GGAutomaton

4 years ago, # |

← Rev. 2 →

+16

Mike, I am the owner of codeforces.ml (another mirror site for Chinese users). I have tried many other ways (talks, emails...) to get in touch with you but in vain. I hope you can see this comment and check the email which I sent to your Gmail on Apr.4th. Thanks a lot!
I know this comment is not suitable in this blog, but I don't have other means. Don't downvote me so hard :(

→ Reply

4 years ago, # |

The problems 678C - Joty and Chocolate and 678D - Iterated Linear Function were both downrated to 1500. While C is fairly simple, I was not able to solve D even after reading the editorial.

→ Reply

Juuzou-San

4 years ago, # |

← Rev. 2 →

I think 888A - Local Extrema should be less than 1600.

→ Reply

1244C - The Football Season

4 years ago, # ^ |

+11

Now 800.

→ Reply

chef_spam

4 years ago, # |

← Rev. 2 →

+14

I noticed that after this second round of changes, this https://codeforces.com/problemset/problem/1329/B problem is now rated 1700 (I think it was 1900 before), which is even less than https://codeforces.com/problemset/problem/1329/A problem (1800). However, it seems like more people solved 1329A than 1329B (I didn't explicitly count this though). Maybe basing purely off of solve count is not the perfect way to do this, but it is still reasonably accurate. I also think that the main reason 1329A is rated so high is because a lot of people (including some very high rated people) failed system testing on it, and that is not ideal because many people probably would have caught their mistake if the tests were full feedback, so this does not mean the problem is hard. Maybe the formula could be changed so that those who failed system tests are weighted more towards solving the problem or something?

(I also think that 1329B was a lot harder than 1329A, but my opinion isn't very relevant)

→ Reply

sevlll777

4 years ago, # |

← Rev. 4 →

I'm not sure, but in my opinion this problem should be 1700/1800. My theory is that it has 2000 because of weak pretests. Does the formula use the number of FSTs?

→ Reply

wifaw

4 years ago, # |

← Rev. 2 →

MikeMirzayanov Should this problem be rated 2900? https://codeforces.com/contest/86/problem/D

It is a standard implementation of MO Algorithm. Earlier its rating was 2700.

→ Reply

aryanc403

4 years ago, # ^ |

+29

"standard implementation of MO Algorithm."
Standard to what?
He said he uses only contest participants for rating calculations. Pretty sure MO wasn't standard back then. My reasoning is that most MO tutorials link to this problem, so this problem came before those tutorials. In the contest only 5 people solved it. Even reds failed to solve it.

→ Reply

gerard.onats

4 years ago, # |

+10

https://codeforces.com/contest/1183/problem/E (easy version) has 2000. https://codeforces.com/contest/1183/problem/H (hard version) has 1900. The problems had ratings of 2000 and 2200, respectively, prior to recalibration.

→ Reply

PRESS_F_2_PAY_RESPECT

4 years ago, # |

Are the ratings final now or still being changed?

→ Reply

4 years ago, # |

← Rev. 2 →

238A - Not Wool Sequences is rated as 1300.

From my point of view the difficulty is closer to 2300 than 1300.

Basically the same is true for 286A - Lucky Permutation, which is rated 1400.

→ Reply

4 years ago, # ^ |

I just submitted 238A in 3 minutes after clicking your link, so I definitely wouldn't consider it 2300, haha (I generally find 2300 problems very tough).

I guess this also shows ratings are pretty subjective from person to person, and only valid statistically.

→ Reply

4 years ago, # ^ |

I worked on that problem two months ago and more or less copied the solution from the tutorial back then. I just read the problem again...

And I am still not able to understand it. Maybe it's my bad math skills.

→ Reply

4 years ago, # ^ |

← Rev. 2 →

The way that I thought about it is considering all the prefixes' XORs instead. If you create a sequence of values for these, there's a bijection to sequences of the $$$a_i$$$ themselves.

If you look at constructing a sequence of prefix XORs, now it becomes as simple as "don't have any 0 and don't have any duplicates", so the answer is $$$(2^m-1)(2^m-2)\cdots(2^m-n)$$$ multiplying $$$n$$$ terms.

My submission: 86723498
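A minimal sketch of evaluating that product, not the linked submission. The modulus 1000000009 is an assumption about what the problem asks for, and the function name `count_sequences` is made up for illustration.

```python
MOD = 1_000_000_009  # assumption: the modulus required by the problem statement


def count_sequences(n: int, m: int, mod: int = MOD) -> int:
    """Evaluate (2^m - 1)(2^m - 2)...(2^m - n) modulo mod.

    Each factor counts the choices left for the next prefix XOR: it must be
    nonzero and different from every earlier prefix XOR.
    """
    base = pow(2, m, mod)
    result = 1
    for k in range(1, n + 1):
        result = result * ((base - k) % mod) % mod
    return result


print(count_sequences(3, 2))  # (4-1)(4-2)(4-3) = 6
```

If $$$n \ge 2^m$$$, one of the factors is exactly zero, so the product (and the answer) is 0 without any special casing.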

→ Reply

4 years ago, # ^ |

For $$$a_1$$$ we may choose $$$2^m-1$$$ different numbers. For $$$a_2$$$ the same restriction applies, but additionally $$$a_2 \ne a_1$$$, so there are $$$2^m-2$$$ possibilities.

Then $$$a_3$$$, same pattern again: it must be different from $$$a_2$$$ and different from $$$a_1\oplus a_2$$$.

$$$a_4$$$ must be different from $$$a_3$$$, from $$$a_2\oplus a_3$$$ and from $$$a_1\oplus a_2\oplus a_3$$$, and so on. This multiplies out to $$$(2^m-1)(2^m-2)\cdots(2^m-n)$$$ if and only if all those terms are distinct. And it turns out they are. Why?

The tutorial constructs some b[], and says: "So we know that all elements of b should be different." How does this work?

→ Reply

4 years ago, # ^ |

Imagine $$$a_6 \oplus a_7 = a_3 \oplus a_4 \oplus a_5 \oplus a_6 \oplus a_7$$$. Then you can rearrange this to $$$a_3 \oplus a_4 \oplus a_5 = 0$$$.

In general if $$$a_i \oplus \cdots \oplus a_k = a_j \oplus \cdots \oplus a_k$$$ (where $$$i < j$$$), then you have $$$a_i \oplus \cdots \oplus a_{j-1} = 0$$$.
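If the argument still feels too slick, a brute-force check on tiny inputs can help. The sketch below assumes each $$$a_i$$$ ranges over $$$0 \ldots 2^m-1$$$ and that a sequence is valid when no contiguous segment has XOR 0; both function names are made up for illustration.

```python
from itertools import product


def brute_force(n: int, m: int) -> int:
    """Count length-n sequences over [0, 2^m) with no zero-XOR contiguous segment."""
    count = 0
    for seq in product(range(2 ** m), repeat=n):
        seen = {0}  # prefix XOR of the empty prefix
        prefix = 0
        valid = True
        for value in seq:
            prefix ^= value
            if prefix in seen:  # equal prefix XORs mean the segment between them XORs to 0
                valid = False
                break
            seen.add(prefix)
        if valid:
            count += 1
    return count


def closed_form(n: int, m: int) -> int:
    """Exact value of (2^m - 1)(2^m - 2)...(2^m - n), without a modulus."""
    result = 1
    for k in range(1, n + 1):
        result *= 2 ** m - k
    return result


for n in range(1, 4):
    for m in range(1, 4):
        assert brute_force(n, m) == closed_form(n, m)
print("brute force matches the closed form for n, m <= 3")
```

The `seen` set is exactly the "all prefix XORs are distinct" condition: two equal prefix XORs at positions $$$i-1$$$ and $$$j-1$$$ are the same statement as $$$a_i \oplus \cdots \oplus a_{j-1} = 0$$$.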

→ Reply