Codeforces: Soon We Will Change the Rating Calculation for New Accounts

4 years ago, # |

← Rev. 2 →

+81

Seems like a great update! Will the rating field in the CF API display the visible rating or the backend one used to calculate changes? I’m curious about whether this will break rating change prediction tools until their creators update them to account for this change.

Additionally, will the visible rating or the true rating be used to evaluate eligibility for Div 2/3/4 rounds? For example, suppose a new user jumps to 1700 true rating (thus, 800 display rating) after one round: can they participate officially in a Div. 3 round afterwards?

→ Reply

use_me

4 years ago, # ^ |

I don't know how but It'll be better to have just one rating.

→ Reply

MikeMirzayanov

4 years ago, # ^ |

+48

API will contain displayed ratings. Probably, good idea is to add one more field (say, trueRating) to API.

I think divisions should be assigned based on displayed ratings. It will be completely non-transparent to assign division by true ratings but show displayed ratings. I agree that it will make a few cases then a participant takes part in Div3 but shouldn't because of true rating. But I think it will not be not very often cases and "trusted participants" policy will hide them in most cases.

→ Reply

Mucosolvan

4 years ago, # ^ |

+116

Do you plan on somehow gating the rating gains when true rating is higher than division cutoff? It seems to me like someone can reach 2100/2200 by only participating in div4/div3 contests.

→ Reply

shahil_005

4 years ago, # ^ |

← Rev. 2 →

+52

Waiting for "How I went from 0 to 3000 in just 2 weeks".

→ Reply

snowflame

4 years ago, # ^ |

← Rev. 3 →

+38

What's happened? Vepifanov’s rating.

→ Reply

Juve45

4 years ago, # ^ |

+59

Any news about the API and the trueRating field? It would be very useful for many extensions, etc...

→ Reply

3 years ago, # ^ |

Sorry to bump this again, but any chance you could comment on the state of the trueRating field? As is, it's pretty difficult for rating predictor extensions, which are used by a significant portion of the community, to accurately evaluate projected rating changes.

→ Reply

HusseinFarhat

9 months ago, # ^ |

+14

MikeMirzayanov Sorry for necroposting, but is true rating going to ever come to Codeforces API? As prediction extensions can't provide accurate rating changes without this field, it's a very useful change

→ Reply

WasylF

4 years ago, # ^ |

+88

It should be a quite easy fix, I'm going to update CF-Predictor as soon as we got the "trueRating" field in the API response.

→ Reply

4 years ago, # ^ |

+18

Good to know, many thanks for maintaining this tool!

→ Reply

meooow

4 years ago, # ^ |

← Rev. 2 →

I don't think that is sufficient because now the displayed rating depends on both true rating and number of past rated contests.
It is not convenient to get the second from the API for users in bulk.

Edit: The number of past rated contests can be calculated from the difference between real and displayed ratings, so it would, in fact, be sufficient.

→ Reply

KharYusuf

4 years ago, # |

← Rev. 2 →

If new participants are discouraged in the starting due to lower rating points, then if they get negative delta in their first few contests then their rating will be <500, which will be much more discouraging.

I suggest initial rating to 500, in practice as well as during calculations. Instead lets balance rating inflation/deflation, so that that for the first few contests rating inflation/deflation is low, and then it gradually hits a stable value with more contests. I hope you get the general idea.

→ Reply

SuperJ6

4 years ago, # ^ |

← Rev. 2 →

+69

Bruh 500 is way too low. Some people started on cf after already having experience, and at least I don't wanna do 10 contests just to reach my actual starting rating. Even people with 0 cp experience but have programming experience are higher usually much higher than 500.

Also, I think it discourages you to care if you start with a super low rating too, like on atcoder I tend to not care about the contests at all since I haven't done many so my rating usually goes up no matter how much I mess around.

→ Reply

KharYusuf

4 years ago, # ^ |

Yeah, I get your point. But you will also see that there are very less fake/alt accounts on AtCoder.

Anyways your arguments seem fair too, but there will be some tradeoff :/

→ Reply

kafiulshabbir

4 years ago, # ^ |

But I think your rating will remain low for 6 rounds only. 500, 350, 250, 150, ... will be added to each round for free. New people will have to tolerate lowrating for 6 rounds only.

→ Reply

SuperJ6

4 years ago, # ^ |

I am not arguing against mike's system, as it combats this by having the calculation rating be higher, so it quickly converges. I was arguing against having the calculation rating also starting at 500, as was suggested.

→ Reply

kafiulshabbir

4 years ago, # ^ |

-13

No it will be 500 + d1. So it will appear to be near 500. Of course unparticipated will be 0.

→ Reply

SuperJ6

4 years ago, # ^ |

Did you even read it...

The rating of the new account will be equal to 0 for display (but it will be equal to 1400 when calculating rating changes)

→ Reply

t_suman

4 years ago, # ^ |

It is a good idea indeed..

→ Reply

GLAYS

4 years ago, # |

+180

→ Reply

John_Cena

4 years ago, # ^ |

+84

Create a new account and vote again

→ Reply

Rick_S

4 years ago, # ^ |

There must be some sort of glitch, I can't see your comment!

→ Reply

dv.jakhar

4 years ago, # ^ |

+16

Make a new account and see again

→ Reply

kingarp1914

4 years ago, # |

+294

While reading the rules I thought I am reading some codeforces A problem statement and I need to answer the final rating of a newly registered account.

→ Reply

r5shivamchaudhary

4 years ago, # ^ |

And 251 is the answer

→ Reply

techno_phyle

4 years ago, # |

+23

What about the old users who had participated in less than six contests but at least one contest participation? Do their rating change?

→ Reply

MikeMirzayanov

4 years ago, # ^ |

+37

I think if a participant is not rated yet, new rules will be applied. If a participant took part in at least one rated contest, old rules are applied.

→ Reply

techno_phyle

4 years ago, # ^ |

-8

Thanks Mike For Clarifiaction

→ Reply

LilyWhite

4 years ago, # |

← Rev. 2 →

+17

You can see my rating graph, it is a clear example of harsh rating losses for newcomers. It was a palindrome at the beginning!

→ Reply

nayeem2021

4 years ago, # ^ |

What if you see my rating graph?

→ Reply

minimum_guy

4 years ago, # ^ |

I can see the pain of struggler from your graph....

→ Reply

Abdo_Naser

4 years ago, # ^ |

→ Reply

AGRU

4 years ago, # ^ |

Same with my graph

→ Reply

gaurav1620

4 years ago, # ^ |

Same with me LMAO.

→ Reply

4 years ago, # ^ |

Check mine...

→ Reply

Miko35

3 years ago, # ^ |

← Rev. 2 →

（deleted）

→ Reply

bruceewayne

4 years ago, # |

Really happy to see this, you're right, I was also tempted to make another account, but I didn't lose hope and today I am 1500+ :D

→ Reply

Agnimandur

4 years ago, # |

This should have happened a long time ago!

Codeforces should look to Atcoder for inspiration for its rating system.

→ Reply

_AA_AA_

4 years ago, # |

-151

Connect the codeforces account with respective country's UID (Unique Identity) Number. Like there is AADHAR NUMBER in India.This might reduce number of fake accounts.

→ Reply

phyzzmat

4 years ago, # ^ |

+89

I'd say this is a little bit dystopian.

→ Reply

Agnimandur

4 years ago, # ^ |

+60

Yes force each account to also provide a credit card/bank account number to Mike to "guarantee no fake accounts".

→ Reply

Preeyadarshee

4 years ago, # ^ |

-30

Yes force each account to also pay money to Mike to "guarantee no fake accounts"XD

→ Reply

Nafis_Misfit

4 years ago, # ^ |

-34

Strongly disagree....I don't think that it will be a good idea because many students don't have any credit card and a need to add credit/debit card will decrease the number of new comers too.Because by adding credit card system you are demotivating many new comers even to get an experience of competitive programming

→ Reply

Agnimandur

4 years ago, # ^ |

-9

it is a joke

→ Reply

aryanc403

4 years ago, # ^ |

+15

Inb4 your subsidy got credited into Putin's swiss account.

→ Reply

Ahmadsm2005

4 years ago, # ^ |

I am 15 so I don't have a UID. Literally, a lot of OIers don't have UID. So you force all OIers not to participate and practice on codeforces? That is very rude!

→ Reply

4 years ago, # ^ |

Yeah, I kinda agree with that.

→ Reply

ZED_

4 years ago, # |

← Rev. 2 →

First of all, thanks for this step !!

I have a question in mind . What is the approximately maximum value of d1+500 + d2+ 350 + ...+ d6+ 50. If this value is always less than 1400 then there is no problem .

But , Suppose this value exceeds 1400 in the 4th round ( d1+d2+...d4) . and suppose the participant's rating is 1500 . Then still he will be treated as 1400 rated and will cause serious rating instability.

→ Reply

Perry_Prasad_Platypus

4 years ago, # ^ |

I think maybe they will be rated according to 1400+d1+d2.. rating and not their shown rating until 6 rounds

→ Reply

Tekor

4 years ago, # |

What about graph of rating?

→ Reply

MikeMirzayanov

4 years ago, # ^ |

+25

They will show displayed ratings. So for new participants most graphs will show increasing at the start, generally.

→ Reply

Harshil_

4 years ago, # |

-31

After these changes are done.

→ Reply

riadwaw

4 years ago, # |

+210

Apparently, a similar idea is used on AtCoder, but I don't know the details.

It would make sense to research prior state of the art before creating one's own implementation

→ Reply

A-m-z

4 years ago, # ^ |

+58

i think he probably did and is just saying that so it doesn't count as copying lol.

→ Reply

Chilli

4 years ago, # ^ |

Low novelty, doesn't compare to previous state of the art baselines, strong reject.

→ Reply

stefdasca

4 years ago, # |

+13

Will that new person participate at rated contests based on his/her displayed rating or based on the real rating?

If the former happens, one can get a really high rating from participating at Div3 and Div2, which is not fair.

→ Reply

4 years ago, # ^ |

+18

It seems (from Mike's response to my message above) that eligibility will be determined by displayed rating.

→ Reply

riadwaw

4 years ago, # ^ |

+16

So, this is exactly what may cause the problem. So if you have big real rating and small displayed rating, if you win, you'll get your real rating change and additional boost to your displayed rating, so you will get higher then beofre

→ Reply

dassi

4 years ago, # |

-22

Suggestion: is it also may be possible to make the first few contests(say 3) unrated for the new users. This will help the new users to actually get accustomed to the style of contests here and will decrease the number of fake accounts

→ Reply

srikar777

4 years ago, # ^ |

+10

suppose a smart person decided to join CF, gets top ranks, sees that he needs to write 3 unrated contests to get rating .he would never come back

→ Reply

A-m-z

4 years ago, # ^ |

Virtual contests...?

→ Reply

Utsav_goyal

4 years ago, # |

+11

like div1 + div2 rounds will we see div2+div3 or div3+div4 in the future because it looks like in future div4 contests frequency is going to increase?

→ Reply

_BeginnerCoder

4 years ago, # |

-28

Can anyone briefly explain what Mike has written? I'm too stupid to understand the technical intricacies involved

→ Reply

bobbilyking

4 years ago, # ^ |

← Rev. 2 →

every contest you gain a little bias (and the sum of those biases over 6 contests is 1400), but you participate as a 1400 anyways. essentially after 6 rounds it acts like you started at 1400 rating and all rating fluctuations from the past 6 contests (that act like you had 1400 to begin with) apply after that too

→ Reply

eugalt

4 years ago, # ^ |

+11

You need to take a third grade math class first.

→ Reply

alireza_kaviani

4 years ago, # |

← Rev. 2 →

+24

The upcoming Div.2 round is the last chance for us, new reds to create as many alts as we can in order to take part in the following Div.1 contests with ease

→ Reply

-this-is-obd-

4 years ago, # ^ |

orz

→ Reply

DeadlyCritic

4 years ago, # ^ |

+1, about 2-3 weeks and no D1 ╯︿╰

→ Reply

demoralizer

4 years ago, # |

+78

There will be one absurd thing about this though, if you don't solve any problem in your first contest, your display rating will still increase by around 350. I believe that would look absurd.

I believe it would make more sense to keep the display rating locked at 0 until a user has solved at least one problem in an official round (and then add all of the pending "promotions" in rating at once or maybe delay the promotions).

→ Reply

antguz

4 years ago, # ^ |

+19

I don't really see the problem here.

If you don't attempt any problem, it doesn't count as a rated participation anyway.

If you do attempt and don't manage to solve anything, you'll likely get a huge negative delta, which will balance out the displayed rating bias increase.

→ Reply

I_love_Tanya_Romanova

4 years ago, # ^ |

+16

I recall how in the past it wasn't unheard of to have your TopCoder rating increased after scoring 0 :)

While the situation that you described may not sound right intuitively, I don't see a good solution that wouldn't add unnecessary complexity to the system.

→ Reply

Norrius

4 years ago, # |

+230

I think 6 contests to get to your true rating sounds fair, and more is definitely unnecessary. AtCoder's system is a bit unfortunate in that the rating seems to climb really slowly, which means mine still hasn't converged after 13 contests (I'm 100 points lower than my contest performance would indicate). It's mildly annoying and I'd prefer to avoid that.

Just for fun, last week I made a comparison of rating vs. percentile across different competition platforms. Notice how AtCoder's median is 165 (yes, mid-grey) which doesn't look reasonable to me.

(X-axis is exponential, the data might not be 100% accurate, CodeChef includes all users and not only active).

→ Reply

kafiulshabbir

4 years ago, # |

-36

But I suggest the beginning imaginary rating be 1500 instead of 1400. So as you said new accounts will contribute less to rating. The existing specialists and experts will find more difficult to gain rating. So someone who got 1700 before the start of this rule got is much easier than someone will now.

→ Reply

shadaBhai

4 years ago, # |

← Rev. 2 →

Will this be also applied for those who already registered but didn't participate in more than six contests? Or will it be applicable for only those who will create an account after the implementation of this?

→ Reply

theif_coder

4 years ago, # ^ |

Your question is answered here.

→ Reply

ZzaizZ

4 years ago, # |

+14

Rating is just a number. The main raiting — is your knowledge. (but you still should not look at my rating history)

→ Reply

majk

4 years ago, # |

← Rev. 3 →

+75

How it's done on AtCoder:

$$$rating_{shown} = rating_{true} - f(competitions)$$$, where $$$f(x)$$$ is a magic function that starts at $$$1200$$$ and tends to $$$0$$$ as $$$x$$$ tends to infinty.

What you did is equivalent, but with $$$f(1) = 900$$$, $$$f(2) = 550$$$ ... $$$f(7) = 0$$$.

→ Reply

Sadat999

4 years ago, # ^ |

Is that magic function not disclosed?

→ Reply

majk

4 years ago, # ^ |

+10

It is, follow the link AtCoder's Rating System on atcoder.jp to find all the details.

→ Reply

dbest077

4 years ago, # |

@Mike, I liked your idea of calculating the rating. One thing that I would like to share with you guys — Like if a guy has a rating of 1599 and participates in Div 3, goes to 1700+ , guy with 1600 can also do the same(not comparing any one). Can this be taken into picture "IF YOU CROSS A DIV — CAN IT BE LIKE Min(DIV X + 50, new Rating). Ps — Just a THOUGHT. (No personal issues).

→ Reply

DragoPhoenix

4 years ago, # |

We may see marked increase in participation in the upcoming Div2 round as people who are confident enough might not want to wait for 5-6 round to get to their deserved rating. Also better start at 1500 than 1400 if you feel you stand somewhere in between.

→ Reply

Parvej

4 years ago, # |

Great initiative!

→ Reply

djm03178

4 years ago, # |

+150

This may be quite unrelated to this post, but I think this adds another reason to suggest this now.

I think we need an up-to-date version of this 9-year old FAQ. A big portion of this post contains old information which doesn't correspond to how the system works nowadays. It was already too much out of date when I first joined CF, and now it has even more complicated features like Div. 3 & 4 and this modified rating system. Even though I really like this new rating system in general, this may make newcomers a lot more confused about their rating changes if they can't find this information easily.

→ Reply

-is-this-fft-

4 years ago, # |

+117

Maybe it would make more sense (in that the system might be cleaner) to rethink the entire rating system (that includes round divisions etc) rather than to add features to the existing system.

→ Reply

Scopula

4 years ago, # ^ |

+12

I think any radical change will make many veterans unhappy.

→ Reply

yashmittal19

4 years ago, # |

It is really good idea.It would definitely stop multiple accounts formation and also would be nice to boost up confidence of new participant

→ Reply

dywbasm

4 years ago, # |

← Rev. 2 →

-8

If i have less than 1600 rating, but more than 1600 true rating (1400 + $$$d_1$$$ + $$$d_2$$$ + ...) . Is div3 rated for me?

→ Reply

TwentyOneHundredOrBust

4 years ago, # |

+94

So basically now you can get to master just by hitting top scores on 3 div3s in a row. Or maybe div4s (why on earth is that even a thing when it was what div3 was meant to be). The whole system just keeps getting more and more broken lol.

→ Reply

aryanc403

4 years ago, # |

← Rev. 3 →

+127

If someones actual rating becomes more than 1600. Just give him his actual rating instead of waiting for 6 contests. The current system will just increase inflation with 2200 and 2300 users/alts participating in div2s. Becoming GMs using div2s only.

→ Reply

Dontony

4 years ago, # |

Very good step.

→ Reply

DucPro

4 years ago, # |

← Rev. 2 →

+33

Assigning contests based on displayed rating? This is opening up a whole new can of worms here. If it ain't broke, don't fix it.

→ Reply

daxpatel11

4 years ago, # |

It sounds perfectly correct . This will definitely demotivate to create new accounts as they create when their ratings drops so much. Best Luck to participants for the next contest rounds. Wish all high rating!

→ Reply

adderall

4 years ago, # |

Yeah but can you wait for like 3 more div 2 contests ? Gotta farm those filthy rating and get yellow before deflation hits.

→ Reply

WYXkk

4 years ago, # |

Oh,no new accounts become Master in just 3 rounds then.

I don't know if it's a good thing,in my opinion it's more likely to be,but time will tell.

Hope high rating for new comers

→ Reply

happyabc

4 years ago, # ^ |

Flakire can get Master in only 2 rounds. So orz Flakire.

→ Reply

happyabc

4 years ago, # ^ |

In fact,you can find a legendary grandmaster in 10 rounds.

→ Reply

4 years ago, # ^ |

-8

Just look at zyy. @WZYYN

→ Reply

pritishn

4 years ago, # |

+72

Apparently, a similar idea is used on AtCoder, but I don't know the details.

→ Reply

rui_er

4 years ago, # |

That's a great idea to give the new accounts more confidence. For me, it's really terrible if you work really hard but your rating just drops.

→ Reply

spookywooky

4 years ago, # |

I like the way it works in AtCoder, there my rating went down the first time after 7 contests. Assuming it will work here similar it is a good change.

Still the problem that top 10 of Div3 contests make a huge jump in rating, but that is another story.

And I very much apreciate that new accounts become less atractive.

→ Reply

playerIunknown

4 years ago, # |

Hey..i am a school student and i just got introduced to codeforces. i took the above Div 3 challenge yesterday and managed to solve 2 problems, post that my laptop got shut unfortunately. However, both the submitted questions were accepted in the first test as well as post the open hack and system test session. But i haven't got any updates regarding my performance or ratings. I am new to this. Please tell me what am i supposed to do?

→ Reply

spookywooky

4 years ago, # ^ |

+13

Just be patient, the ratings will be updated soon. It takes some time.

→ Reply

aberent

4 years ago, # ^ |

If you can't be patient, and are using Chrome, add the Codeforces Enhancer to Chrome. It will (among other things), give you an estimate of your rating change in the standings table.

→ Reply

saurabh247

4 years ago, # |

People with new account are becoming master and candidate master. That does not seems to be fair imo. This could promote people to make new account.

→ Reply

WaterColor2037

4 years ago, # |

Where can I get the comprehensive information about how the ratings are calculated in CodeForces by the way?

→ Reply

Sad_reacts_only

4 years ago, # |

MikeMirzayanov I think there are few bugs in the new rating system as many people who gave their first contest yesterday have become candidate master with just 2-3 problems solved.

→ Reply

navidnouri1992

4 years ago, # ^ |

Now, for some reason the initial ratings are set to be zero. The starters should start with a rating of 1400, right?

→ Reply

Sad_reacts_only

4 years ago, # ^ |

-7

As per this blog you initial rating will start from 1400 but you will see it from 0. It's explained properly in this blog i think.

→ Reply

navidnouri1992

4 years ago, # ^ |

Yes, you are right, I misunderstood it. Thanks.

→ Reply

what-the-duck

4 years ago, # |

-37

I have observed that problems in CF Round #644 (Div 3) were quite easy. Except the last, all problems had 2000+ successful submissions during the contest. Due to this, the rating of participant in 1400-1600 exploded. Adding more to this, new accounts became masters just by solving 2 problems last night. LOLzz.

This might be due to new rating system. However, this completely destroys the motive of the new rating mechanism, that was to reduce creation of new accounts.

I propose two points after seeing what happened:

Withdraw the new rating system.
Make prev. round unrated for participants who were in 1400-1600 before the contest. As the problems were indeed suitable for Div 4 contests only.

Comment if you agree/disagree with my proposal.

→ Reply

Y_is_sunshine

4 years ago, # |

Maybe it's a good choice, but we may need time to adapt to it

→ Reply

Jarjit_sigh

4 years ago, # |

bruh wtf im mad bro sir i lost my exopert im sad bro

→ Reply

RayPatterson

4 years ago, # ^ |

-14

Mac I schadenfreude in your face.

→ Reply

Nodirbek_Sattorov

4 years ago, # |

← Rev. 4 →

+17

MikeMirzayanov I think there are still a few problems look at Droping_points rating. He/sh's solved all problems but the rating was not enough added.

→ Reply

tiger0817

4 years ago, # ^ |

In the past, someone always try to droping points and then get high rank in div2/div3 to get high rating,

maybe the rating they get is not as higher as usuall, but I think it's good to increase single score growth, maybe 500 or 400 ? And maybe it's good to make different rating has different maximum score growth, maybe we should make someone who has high level to reach the rating they worth soon. In Atcoder you can get more than 1000 points in your first contest

→ Reply

zer0_iq

4 years ago, # ^ |

← Rev. 2 →

It appears to me that rating changes for the users who were rated before the previous Div3 round have been a bit unfair,as is evident in the case of Droping_points . @mikemirzayanov it would be better if you can make the rating calculation rules a bit more clear..as apparently someone with a negative initial rating before the round got a delta of +300 despite finishing 3rd in the standings table whereas some unrated user who came last in the standings table also got a delta of +300,which seems a bit harsh to me.

→ Reply

ayonsen

4 years ago, # |

I have given seven contests till date and in each one my rating has only fallen, even though ive solved at least two problems in each contest(apart from one). Its quite disencouraging tbh. Still waiting for even a slight increase in my rating.

→ Reply

4 years ago, # |

I'm a bit confused with the rules apparently. Will the ratings for other users decrease if starting a new calculation from zero?

→ Reply

RayPatterson

4 years ago, # ^ |

+16

It starts from 1400, but yes it will cause a decrease in rating for others

→ Reply

4 years ago, # ^ |

Ah... I don't really like this...

→ Reply

UncleGrandpa

4 years ago, # |

I think with the fast growing number of users, even 1400 will still make the rating inflated at a super high rate. As for most newbies, I think a starting number of 1000 or less is more appropriate(It doesn't affect the rating showed anyway)

→ Reply

truth_be_told

4 years ago, # |

Guys I was curious about one thing, but I don't want to spam Codeforces with a new blog so asking here... I am seeing some high-rated guys from my country being online (almost) continuously. They aren't submitting anything most of the time, not even in practice(except in a few days). So what are they doing? Is this some kind of browser connection bug? I have no interest in their life, just want to satisfy my curiosity

→ Reply

Hiruem

4 years ago, # ^ |

It's possible that they are doing mashup contests, and since they are private others can't see their submissions

→ Reply

Robin_Sparkles

4 years ago, # |

+42

Just remove div3 and div4.

→ Reply

RayPatterson

4 years ago, # ^ |

Lmao he speaks the truth here.

→ Reply

anodaram

4 years ago, # |

-29

I wish Codeforces's own formula.

I like Codeforce the most. So I disagree with following someone else's formula.

There is no need to change the formula.

→ Reply

adderall

4 years ago, # |

+21

A not-so-fair-but-effective solution to this

→ Reply

4 years ago, # |

← Rev. 2 →

+130

What is the point of this update? I mean, as far as I remember, CF uses some kind of Elo rating system (one of Elo, Glicko, Glicko-2, etc.), where the starting rating is just an additive constant which does not affect any computations (just because 1500 rating looks prettier than 0). So, in fact, this update adds 100 additional points to all old profiles, and therefore breaks whole rating system, because these numbers mean nothing now (literally nothing, +100 points for reds is not the same as +100 points for blues). And moreover, it does not even solve the problem of one-contest profiles, just moves it 100 points lower.

Btw, AtCoder uses its own crazy rating system, which is definitely not from Elo family, so everything above does not apply to it.

Besides the implementation, I agree that current rating system is not the best for CF purposes and maybe should be changed, but it requires a lot more research.

→ Reply

dpaleka

4 years ago, # ^ |

The starting rating matters, because it influences whether the total rating mass of active users is decreasing or increasing. Thus, lowering the starting rating is a step to combat rating inflation.

(If most of the new accounts are overvalued in rating compared to the old accounts, then the new equilibrium is with higher old ratings.)

→ Reply

4 years ago, # ^ |

← Rev. 3 →

+49

No, it does not. That is how Elo works. -100 to all new users is EQUIVALENT to +100 to all old users. If you decrease starting rating, then all the ratings are decreased by the the same constant. If you decrease starting rating not for all users, then all the ratings are still decreased, but now it is unconverged. Just because in Elo starting rating stands for average player rating. 1500 is just a some nice number which is used as a legacy.

With the same success we could change factor 400 (stands for variance) in recalculation formulas. It won't change anything. Moreover, after some time, when sufficient amount of 1400 users are created, we will have the same problems, but -100 rating.

As a direct consequence, new rating numbers will mean nothing. For old rating I could say that 2300 player should win 2100 with probability 75% and that 1600 player is roughly top33%, but now it is impossible. It destroys all the good properties of Elo!

That's why I think that 1400 is a stupid idea.

P.S. The second reason to hate CF (after the killing of the rating system) is the button "Cancel" in the usual place of the "Send" button

→ Reply

dpaleka

4 years ago, # ^ |

What? One of us is misunderstanding the new rating update. I thought the update is essentially just making the new starting rating 1400 instead of 1500.

The accepted answer here says something about the effect of starting rating values...

→ Reply

4 years ago, # ^ |

← Rev. 2 →

+16

Yes, I also think that the update changes starting rating to 1400 instead of 1500 (for new users).

The accepted answer also correct and perfect. But, what the answer says is two things:

When you add something to starting rating globally (for all players, not our case), it just adds this constant to all ratings and does not affect anything else.
When you merge ratings of skilled group of players with average players, it is a good idea to add some constant to the ratings of skilled group to make Elo converge faster.

Second one is also not our case, since there no reasons why users who registered after May 25 should be less skilled than those, who done it before. The idea of 1400 update, as I understand, to deal with rating inflation caused by one-contest players, but it is the exactly the case where it won't help. Unless we repeat it every year. Then probably it will help to maintain reasonable number of LGMs without changing division borders. But in this case we will be unable to estimate anything from this rating numbers (e.g. winning probability), which is worse IMO.

The better way to deal with the same problem is, e.g. to calculate rating updates using only the players with at least x contests.

→ Reply

Jarekczek

2 years ago, # ^ |

That is how Elo works. -100 to all new users is EQUIVALENT to +100 to all old users.

This is wrong, in general. It is true only in a special case, where ratio old users / new users is close to 0. That is much more new users came, than was in the pool.

Actually the rating assigned to new users should be a good guess of their average rating. If you guess it well, you will notice no inflation and no deflation. That's basically what dpaleka said.

Let's have a look at a basic example, that should make all clear. At least for me it did :)

We have 3 players, A1 with rating 220 and B2, B3 — rated 100. Their average rating is 140 (420 / 3). Now let's assume a new user (which is of skill level B), and we assign him incorrect value 140. What will happen after longer period:

A1 = 230, B2 = B3 = B4 = 110. Their sum is 560 (420 + 140), and difference between A and B is 120, as before. This is because of ELO properties: the sum must stay the same and the difference in rating reflects the probability of A winning with B.

So what has happened? Our mistake in judgement (40 rating points) has been evenly distributed between all players, resulting in inflation of 10 points.

So it's hard to judge whether 1400 is a good change. If CF owners evaluated all new users in last 6 months and noticed that their average rating after 6 contests is equal to 1400 — then it's great. It will mean stable rating system as long, as average rating of newcomers stays at 1400. The greater the mistake in judgement, the greater inflation/deflation will occur.

→ Reply

Retarded_Ape

4 years ago, # |

+20

When I started comp. programming I suffered from the same issue. I feared my rating would drop and didn't participate in contest, rather solved problems later. Although, I now understand that rating is just a depiction of your skills and their was no point of doing what I did, I really appreciate this step and it will be really in making sure that the ultra-beginners don't get demotivated in the starting itself.

→ Reply

hetp111

4 years ago, # ^ |

Whats wrong with ur graph ? https://i.ibb.co/xXLvvPQ/Capture.png

→ Reply

Retarded_Ape

4 years ago, # ^ |

I was trying to figure out the same, That's happening with my account only :(

→ Reply

ChenKaifeng

4 years ago, # ^ |

mine is broken too.....

→ Reply

dalex

4 years ago, # |

← Rev. 2 →

+83

Just do what we have in chess. No rating for first 10 contests, and after the 10th contest you get your rating = average perfomance in 10 contests.

Before that, when N < 10 contests have been played, show approximate rating = average perfomance in N contests in the profile.

Perfomance is "if you had this rating, it wouldn't change after the contest".

→ Reply

dywbasm

4 years ago, # ^ |

So if i get the first place in some contest, my performance will approach infinite?

→ Reply

dusty

4 years ago, # |

+20

Guys, we need to update the СF predictor, because it will show the wrong rating.

→ Reply

eugalt

4 years ago, # |

← Rev. 3 →

Where have the comments gone?

_BeginnerCoder replied to your comment

→ Reply

eugalt

4 years ago, # ^ |

Let me restate.

The following comment of _BeginnerCoder — Can anyone briefly explain what Mike has written? I'm too stupid to understand the technical intricacies involved — has disappeared along with all three of its descendants. Does anybody know why?

→ Reply

_BeginnerCoder

4 years ago, # ^ |

Yeah, it was deleted by Codeforces. Now we both will get some notifications that'll include a warning to BNBR else face account suspension. Walking on thin ice I tell ya!

→ Reply

..OvO..

4 years ago, # |

+13

Don't think it is a good idea!!!

→ Reply

WitchOfTruth

4 years ago, # |

Separation of the real rating and the displayed rating does not seem right to me You could just drop the initial rating to 1000 or 0 or whatever would solve the problem

→ Reply

R.aZ.e

4 years ago, # |

← Rev. 3 →

The current system is much better.

For me when i started codeforces, I had almost no idea about cp and knowing just some basics of C I was in the same position of rapid rating loss, but I think those initial rating drops made me work harder and improve myself, had it been the new rating system I might not have given my best because I HAD NOTHING TO LOSE, reason for that may be: even if I perform bad my rating will increase initially.

I don't mean disregard the new method by any means :D

→ Reply

Mcayitel

4 years ago, # |

Can anyone tell me why we new users are not assigned 0 rating? So if he solve a single question he will get motivated.

→ Reply

curly_braces

4 years ago, # |

Apparently "/ratings?order=BY_RATING_CHANGE_DESC" isn't interesting anymore.

Great initiative though.

→ Reply

Thomasguo666

4 years ago, # |

← Rev. 4 →

[deleted manually]

→ Reply

theartofwar

4 years ago, # |

← Rev. 2 →

why there is no strict action against plagiarism.

→ Reply

Xellos

4 years ago, # |

← Rev. 2 →

Good idea.

For (not that applicable) comparison, AOE2:DE uses several ranked ladders such that if you have good rating in one, it's easy to get good rating in another in a few games, and it requires you to play 10 games before you're visibly ranked — before that, only you can see your hypothetical rating.

→ Reply

4 years ago, # |

← Rev. 7 →

+70

FYI, I finally made a decent write-up of my contest rating algorithm (at least the first 8 pages of it): https://github.com/EbTech/EloR/blob/master/paper/EloR.pdf (2021 update: https://arxiv.org/abs/2101.00400)

Here's how everyone on Codeforces ranks according to it, including the performance and rating change of their most recent participation. Max ratings are in parentheses: https://raw.githubusercontent.com/EbTech/EloR/master/data/codeforces/CFratings.txt

The displayed rating is $$$1500 - 2(\sigma - 100)$$$, where $$$\sigma$$$ starts at $$$350$$$ and approaches $$$100$$$ as members become more experienced. So, the displaying rating is initially $$$1000$$$. That's a "penalty" of $$$500$$$, which shrinks to about $$$207.9$$$, $$$120.3$$$, $$$76.6$$$, $$$50.9$$$, $$$34.6$$$, $$$23.8$$$, $$$16.5$$$, $$$11.5$$$, $$$8.1$$$, and $$$5.7$$$ after competing ten times.

My system is a rigorous generalization of Elo/Glicko to the case of large numbers of simultaneous participants. It reduces inflation (but doesn't eliminate it), reduces the influence of new members, limits the size of rating jumps for experienced contestants, and produces (I believe) a more reasonable distribution of ratings (see the table here for a comparison). The math does all this naturally, without special hacks.

As a bonus, I recently optimized the implementation so that the entire history of Codeforces can be simulated on my laptop in 25 minutes. I haven't run many experiments or detailed comparisons, but I'd gladly assist anyone who wants to try!

→ Reply

I_love_natalia

4 years ago, # ^ |

← Rev. 2 →

-16

I made some random checks and I think your rating is inflating as bad as current CF one. Have you tried to model CF competitions and make computational experiments?

P.S. Oh sorry, I read like every second paragraph. Start with the following: assume every user score like Gauss(x, x) distributed score for some x. Every round is participated by half of users. How many rounds will you need to stabilize the rating? I think it's like 50 or 100 for current CF with 1000 users.

→ Reply

I_love_natalia

4 years ago, # ^ |

I will translate for you some notices about the origin of inflation that were made in Russian.

The knowledge growth causes a lot of problems as time spent on training is reduced when user become less active.
Two divisions without overlap model causes inflation of div1 and deflation of div2 as it provides one-way invariant transfer through the boundary. This effect shouldn't be visible when CF rating was initially expanding (before 2017 or so I think).
If we peek top 10% or top 25% users before round, their rating sum after round should be less than before round. This is an effect caused by selection (top 10% is more likely overrated than underrated).

→ Reply

4 years ago, # ^ |

+32

The current CF system might have less inflation due to recent anti-inflationary measures, which I considered to be somewhat of a hack. Compared to the CF system without these measures, Elo-R seems to have less inflation based on looking at the rise of top-rated members over the years... but I admit this is not a rigorous experiment.

I suppose it would be helpful to create a testbed with various rating algorithms, so that we can easily compare them along each of the criteria you mention. I can't spend too much time on this project while working alone, but I might slowly try some of these things. It's open-source if others want to chime in with their own experiments. And since my system has a theoretical framework, we can change its assumptions to better match reality if needed.

Thanks for the discussion and experiment ideas!

→ Reply

4 years ago, # ^ |

← Rev. 4 →

+29

For a very simple experiment, I just tried running my algorithm with an initial $$$\sigma$$$ of $$$250$$$ instead of $$$350$$$. This changed the mean true rating of all Codeforces users from $$$1311$$$ (when $$$\sigma=350$$$) to $$$1415$$$ (when $$$\sigma=250$$$). That's a big difference! Most of this seems to be due to the weakest members not having converged, since they participate less. Nonetheless, the top ratings also increased, by 14 to 19 points.

I suspect that two things are happening:

When new users join, $$$1500$$$ is usually an overestimate of their skill, so they lose more than the system expects, giving everyone else extra rating points. This is an inflationary pressure.
As users compete more, they usually improve and become better than their rating, so they win more than the system expects, taking away rating points from others. This is a deflationary pressure.

By using a high starting $$$\sigma$$$, we can drastically reduce the first effect, altering the balance between inflation and deflation.

Edit: with $$$\sigma=100$$$ (where it would stay fixed), we get a mean Codeforces rating of $$$1511$$$. The top ratings... are actually less than they were with $$$\sigma=350$$$! I suspect the population hasn't converged yet. A high starting $$$\sigma$$$ accelerates convergence of the population rating distribution by allowing it to spread right away, from Round #1. With low $$$\sigma$$$, my system might become even slower than others since it puts less weight on surprising performances (outliers).

→ Reply

4 years ago, # ^ |

+18

What is mean rating? As far as I understand (I haven't read your paper yet, sorry), generalisation of Elo to the case of large numbers of simultaneous participants should be still zero-sum, i.e. maintaining constant mean rating. And if I remember and understand it correctly, the thing you are reinventing is TrueSkill (apart from the displayed rating feature, which is nice, but not so important now).

And, as some kind of a bonus, TrueSkill could be generalised to the case of team competitions (many thanks to S. Nikolenko for pointing it out to me), which is sometimes our case (and I do not see how you do that).

I didn't find an implementation of this rating system and didn't tested it on CF history, but I think that I'll try to do it next week if it would be interesting.

→ Reply

4 years ago, # ^ |

+18

Maybe I should write an explicit comparison with TrueSkill, after studying the Gaussian density filtering algorithm more closely to understand its properties. The models are similar, but I'll point out differences:

My model uses logistic distributions instead of Gaussians: a major theme of my paper is to look at the implications of this.
TrueSkill has a model for draws, which I think is inaccurate for programming contests. Elo-R doesn't model draws, but instead treats them as half a win + half a loss, similar to Elo. Actually, if we model draws as nearly-equal performances, the Elo-R math gives a result equivalent to 1 win + 1 loss, but it seemed strange to me to give draws double the weight...
I appeal to the limit of large numbers of contestants to allow for an efficient and easy computation of performance. As a result, I get formulas that are more Elo-like and human-interpretable. Many properties of Elo-R can be deduced just by looking at the formulas, even without the theoretical model! (The formulas give reasonable results for small numbers of contestants too, although the theory no longer justifies this.)

Mean rating is the average "true rating" among all users, including inactive ones.

My system is most directly inspired by Glicko. It can be made more Elo-like by keeping $$$\sigma$$$ constant, as happens to be the case in my $$$\sigma=100$$$ experiment. In that case, the mean rating stayed very close to $$$1500$$$, drifting $$$11$$$ points off due to the approximations involved. I believe a system which preserves the mean will be inflationary: as Codeforces becomes more popular, it's likely to welcome a lot of weaker members, so we should want the average rating to decrease. Perhaps the top should be allowed to increase very slowly, as the state of the art improves.

I believe Elo-R can be extended to team matches. Thinking about it very briefly, we could compute performance scores for each team, by modeling team performance as a sum or max of random draws for each member (which amounts to one random draw per team, with a different distribution). Then, knowing the team performance, we have to reserve-engineer a measurement of each team member's skill.

→ Reply

4 years ago, # ^ |

+16

In short: I spent last $$$x$$$ hours studying the improved variant of TrueSkill and think that it is exactly what CF needs.

It is not the complete list of reasons why I think so, but at least the most important ones.

TBH, I don't see why logistic distributions are better than Gaussians. IMO, the main reasoning behind the Gaussians in all Elo-like models is CLT and I can't find any similar argument for logistic distributions.
The model for draws in TrueSkill is perfect for programming contests! Consider players' points as a rough measure of his performance and get exactly this model of ties. The case of multiway ties (which is one of problems of the original TrueSkill) was fixed in its generalised version from the second link. And I can't say that on CF ties are not very common.
Rating inflation. What we exactly need from the model is the correspondence between rating number and percentile (like, if your rating is at least $$$x$$$, then you are in top $$$y\%$$$), which is exactly what TrueSkill does and what is seems to be impossible in your model (correct me, if I am wrong here, I think so because of non-constant mean rating). Of course, any model of this kind should have some rating inflation, but we don't need exactly ten LGMs, we want to have number of LGMs proportional to the total number of players. Moreover, when we say that current rating system has problems with rating inflation, we generally mean that average div.1 user rating is growing too much because new users first contests influence the rating a lot more than it should be. But in theory, if we don't have a multiple account problem (which the visible rating feature should take care of), any model with non-constant $$$\sigma$$$ should do the trick.

Surely, we need to test and compare our models to discuss it further, so I'll try to implement all this stuff, and post my results next week.

→ Reply

4 years ago, # ^ |

+10

Cool. It's definitely worth doing more tests! The improved TrueSkill from St. Petersburg looks like a good alternative. Nonetheless, I want to reply to your latest points:

What exactly is your CLT-based argument? Are skills and performances the sum of very small contributions? Top players seem more spread-out than a normal distribution would predict, and all players seem to have more days that are very good or very bad, compared to normal draws. The logistic causes my model to put less weight on these unusual performances. We could try to measure the true distribution.
The problem with ties is that there is no fixed $$$\epsilon$$$ that works. The most obvious example are the massive ties at $$$0$$$ points, whereas performances drawn from logistic or normal distributions would have no such bias. Maybe that's ok in practice: without human-interpretable formulas, it's hard for me to guess, just as it was hard to foresee problems with the original TrueSkill model of ties.
In Elo-like systems, rating differences are based on win probabilities, not percentiles. As far as I can tell, TrueSkill doesn't enforce percentiles either: it merely suggests them in the prior, just as Elo-R does.
Fixing the average rating reduces our control over the inflationary and deflationary forces I mentioned. But now I have a question. It seems to me that fixing the average rating is inconsistent with non-constant $$$\sigma$$$, because users with large $$$\sigma$$$ would gain/lose more points than their opponent loses/gains. Are you saying that TrueSkill manage to have both, and how?

→ Reply

4 years ago, # ^ |

Yes, something like that. A least on top level. Your skill is like a sum of problems you know how to solve and your performance is roughly the sum of solved problems minus the sum of mistakes you made. I think that good and bad days happen because of the way we competing. You have almost no choice between problems, so it is pure luck to get problems you would like. In order to handle this effect you should use some recommender system as a rating system, which is definitely an overkill for our case. But anyway, I think it is not good idea to change something in rating system just in order to handle additional CF-specific noise.
If you want to be very precise, then you can choose $$$\varepsilon$$$ as a function of score, but I think that for practical purposes $$$\varepsilon = 0.736$$$ should be fine (and maybe the case of $$$0$$$ score should be handled separately)
True, I meant percentiles in the Bayesian sense, not where you are in the actual rating. For all Elo-based rating systems you could compute your expected percentile based on your rating (like, if your rating is 1500, then you are in top $$$\frac12$$$). It could be different from the percentile in the actual rating, e.g. about 60% of ranked lichess (uses Glicko-2) users have their blitz rating lower than 1500.
Forget about this part, I messed up. Mean rating should be constant if we compute ratings with proper weights, I believe that it is also true for your model.

→ Reply

farmersrice

4 years ago, # ^ |

+25

We need to make this the new rating system

→ Reply

4 years ago, # ^ |

+13

I spent some time learning Rust and implementing "TrueSkill from SPb" and now I have a couple of questions about your rating system.

I spent some time reading your code and noticed that you have only strict formulas, no approximations or loops until convergence. It makes me think what you do is not a Bayesian inference rather something different. I think that because the posterior distribution should not be expressible in terms of logistic distributions and therefore should be approximated somehow (just as TrueSkill does so). So, if I am right, then why does your rating system converges fast enough? And if I am not right, how did you do this?

Also, if I understood it correctly, you don't add additional noise to players performance as it usually done. Instead, you have a lower bound on players sigma. Why are these approaches equivalent?

Meanwhile, I found some bugs in pseudocode from article, so the implementation would take more time (algorithm is still correct, I just need to get some formulas from scratch). I think that I will finish and post first results next week. Also I want to use your CF-parser if you don't mind.

→ Reply

4 years ago, # ^ |

Sure, use whatever you need!

I derive the formulas in the paper: in short, I cheated in a few places to make the formulas tractable. As a result, I have human-interpretable formulas that approximately follow the Bayesian theory. In other words, my system can be interpreted in two ways: the Bayesian way, or as an exponentially weighted average of recent performances with reduced weight on outliers.

I do add noise. The "lower bound" you're seeing is simply the limiting value of $$$\sigma$$$ if the noise and measurement error per contest are constant. Actually, the noising calculation is where I had the toughest time sticking to the Bayesian derivation. In the end, I came up with four methods, each with different pros and cons which I'd be happy to discuss further.

→ Reply

4 years ago, # ^ |

What is the complexity of your code? My code processes all contests from the set in your repo less than in two minutes (if cached already) and I haven't optimized it yet. TBH, it gives some shit instead of results, but I know where is the problem and I think that it doesn't affect the performance.

→ Reply

4 years ago, # ^ |

← Rev. 2 →

Wow 2 minutes! I'd like to see. If each contestant recalls $$$M$$$ rounds of history, then a round with $$$N$$$ contestants should take $$$O(N^2\log\frac{1}{\epsilon} + NM\log^2\frac{1}{\epsilon})$$$ using binary searches, with the log factors reduced if Newton's method is used instead.

The $$$NM$$$ term only applies to certain versions of my "noising step", but the $$$N^2$$$ term seems common to all the programming contest algorithms that compute pairwise win probabilities.

→ Reply

4 years ago, # ^ |

Well, I can't prove that, but it seems like my code processes a round with $$$N$$$ contestants in time $$$\mathcal{O}(N \log \frac{N}{\varepsilon})$$$. It is not a really fair bound since each multiway tie and each team now work in quadratic time, but it is not so important when running on CF history and it definitely could be optimized to linear time.

I don't want to post code that does not work, but if you don't want to wait until I fix everything and make a post, you can find it on my github (username is the same).

→ Reply

4 years ago, # ^ |

Cool, it looks like TrueSkill doesn't need to compute the $$$N^2$$$ pairwise win probabilities! You can probably make it even faster using Rayon to parallelize as I did.

Today I reverted my code to use my simplest noising method. While the fancier methods are interesting for a variety of reasons, the simple one is more straightforwardly comparable with the current system on Codeforces.

Basically, it approximates the posterior, which is the most intractable part, by a Gaussian, while still using logistic distributions in other places. This method has no $$$NM$$$ term nor a saved history, so it satisfies the same monotonicity properties as Codeforces. It also results in the simplest versions of my formulas (see page 8 of the paper as of today). The rating distribution might converge more slowly with this method, but it seems to end up in the same place.

→ Reply

4 years ago, # ^ |

TrueSkill computes win probabilities only for pairs of players on neighbouring places (up to the multiway tie case).

Btw, I finished the algorithm and now it works in less than a minute, but somehow strange. I can't find the problem (it is either parameters or bugs), so I am curious how you tested your rating system.

Also, I am refactoring my code and since I am fairly new to Rust, I'd be very grateful for any suggestions to reduce the boilerplate and improve readability.

→ Reply