Extremely negative average rating change (Round 512 Div 1)

After Round 512 finished, I took a look at the rating changes from Div 1 and ended up calculating their average.

To my surprise, the sum of the rating changes was -5383 (does anyone want to help double-check the math?). This means that the average rating change across the 500 participants was -10.77. I know this contest is probably an outlier, but this seems far too extreme to be reasonable.

If anything, I think the average rating change ought to be slightly positive, in order to reward participation over time. For example, if the average rating change per contest is +0.5, then for someone who participates in 100 contests over two years (which is some serious dedication), the most this could contribute to their rating is about +50, which seems perfectly fine. This would also encourage relatively inactive people to be more active.
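
The arithmetic behind the two paragraphs above can be written out as follows. The symbols are introduced here only for illustration: $\Delta_i$ is participant $i$'s rating change, $N$ is the number of participants, and $n$ is the number of contests. The second line multiplies a constant per-contest average by the number of contests. That is only a rough approximation, since an individual's changes vary from contest to contest.

$$\bar{\Delta} = \frac{1}{N}\sum_{i=1}^{N}\Delta_i = \frac{-5383}{500} = -10.766 \approx -10.77$$

$$n\,\bar{\Delta} = 100 \times (+0.5) = +50$$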

However, averaging more than a 10-point loss in a round is unreasonable and likely to discourage people over time from participating if it keeps happening; i.e., people who maintain a similar level of performance will see their rating go down over time, and people who improve slightly will see their rating stay flat. If my calculations all check out, the rating algorithm likely deserves some reconsideration.

EDIT: After some more observation, it looks like what's happening is that new accounts who do just a few contests, lose rating to everyone else, and go inactive are potentially offsetting this effect overall. It's hard to say for sure without more specific data.

Comments (28)

KhaledKEE

6 years ago

+39

Most contests have a negative average, and I was bothered by this too, because it means the system loses some of its total points every contest. So the only way to keep the total increasing is for new accounts to be created, each of which adds 1500 points and then never increases its rating. This means that for each participant with a positive rating change, there are multiple accounts that lost more than that participant gained. I think this has something to do with the inflation-fighting strategy (which is that the sum of the rating changes of the top $\text{[math]}$ participants is not larger than 0).
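
A minimal sketch of the rule as described in the comment above follows. It assumes the rule works by shifting every participant's change by the same integer amount until the sum over the top participants is no longer positive. The function name, the uniform-shift mechanism, and the parameter `x` are hypothetical: the actual count is not recoverable from the comment. The ordering is also an assumption, since the input list is simply expected to have the "top" participants first.

```python
def apply_top_x_cap(deltas, x):
    """Shift all rating changes by one common amount so that the sum of the
    first x entries (the "top" participants) is not larger than 0.

    Hypothetical simplification for illustration; not Codeforces' actual code.
    `deltas` must already be ordered with the top participants first.
    """
    if not deltas:
        return []
    x = max(1, min(x, len(deltas)))  # clamp x to a usable range
    top_sum = sum(deltas[:x])
    if top_sum <= 0:
        return list(deltas)  # constraint already satisfied, nothing to do
    # Ceiling division: x * shift >= top_sum, so the new top-x sum is <= 0.
    shift = -(-top_sum // x)
    return [d - shift for d in deltas]


raw = [120, 60, 30, -10, -40, -80]
capped = apply_top_x_cap(raw, 3)
print(capped)                 # [50, -10, -40, -80, -110, -150]
print(sum(raw), sum(capped))  # 80 -340
```

The shift is applied to everyone but sized by the top participants alone. As a result, the total over the whole contest can end up well below zero, which is the effect this thread is discussing.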

OneStone

6 years ago

+25

I think the inflation-fighting strategy is the reason for the bias because, on average, the top x participants have performed better than they usually do and therefore gain rating. The effect is that participants who placed worse than expected lose even more points than they should. A fairer approach would be to make the sum of the rating changes of every contestant (not just the top x) equal to 0.

I think MikeMirzayanov should look into this.
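
The alternative proposed in this comment, forcing the changes of all contestants to sum to exactly 0, can be sketched as a uniform integer shift. This is an illustration under my own assumptions: any leftover points are taken one each from the first entries, and the function name is hypothetical. The comment does not say how the adjustment would actually be distributed.

```python
def zero_sum_all(deltas):
    """Shift all rating changes so that their total over every contestant is 0.

    The total is spread as evenly as integer arithmetic allows: everyone is
    shifted by `base`, and the first `rem` entries absorb one extra point each.
    Hypothetical illustration only.
    """
    n = len(deltas)
    if n == 0:
        return []
    base, rem = divmod(sum(deltas), n)  # sum == base * n + rem, 0 <= rem < n
    return [d - base - (1 if i < rem else 0) for i, d in enumerate(deltas)]


raw = [120, 60, 30, -10, -40, -80]
balanced = zero_sum_all(raw)
print(balanced)       # [106, 46, 17, -23, -53, -93]
print(sum(balanced))  # 0
```

Here the worst performer ends at -93, close to the raw -80. That is because the shift is sized by the whole field rather than by the top finishers.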

bmerry

+43

While it's probably a minor effect, don't forget that churn between divisions will introduce some bias: those losing points and going down to division 2 will be included in your average, while those gaining points and moving up to division 1 won't.

neal

That’s true. Another factor that is harder to measure is the effect of new users who do one or two contests and then go inactive.

+31

This effect doesn't matter very much because the sum of the rating changes in any contest should be zero regardless of who's participating. Besides, division 2 winners don't take points from division 1 losers, but rather they take them away from other division 2 losers.

In any case, I think that we would see a negative value if we computed average rating change for division 2.

I_love_Tanya_Romanova

+33

I'm curious if anyone has looked into it. I never explored that system in detail, but from what I can see and read here, it seems to work like this:

CF has a huge problem with people adding more and more rating to the system. They do this either by repeatedly creating new accounts after performing poorly on the old one, or by simply abandoning their accounts (and CP) after their first few bad contests. There are some weird attempts to fight the resulting rating inflation, like formulas that produce a significantly negative sum of rating changes after each contest, to take that rating back out of the system. It doesn't work well, as we still have a rating inflation issue.

Is this impression reasonable, or am I completely missing something?

If it is, maybe it would be better to try resolving the other part of the issue, especially by doing something to decrease the number of people who violate the rules by creating multiple accounts? Though the bad initial user experience of "My friends have a higher rating than me because they do fewer contests, and I just keep losing more and more rating" isn't great either.

+30

I think the rating inflation issue comes from the fact that you can get a MASSIVE amount of rating points by doing very well (something like top 30) in a single contest, which wasn't possible under the old system. However, I don't have an issue with this, because it's reasonable to reward participants who do very well.

About people abandoning their accounts after a bad performance... well, the solution to that is a system like AtCoder's, where you start at 0 rating. It solves this problem because if you abandon your account, you have to start all over. It would also solve the bad initial user experience because, hey, you will always gain rating in your first contest if you start at 0.

In any case, rating inflation really only affects those at the top of the leaderboard, so subtracting points from everyone equally is not the greatest solution. What bothers me the most about this is that, if you get exactly your expected rank, then you should lose 0 points, but what actually happens is that you lose 20 points.
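
"Expected rank" can be made concrete with a common Elo-style definition, which is, as far as I know, the idea behind Codeforces' "seed". A participant's expected rank is 1 plus the sum, over every other participant, of the probability that they beat you. The win probability is $1/(1 + 10^{(R_b - R_a)/400})$. The sketch below assumes exactly that and nothing else. The real rating formula has further steps that are not modeled, and the function names are hypothetical.

```python
def win_probability(r_a, r_b):
    """Elo-style probability that a player rated r_a beats a player rated r_b."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def expected_rank(ratings, i):
    """Expected place of participant i: 1 + expected number of others who beat them."""
    return 1.0 + sum(
        win_probability(r, ratings[i]) for j, r in enumerate(ratings) if j != i
    )


field = [2000, 1600, 1500, 1500]
for i, r in enumerate(field):
    print(r, round(expected_rank(field, i), 2))
```

Under the commenter's ideal, a participant who finishes exactly at this place would get a change of 0 before any points are subtracted from everyone.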

farmersrice

-6

Starting at 0 rating just punishes people who don't have time to do a lot of contests.

majk

I always considered the function f in the AtCoder rating system to be the size of a particular confidence interval. The rating system can, based on the measurements of your performance, conclude that your actual skill is in the interval (APerf - f(n), APerf + f(n)) with some high probability. For various reasons (incl. stability), only the lower bound of the interval is reported.

If anything, AtCoder punishes people who compete and improve. I would have a higher rating if I had abandoned my account after half of my contests. Still seems better than the high volatility of CF rating, though.
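
For reference, here is the correction term as I understand it from AtCoder's published rating document. After $n$ rated contests it is $f(n) = \frac{\sqrt{1 - 0.81^n}/(1 - 0.9^n) - 1}{\sqrt{19} - 1} \times 1200$, and the reported rating is APerf $- f(n)$. The sketch below evaluates that formula. Treat the exact constants as my recollection, to be checked against AtCoder's document. AtCoder also applies further adjustments that are left out here.

```python
import math


def f(n):
    """Amount subtracted from APerf after n rated contests.

    Equals 1200 after the first contest and decreases toward 0 as n grows,
    which fits the reading of f(n) as the half-width of a confidence interval.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return (math.sqrt(1 - 0.81 ** n) / (1 - 0.9 ** n) - 1) / (math.sqrt(19) - 1) * 1200


for n in (1, 2, 5, 10, 50):
    print(n, round(f(n), 1))
```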

Xellos

How about a slow rating creep towards 1500 for inactive accounts? Yes, even for those with a very low rating: if someone tries a contest once, fails, and tries again 3 years later, it's fine to let them start from 1500 again. The rating gained or lost this way could then determine the net rating loss in a contest (or it could be spread over the course of several contests, etc.).
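
One way to sketch the proposed creep is to pull an inactive account's rating exponentially toward 1500, closing half the gap per `half_life_days` of inactivity. The sketch then totals the rating this adds to or removes from the system, so that amount could be offset in later contests. The exponential shape, the half-life parameter, and the function names are all hypothetical. The comment only asks for a "slow" creep.

```python
def creep_toward_default(rating, idle_days, half_life_days=365.0, default=1500.0):
    """Move `rating` toward `default`; half the gap closes every `half_life_days`."""
    factor = 0.5 ** (idle_days / half_life_days)
    return default + (rating - default) * factor


def net_creep(accounts, half_life_days=365.0):
    """Net rating the creep adds to (positive) or removes from (negative) the system.

    `accounts` is an iterable of (rating, idle_days) pairs.
    """
    return sum(
        creep_toward_default(rating, idle, half_life_days) - rating
        for rating, idle in accounts
    )


print(creep_toward_default(1100, 365))        # 1300.0
print(creep_toward_default(2300, 730))        # 1700.0
print(net_creep([(1100, 365), (2300, 730)]))  # -400.0
```

A negative net, as here, means the creep removed rating from the system, so contests could afford a correspondingly smaller negative sum. The opposite holds when mostly low-rated accounts are drifting upward.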

+63

Perhaps we should stop introducing arbitrary hacks to fix the effects of our previous arbitrary hacks.

I thought all rating systems were just throwing shit on a wall and seeing what sticks...

+34

I believe someone calculated that the average rating loss was ~10 in Div 1 and ~6 in Div 2 over a number of contests in the recent past (six months to a year). That's a lot of deflation.

Still looking for the source.

kostka

+62

I made a small script to show the average rating loss/gain from the last few contests:

Contest	Average rating change
Technocup 2019 — Elimination Round 1	-10.898563
Codeforces Round #512 (Div. 1, based on Technocup 2019 Elimination Round 1)	-10.766000
Codeforces Round #512 (Div. 2, based on Technocup 2019 Elimination Round 1)	-10.244781
Codeforces Round #511 (Div. 1)	-0.241245
Codeforces Round #511 (Div. 2)	-10.179537
Educational Codeforces Round 51 (Rated for Div. 2)	-10.830094
Codeforces Round #510 (Div. 2)	-10.601980
Codeforces Round #509 (Div. 2)	-10.651798
Educational Codeforces Round 50 (Rated for Div. 2)	-10.796802
Codeforces Round #508 (Div. 2)	-2.329035
Codeforces Round #507 (Div. 1, based on Olympiad of Metropolises)	-10.592391
Codeforces Round #507 (Div. 2, based on Olympiad of Metropolises)	-10.624943
Manthan, Codefest 18 (rated, Div. 1 + Div. 2)	-10.740711
AIM Tech Round 5 (rated, Div. 1 + Div. 2)	-10.584122
Codeforces Round #506 (Div. 3)	-10.437718
Codeforces Round #505 (rated, Div. 1 + Div. 2, based on VK Cup 2018 Final)	-7.869689
Educational Codeforces Round 49 (Rated for Div. 2)	-10.859310
Codeforces Round #504 (rated, Div. 1 + Div. 2, based on VK Cup 2018 Final)	-10.103793
VK Cup 2018 — Final	-9.256410
Codeforces Round #503 (by SIS, Div. 1)	-2.068966
Codeforces Round #503 (by SIS, Div. 2)	-10.951158
Educational Codeforces Round 48 (Rated for Div. 2)	-10.605917
Codeforces Round #501 (Div. 3)	-10.651042
Codeforces Round #500 (Div. 1) [based on EJOI]	-10.828283
Codeforces Round #500 (Div. 2) [based on EJOI]	-10.244419
Codeforces Round #499 (Div. 1)	-0.572402
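
kostka's script isn't shown, so here is a hedged sketch of how such a table could be produced from the public Codeforces API method `contest.ratingChanges`, the same endpoint mentioned later in this thread. It averages `newRating - oldRating` over all rated participants of each contest. The function names and the command-line interface are my own; only the endpoint and the `oldRating`/`newRating`/`contestName` fields come from the API.

```python
import json
import sys
import time
import urllib.request

API_URL = "https://codeforces.com/api/contest.ratingChanges?contestId={}"


def average_change(changes):
    """Mean of newRating - oldRating over a list of rating-change records."""
    if not changes:
        raise ValueError("no rating changes to average")
    return sum(c["newRating"] - c["oldRating"] for c in changes) / len(changes)


def fetch_changes(contest_id):
    """Download the rating-change records of one contest from the Codeforces API."""
    with urllib.request.urlopen(API_URL.format(contest_id)) as resp:
        data = json.load(resp)
    if data.get("status") != "OK":
        raise RuntimeError(data.get("comment", "API request failed"))
    return data["result"]


def main(contest_ids):
    print("Contest\tAverage rating change")
    for k, contest_id in enumerate(contest_ids):
        if k:
            time.sleep(2)  # pause between calls to stay within the API's rate limit
        changes = fetch_changes(contest_id)
        if not changes:
            print(f"{contest_id}\t(no rating changes)")
            continue
        print(f"{changes[0]['contestName']}\t{average_change(changes):.6f}")


if __name__ == "__main__":
    main(sys.argv[1:])  # e.g. python avg_change.py 1074
```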

+19

Nice work!

Wow, it turns out -10 is not an outlier at all.

Swistakk

-10

Interesting. It clearly contradicts what I wrote below. So apparently we observe rating deflation at the macro scale, but rating inflation at the top.

Dixtosa

Rich getting richer, poor getting poorer, yeah capitalism!

teja349

+13

Is it only me who feels I was denied close to 500 points (73 contests * 10 * 0.8)? Otherwise I'd be nutella :(

If anything, I see rating inflation all over the place, and I was convinced it was pretty obvious to everyone. When the nutella color was introduced (~2 years ago), 4 people had it and the cutoff was 2900. Now 36 people have >=2900. Maybe on a small scale we can find some weird-looking data, but the overall tendency can't be denied. (Or maybe... does that apply to the top places only?)

I also have the impression that inflation is still there (at least for ratings above 2000-2100). The stats in the comment above, covering the last 25+ contests, clearly show a negative change in the rating sum after pretty much every contest.

These two statements don't necessarily contradict each other, because the set of participants is not constant over time.

I described one possible issue in my other comment: I suspect that the distribution of inactive accounts may give us a hint about where the rating comes from. Does anyone have any stats on it? Some numbers to estimate how much "free rating" we got over the last year from abandoned accounts...

It is also possible that looking only at users rated 2900+, or even 2100+, doesn't really describe the overall picture. These contestants are only a subset of all users, and I wouldn't be very surprised if all our hacked formulas work somewhat differently in different parts of the rating range.
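
One rough way to put numbers on the "free rating" question is to treat an account as abandoned if its last rated contest is older than some cutoff. You then add up how far below its 1500 starting rating each abandoned account finished. That total is what those accounts lost. It went either to other participants or out of the system through the negative per-contest sums, so it is an upper bound on the rating they donated rather than an exact figure. The data format, the cutoff, and the function name are assumptions for illustration.

```python
START_RATING = 1500  # starting rating of a new account, as described in this thread


def free_rating(accounts, now, inactive_after, start=START_RATING):
    """Total rating lost by abandoned accounts relative to their starting rating.

    `accounts` is an iterable of (final_rating, last_contest_time) pairs, with
    times in the same unit as `now` and `inactive_after` (e.g. days).
    Accounts that finished above `start` count negatively, since on net they
    gained rating rather than donating it.
    """
    return sum(
        start - rating
        for rating, last_contest in accounts
        if now - last_contest > inactive_after
    )


# Hypothetical accounts: (final rating, day of last rated contest).
accounts = [(1350, 100), (1620, 900), (1200, 50)]
print(free_rating(accounts, now=1000, inactive_after=365))  # 450
```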

bazsi700

+11

I doubt there could be rating inflation above 2100, because Div 1 contests also have an average rating change of about -10. The 1900-2100 people can't absorb that much of the negative change.

saketh

It is possible if rating is being transferred from Div 2 to Div 1.

WolfBlue

5 years ago

Well, the average rating is about 1450 according to https://codeforces.com/blog/entry/52470, but you start at 1500. Saying that rating goes down over time just because the average decreases each contest seems wrong.

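For rating to be stable, the average account must quit after about 5 contests (it enters at 1500 and loses roughly 10 points per contest, so it takes about 5 contests to reach the ~1450 average). It seems that rating is inflating, so the average person probably does not stay for more than 5 contests. I don't know if anyone has ever calculated that number, though.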

5 years ago

Yes, after observing for a while longer I think the effect of many people who do just a few contests, lose rating, and go inactive (something I mentioned above) is actually quite large and offsets the deflationary effect of the contests. I'll add an edit to the post to note this.

+29

I noticed this post was on the sidebar again. Annoyingly, nobody had ever released their code for obtaining this data (I just accepted it as true).

So I slapped together a really hacky method if you want to do it yourself, or to find your own data for other contests:

1. Compile this: https://pastebin.com/mUtbz2CU (don't judge the code; it's just a quick hack).
2. Go to https://codeforces.com/api/contest.ratingChanges?contestId=1074, replacing the value of contestId with the ID of the contest you want.
3. Paste the JSON string shown on that page as input to the program.
4. Get the results.
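
The pastebin program isn't reproduced here, so the following is a hedged sketch of the same workflow in Python. It reads the JSON returned by `contest.ratingChanges` and averages `newRating - oldRating` per band of pre-contest rating. Treating "purple" as 1900 to 2099 is my assumption about the color boundaries at the time. The band table and function names are hypothetical.

```python
import json
import sys

# (label, lower bound inclusive, upper bound exclusive); None means unbounded.
# The 1900-2099 "purple" band is an assumption about the color boundaries.
BANDS = [
    ("below 1900", None, 1900),
    ("purple (1900-2099)", 1900, 2100),
    ("2100 and above", 2100, None),
]


def band_averages(payload):
    """Average rating change per pre-contest rating band.

    `payload` is the parsed JSON from contest.ratingChanges; bands with no
    participants are omitted from the result.
    """
    totals = {label: [0, 0] for label, _, _ in BANDS}  # label -> [sum, count]
    for rec in payload["result"]:
        old = rec["oldRating"]
        for label, lo, hi in BANDS:
            if (lo is None or old >= lo) and (hi is None or old < hi):
                totals[label][0] += rec["newRating"] - old
                totals[label][1] += 1
                break
    return {label: s / n for label, (s, n) in totals.items() if n > 0}


def main(argv):
    if len(argv) < 2:
        print("usage: bands.py FILE.json  (use - to read pasted JSON from stdin)")
        return
    if argv[1] == "-":
        payload = json.load(sys.stdin)
    else:
        with open(argv[1], encoding="utf-8") as fh:
            payload = json.load(fh)
    for label, avg in band_averages(payload).items():
        print(f"{label}: {avg:+.2f}")


if __name__ == "__main__":
    main(sys.argv)
```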

In this case, I ran it on the latest contest (Lyft final round mirror, Div 1) and found that the average rating change for purples was -18.5. I also found that in Div 2-only rounds, purples gain anywhere from +8 to +15 points on average.

Thanks for the data. This provides some evidence for the theory that new accounts generally donate rating to Div 2, and that some of that rating gain then balances out in Div 1 via competitors newly promoted to Div 1.

I decided that one data point wasn't enough, so I took a good look at about 3 other contests. It turns out the average rating change for purples in those contests was slightly positive, from +0.8 to +3. Maybe the idea that Div 2 gains trickle down to Div 1 isn't entirely solid.

That's interesting; I didn't expect the number to vary so much from contest to contest. Sounds like we'd need a lot more data to draw good conclusions.

neal's blog