Jarekczek's blog

By Jarekczek, history, 11 months ago, In English

I was asked to prepare more stats similar to these charts.

So I present numbers of active users by colors, both absolute and relative, for lifetime of Codeforces.

Absolute users count

The dotted line indicates total number of active users. Absolute users count

Relative users count

Relative users count

Relative cumulative users count

Relative cumulative users count

Assumptions

Active users means users who:

  • participated in a rated contest in 6 months preceding given date
  • participated in at least 1 rated contest

Measurements are taken every 3 months.

Definitions of colors are from today, for example green is 1200-1399.

Tools

Tooling is contained in this repo. It consists of:

  • java application downloading and caching contests' results, calculating data rows
  • python script generating the plots

Repo also contains the data produced by java application.

Evening update

Since system penalty indeed affects rating of new users, it would be reasonable to make another probe. This is absolute count as before, but now 6 contests are required to consider a user active.

Full text and comments »

 
 
 
 
  • Vote: I like it
  • +75
  • Vote: I do not like it

By Jarekczek, history, 11 months ago, In English

I came here after four years break. No matter for reasons, but I decided to participate to see how I can perform today. You know, years go by, covid eats our brains etc. From rating 1741 in 2018 now I dropped to 1597, performing on level of 1100. This was sad to me, but I was prepared too see I declined. However I was surprised how much I dropped. Started investigating a bit.

First I was trying to look at other players. I found one that is really interesting. Actually he's a hero for me, because Codeforces Round #777 (Div. 2) is his 777-th round! Best regards ruban :) If you take a look at his rating chart you'll notice a drop in rating somewhere in 2020. At first sight seems to be a 150 points drop.

There are some mysteries behind Elo rating system. I am not an expert, maybe some of you know it better. But there are ideas that huge growth in players caused by covid may cause rating deflation, like this discussion on a chess forum. (Basically deflation means that rating decreases while skill stays the same.) At Codeforces we also experienced huge income of players in 2019 and 2020. Just taking random contests from 2019 and 2021:

(By the way, such growth is not visible in problem C.)

It's important that I look at cyan segment. Things may look differently in other segments, especially in red group.

So I am writing this to share my thoughts and possibly make us all conscious of the fact. But also to ask:

Can you say more about rating deflation on Codeforces? Is it possible to calculate what number today resembles given number in 2018? Of course I'm interested in a number that reflects actual skills, although I know Elo is in general relative.

Update 2022-03-16: Here you'll find a tampermonkey script that shows who on the rating page has the greatest contests count (on F12 js console). With this you can find users close to you and watch their graphs.

Full text and comments »

 
 
 
 
  • Vote: I like it
  • +28
  • Vote: I do not like it

By Jarekczek, 10 years ago, In English

I see not only me often wondering What was the cause of the minuses? I think there are several potential reasons, like:

  1. wrong / false
  2. unkind
  3. spam
  4. language quality
  5. not agree
  6. other
  7. upd: not funny
  8. upd: boring

I assume the reason to have downvotes and upvotes is to increase the quality of content. I think it's awesome. I learn a lot from it. But to allow users (including myself) to improve their posts, it would be nice to have statistics at downvoted blogs and comments, showing the downvote reason.

Having said that, I consider this not just another feature request, but something that may substantially increase user content quality and in the same time help users to improve and develop. Isn't that what's CodeForces for?

Full text and comments »

 
 
 
 
  • Vote: I like it
  • +51
  • Vote: I do not like it

By Jarekczek, 10 years ago, In English

I'll try to mimic the Codeforces editorial style for a TopCoder problem, BoardPainting.

We can easily paint each black square separately, that's allowed although not optimal. So we want to join some black squares, the adjacent ones.

Full text and comments »

 
 
 
 
  • Vote: I like it
  • +18
  • Vote: I do not like it

By Jarekczek, 10 years ago, In English

I thought it would be interesting to optimize my solution trying to beat the record and have it run the fastest. Unfortunately the machine tester exhibits time resolution about 15 ms so it's not possible to optimize the solution to run in less time. Unless I would be able to reduce it to 0. For example among problem 293B submissions it's apparent that all the fastest solutions have the same running time: 15 ms. Other solutions show times close to k*15 ms.

If it would be possible to increase the time resolution, it might be interesting. As LukasP described in his guide, trying to reach the shortest execution time is a good exercise.

UPD: As different approaches are discussed here, it may be worthwhile to see what problems others experience with execution time measuring, like here in Bug Race (requires TopCoder login).

Full text and comments »

 
 
 
 
  • Vote: I like it
  • +4
  • Vote: I do not like it

By Jarekczek, 10 years ago, In English

I am currently struggling with understanding TopCoder rating system. It's pretty cool they published the exact formulas, but I really lack the background for it, the description. I'll try to gather here this information, which is missing there. It's the beginning, but I hope with time it will get more complete. Comments and improvement hints welcome!

TopCoder rating system is an example of Bayesian rating system. In general this probably means that potential of each coder is decribed by 2 values: the mean and the standard deviation (SD). When comparing 2 coders, not only their rating is considered, but also the volatility. Some Bayesian rating systems show rankings based not only by the mean value, but compute it as a function rank(mean, SD). TopCoder uses simiplified approach and ranks coders using only the first variable — mean value.

Factors explained

  1. performedAs — the hypothetical value of rating received as a result of a tournament. Unfortunately not presently in stats. But from the formulas it turns out to be easy to calculate: it's oldRating + 6 * ratingGain, assuming weight equal to 0.2. The exact weight is very easy to calculate, so one may compute their individual value. Full formula for performed as is
    oldRating + ratingGain * ((1 / weight) + 1)
  2. rating — mean of performedAs values.
  3. volatility — standard deviation of performedAs values. Do not mistake it with standard deviation of rating values. The greater the volatility, the less predictable coder. Less predictable coder will be considered closer to the average rating among all coders participating in a contest. Coder with low volatility will be expected to present the level exactly denoted by his rating. Don't think that big volatility makes rating changes faster. It just means that a coder's rating varied a lot during last several matches.
  4. weight (w) — after 20 competitions its value is equal to 0.25. We may assume it a constant value. Then w / (w + 1) is 0.2 which is reciprocal to 5. This reciprocal is an essential value as it denotes the number of tournaments contained in the history. If 1 / w is 5, that means that rating system adds the most recent tournament to previous mean values as if it contained 5 entries. In other words, all previous tournaments are compressed to pretend there were only 5 tournaments.
  5. ratingChange — after performedAs was calculated it's being added to the mean value of performedAs. If you try to convert the formula for the mean value after adding i + 1 element to a serie, which has a known mean value, you would receive exactly the same formula as this one. That means the current performedAs value is added to the mean, as 6th element to hypothetical 5 elements existing before.
  6. volatilityChange — similarly as rating change, this is computed as if 6th element was added to previous serie of 5 elements.

Explanatory forum threads and articles

Full text and comments »

 
 
 
 
  • Vote: I like it
  • +13
  • Vote: I do not like it