EbTech's blog

By EbTech, 3 years ago, In English

It's long been known that certain rating systems, namely Glicko-2 and Topcoder, are not monotonic. In other words, there are cases where losing can eventually result in a higher rating. We wanted to know just how severe the issue can be. In joint work with inutard at WWW 2021, we computed how tourist's rating would evolve according to both Topcoder and our custom rating system. The dataset consists of Codeforces rounds up to Looksery Cup 2015, accessed via the Codeforces API. Here, we see that tourist's Topcoder rating is 3284, but could have been as high as 3807 if he were willing to lose on purpose!

[Plot of tourist's rating history]

More details on the adversarial strategy: for his first 45 rounds, we simulate tourist playing normally, following historical data. In the next 45 rounds, he deliberately finishes in last place whenever his Topcoder rating is above 2975, and plays normally otherwise. Finally, he returns to playing normally for another 15 rounds.
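The schedule above can be sketched as a small Python driver. This is only a minimal illustration, and it rests on a simplifying assumption. A real Topcoder update recomputes the whole round from every contestant's result. Here, a hypothetical `update(rating, rank)` callback stands in for that computation. Likewise, `historical_rank` and `last_rank` are hypothetical callbacks that return tourist's actual rank and the last-place rank for a given round.

```python
from typing import Callable, List

# Schedule from the post: 45 normal rounds, 45 adversarial rounds, 15 normal rounds.
NORMAL_PREFIX = 45
ADVERSARIAL = 45
NORMAL_SUFFIX = 15
TOTAL_ROUNDS = NORMAL_PREFIX + ADVERSARIAL + NORMAL_SUFFIX  # 105
THRESHOLD = 2975  # Topcoder rating above which the player throws the round


def throws_round(round_index: int, rating: float) -> bool:
    """True if the player deliberately finishes last in this (0-indexed) round."""
    adversarial = NORMAL_PREFIX <= round_index < NORMAL_PREFIX + ADVERSARIAL
    return adversarial and rating > THRESHOLD


def simulate(
    start_rating: float,
    historical_rank: Callable[[int], int],
    last_rank: Callable[[int], int],
    update: Callable[[float, int], float],
) -> List[float]:
    """Replay the schedule. `update` is a hypothetical stand-in for a rating
    system that maps (current rating, achieved rank) to a new rating."""
    rating = start_rating
    history = []
    for i in range(TOTAL_ROUNDS):
        # Decide based on the rating *before* this round, as in the strategy.
        rank = last_rank(i) if throws_round(i, rating) else historical_rank(i)
        rating = update(rating, rank)
        history.append(rating)
    return history
```

Consider a toy update that gives +50 for first place and -300 for last place, starting at 2900. The player climbs to 5150 over the first 45 rounds. He then throws eight rounds in a row, falling to 2750, and afterwards hovers around the 2975 threshold. These toy numbers only illustrate the control flow; they are not the Topcoder formula.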

A similar strategy recently broke the Pokemon Go Battle League rankings, which seem to be based on Glicko-2: https://www.reddit.com/r/TheSilphRoad/comments/hwff2d/farming_volatility_how_a_major_flaw_in_a/.


»
3 years ago

Does it work on lichess?

  • »
    »
    3 years ago

    If it uses Glicko-2, then I suspect the same exploit will work. The trick is to massively inflate your volatility by alternating between losing and winning.
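    To show where the volatility comes in, here is a minimal Python sketch of one Glicko-2 rating period. It follows the step-by-step procedure in Mark Glickman's published description of Glicko-2. It is not Lichess's or Pokemon Go's actual code: the function name, the tau = 0.5 default and the 1e-6 tolerance are assumptions for illustration. Roughly speaking, sigma goes up only when the squared estimated improvement delta^2 exceeds phi^2 + v + sigma^2, that is, when results are more surprising than the player's current uncertainty explains.

```python
import math
from typing import List, Tuple

SCALE = 173.7178  # converts between the Glicko rating scale and Glicko-2 units


def _g(phi: float) -> float:
    # Down-weights games against opponents whose own rating is uncertain.
    return 1.0 / math.sqrt(1.0 + 3.0 * phi * phi / math.pi ** 2)


def glicko2_update(
    rating: float,
    rd: float,
    sigma: float,
    games: List[Tuple[float, float, float]],
    tau: float = 0.5,
    eps: float = 1e-6,
) -> Tuple[float, float, float]:
    """One Glicko-2 rating period.

    games: (opponent_rating, opponent_rd, score) with score 1, 0.5 or 0.
    Returns (new_rating, new_rd, new_sigma).
    """
    mu, phi = (rating - 1500.0) / SCALE, rd / SCALE
    if not games:
        # Idle period: only the deviation grows; sigma is unchanged.
        return rating, SCALE * math.sqrt(phi * phi + sigma * sigma), sigma

    # Estimated variance v and the sum over games of g_j * (s_j - E_j).
    v_inv, score_sum = 0.0, 0.0
    for opp_rating, opp_rd, score in games:
        mu_j, phi_j = (opp_rating - 1500.0) / SCALE, opp_rd / SCALE
        g = _g(phi_j)
        expected = 1.0 / (1.0 + math.exp(-g * (mu - mu_j)))
        v_inv += g * g * expected * (1.0 - expected)
        score_sum += g * (score - expected)
    v = 1.0 / v_inv
    delta = v * score_sum  # estimated rating improvement

    # New volatility: root of f via the Illinois method (A, B, C as in the paper).
    a = math.log(sigma * sigma)

    def f(x: float) -> float:
        ex = math.exp(x)
        num = ex * (delta * delta - phi * phi - v - ex)
        return num / (2.0 * (phi * phi + v + ex) ** 2) - (x - a) / (tau * tau)

    A = a
    if delta * delta > phi * phi + v:
        B = math.log(delta * delta - phi * phi - v)
    else:
        k = 1
        while f(a - k * tau) < 0:
            k += 1
        B = a - k * tau
    f_a, f_b = f(A), f(B)
    while abs(B - A) > eps:
        C = A + (A - B) * f_a / (f_b - f_a)
        f_c = f(C)
        if f_c * f_b <= 0:
            A, f_a = B, f_b
        else:
            f_a /= 2.0
        B, f_b = C, f_c
    new_sigma = math.exp(A / 2.0)

    # Pre-period deviation, then the updated deviation and rating.
    phi_star = math.sqrt(phi * phi + new_sigma * new_sigma)
    new_phi = 1.0 / math.sqrt(1.0 / phi_star ** 2 + 1.0 / v)
    new_mu = mu + new_phi ** 2 * score_sum
    return 1500.0 + SCALE * new_mu, SCALE * new_phi, new_sigma
```

    Take the worked example from Glickman's description: a player at 1500 rating, 200 RD and 0.06 volatility beats a 1400/30 opponent and loses to 1550/100 and 1700/300 opponents. The update gives roughly 1464.06 / 151.52 / 0.05999. Beating a single 2500-rated opponent pushes sigma above 0.06, while beating an equal opponent nudges it slightly below. This is why an exploiter wants as many surprising results as possible.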

  • »
    »
    3 years ago

    I don't think so. When you reach your_normal_rating + 100, your opponents will crush you and won't let you climb any higher. And weaker opponents (your_normal_rating - 300) won't accept your challenges.

    It's easier to reach your_normal_rating + 100 through ordinary rating fluctuations.

    • »
      »
      »
      3 years ago

      I'll just try it myself.

      https://lichess.org/@/VolatilePlayer

      Most players who have played a lot of games have a deviation of 45-46. I'll stay at a 2100 rating until my deviation reaches 45, then start playing at full strength. Let's see whether I can get past my typical 2350-2400. I don't think I will.

      • »
        »
        »
        »
        3 years ago

        I would be very curious too! Lichess uses Glicko-2: https://i.imgur.com/bOjm17e.png

        But given how many players are on the platform, it would be surprising if they hadn't already hacked together a fix for this attack.

        Please feel free to email or message us your results! We can include them in our repo and credit you :)

        • »
          »
          »
          »
          »
          3 years ago

          Lichess has an easy-to-use API and open-source code, so you can actually experiment with other players' results.

          Also, to get the maximum benefit, I should return to this account after some time (maybe a year), once my deviation has grown a lot and I'd get about +100 for my first games. But that's outside the scope of your study.
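          In Glicko-2, inactivity leaves the volatility sigma unchanged and grows the rating deviation instead. Each rating period without games replaces phi with sqrt(phi^2 + sigma^2) on the internal scale. Below is a minimal sketch of that growth. It assumes one update per idle rating period and no upper cap; real sites such as Lichess may use different period lengths and clamp the deviation. The helper name is hypothetical.

```python
import math

SCALE = 173.7178  # Glicko-2 conversion factor between rating units and internal units


def rd_after_inactivity(rd: float, sigma: float, periods: int) -> float:
    """Rating deviation after `periods` rating periods with no games.

    Each idle period applies phi <- sqrt(phi^2 + sigma^2) on the Glicko-2 scale;
    sigma itself does not change while the player is inactive.
    """
    phi = rd / SCALE
    for _ in range(periods):
        phi = math.sqrt(phi * phi + sigma * sigma)
    return SCALE * phi
```

          For example, starting from RD 45 with sigma = 0.06, twelve idle periods bring the deviation to about 57.7. A larger deviation makes the first few results after a comeback move the rating further.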

»
3 years ago

rainboy for topcoder 2021.

»
3 years ago

You theoretically broke Topcoder ratings. It's still a long way from actually seeing this happen in practice.

»
3 years ago

This blog seems really informative. I just have a small question and hope you won't be offended. Why did you put effort into this? I mean, was this some sort of academic research or just something you're passionate about? I ask because your work seems genuine and tough, and I've never had the drive to do something similar. Once again, I don't mean any offense.

  • »
    »
    3 years ago

    It started as a fun project a few years ago, out of a curiosity to see whether good theoretical foundations would solve some of the issues with programming contest rating systems. The more recent work was undertaken to turn that project into an academic publication. Hope this helps!

  • »
    »
    3 years ago

    Well, it's a good measure of a rating system's reliability.

»
7 months ago

I once suggested that toph.co display users' ratings as (rating - deviation), so that inactive players don't stay at the top of the leaderboard after participating in only a few contests. Now it looks like an even better idea than I realized.
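Ranking by a conservative score of the form rating - k * deviation pushes players with few or old results down the leaderboard. This is similar in spirit to the mu - 3 * sigma "conservative skill" used for TrueSkill leaderboards. Setting k = 1 matches the suggestion above, and larger k is more conservative. A minimal sketch with hypothetical players and numbers:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Player:
    handle: str
    rating: float
    deviation: float


def conservative_score(p: Player, k: float = 1.0) -> float:
    # Lower-bound-style score: a high deviation (few or stale contests) is penalized.
    return p.rating - k * p.deviation


def leaderboard(players: List[Player], k: float = 1.0) -> List[str]:
    ranked = sorted(players, key=lambda p: conservative_score(p, k), reverse=True)
    return [p.handle for p in ranked]


players = [
    Player("veteran", 2400, 40),    # hypothetical: many recent contests
    Player("newcomer", 2450, 150),  # hypothetical: only a few contests
]
print(leaderboard(players))  # ['veteran', 'newcomer']
```

With k = 0 the list falls back to plain rating order, and the newcomer ranks first.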