lichess.org

Ratings Are Broken

People should define inflation precisely. All I hear is people feeling cheated somehow.
It's not about Elo; people just started to smurf more and more.
@peppie23 said in #72:
> You have a wrong idea about what is going on exactly. I'll try to explain.

No, your explanation is wrong, perhaps because you haven't read mine properly. The number of players whose ratings go up or down is not important in itself and should not be taken into account in any discussion of the rating. I don't think FIDE did any mathematical research at all before adopting its innovations. There are officials, politicians, businessmen, even a few chess players, but mathematicians? This organization is not interested in the fairness of rating calculations, but in commercial benefits. And of course, the chess world has absolutely no need to protect this alien commercial benefit.
In any case, the problem was originally created by FIDE itself: the thoughtless expansion of the rating range from 400-600 points to 1400-1600. Of course, no tricks with the K-factor will solve the problem until the formulas are corrected to match the tripled rating range. Until then the rating is calculated incorrectly, not the way it was in all previous years, when players earned it. Everyone who already had a more or less decent rating before the innovations can no longer increase it by playing to their strength. This is not because they suddenly got weaker, but because their opponents now have low ratings while playing at their usual strength. Lowering it, however, is easy.
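As a rough illustration of the mechanism described above, here is a minimal Python sketch of the textbook Elo update, ΔR = K·(S - E), with the logistic expected score E = 1 / (1 + 10^((R_opp - R)/400)). This is the common logistic form. FIDE's own published expectancy tables may differ slightly. The scenario is hypothetical: a 2000-rated player meets an opponent who plays at 2000 strength but is rated only 1800.

```python
def expected_score(rating: float, opp_rating: float) -> float:
    """Expected score against opp_rating under the logistic Elo curve."""
    return 1.0 / (1.0 + 10 ** ((opp_rating - rating) / 400.0))


def rating_change(rating: float, opp_rating: float, score: float, k: float) -> float:
    """Change for one game: K * (actual score - expected score)."""
    return k * (score - expected_score(rating, opp_rating))


# Hypothetical: the opponent plays at 2000 strength but is rated 1800.
K = 20
MY_RATING, OPP_RATING = 2000, 1800

win = rating_change(MY_RATING, OPP_RATING, 1.0, K)
draw = rating_change(MY_RATING, OPP_RATING, 0.5, K)
loss = rating_change(MY_RATING, OPP_RATING, 0.0, K)

# Against a truly equal opponent, one win, one draw and one loss (1.5/3) is
# a fair result, but the formula expects about 2.28/3, so the net is negative.
net = win + draw + loss
print(f"win {win:+.1f}, draw {draw:+.1f}, loss {loss:+.1f}, net {net:+.1f}")
# prints: win +4.8, draw -5.2, loss -15.2, net -15.6
```

Against the same opponent correctly rated at 2000, the same three results would leave the rating unchanged (+10, 0, -10).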

In general, almost all FIDE decisions turn out to be erroneous or harmful, and in combination their harm is multiplied. For example, it was a mistake to set a K-factor of 20 when introducing ratings for blitz and rapid. People play many more games in blitz and rapid, so it would have been correct to lower the K-factor compared to classical time controls, for example to 5 or 10. Combined with the expansion of the rating range, however, this became truly fatal: people could lose hundreds of points in one unsuccessful tournament. Of course, this does not mean that they suddenly began to play a whole class worse.
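A quick way to see how the K-factor scales a bad event: under the same textbook Elo formula, the change over an event is the sum of K·(S - E) over its games. The sketch below assumes, for simplicity, that every game is scored against the pre-event rating. The event itself is invented for illustration: 21 blitz rounds against equally rated opponents, with 5 wins and 16 losses.

```python
def expected_score(rating: float, opp_rating: float) -> float:
    """Expected score against opp_rating under the logistic Elo curve."""
    return 1.0 / (1.0 + 10 ** ((opp_rating - rating) / 400.0))


def event_change(rating, opponent_ratings, scores, k):
    """Total change over one event, every game measured from the pre-event rating."""
    return sum(
        k * (s - expected_score(rating, opp))
        for opp, s in zip(opponent_ratings, scores)
    )


# Hypothetical bad blitz event: 21 equally rated opponents, 5 wins, 16 losses.
opponents = [2000] * 21
scores = [1.0] * 5 + [0.0] * 16
for k in (20, 10, 5):
    print(f"K={k:>2}: {event_change(2000, opponents, scores, k):+.1f}")
# K=20: -110.0
# K=10: -55.0
# K= 5: -27.5
```

The loss is simply proportional to K, so halving K halves the swing from a single bad event. The cost is that genuine improvement is also reflected more slowly.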

Regarding the K-factor: it is only a correction factor for rating changes, introduced so that new, rapidly improving players reach their proper rating faster. Nothing implies that other players, with long-established ratings, should suffer because of it.
First, where did the age-18 threshold for a high K-factor come from? In chess, age is measured by the number of games played, and that is exactly what should be taken into account.
Second, the other threshold for K=40 is a rating of 2300. In the old days, this rating could be described as "international level, but master level is still far away." Now that 1000 points have been added at the bottom of the rating range, this rating can already be considered quite high, and few of those who use K=40 will be able to reach it. Therefore, the ratings of those who don't reach it will keep swinging up and down by a hundred points after each tournament for a long time.
Finally, the very use of K=40 indicates that the chess federation is not sure of the player's current rating. That rating is essentially provisional. If so, perhaps for an opponent with a stable rating, such a game should be calculated with a lowered K?
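For reference, the thresholds discussed in this thread can be written as a small function. This is a simplified sketch, not FIDE's regulation. It encodes three rules:

* K=40 for players under 18 rated below 2300.
* K=20 below 2400.
* K=10 from 2400, as mentioned later in the thread.

It also adds a K=40 rule for new players until 30 games. That rule is included here as an assumption and should be checked against the FIDE Rating Regulations. The function and parameter names are hypothetical.

```python
def fide_k_factor(age: int, rating: float, games_played: int, reached_2400: bool) -> int:
    """Simplified K-factor lookup for standard play (sketch, not the regulation).

    Assumptions: K=40 until 30 rated games; K=10 once 2400 has been reached;
    K=40 for under-18s rated below 2300; otherwise K=20.
    """
    if games_played < 30:
        return 40
    if reached_2400 or rating >= 2400:
        return 10
    if age < 18 and rating < 2300:
        return 40
    return 20


# Two hypothetical players with the same rating and experience:
junior = fide_k_factor(age=17, rating=2250, games_played=200, reached_2400=False)
adult = fide_k_factor(age=30, rating=2250, games_played=200, reached_2400=False)
print(f"junior K={junior}, adult K={adult}")  # junior K=40, adult K=20
```

The two players have the same rating and 200 games each, yet they get different K purely because of age. That is the point raised above.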

But, I repeat, the first step is to correct the rating formulas themselves so that they correspond to a wide rating range from 1000 to 2600 and above. Without this, all attempts and ideas will be useless.
Let's wait a few years and let the AIs create a new rating system and new rules for awarding points in chess games.
I get the sense that, to compensate for the population-distribution assumption, various sub-population characteristics have been introduced one by one. Each was added when it became visibly problematic at the end of the computation pipeline for the rated players experiencing it.

That is a hand-crafted patching approach. It is necessary because the simplistic underlying assumption about the population distribution forces every sub-population's quirks to propagate throughout the pool. (I cannot prove that; it is partly intuition. However, I think one could run simulations and study how that goes. Perhaps the paper behind the blog post has done that.)

Am I wrong about this assumption that the population distribution belongs to a family with a small, finite number of parameters? (I think not. I just read a blurb about Glicko from its author that tangentially confirmed it. Note to self: find the statement if needed; it is linked from the lichess FAQ about ratings.) Something about what goes around comes around (and it does not have to). Forcing a shape on the population distribution would mean that, I think.

So it might be possible that various firefighting measures, with careful ad-hoc adjustment of sub-population parameters, could correct for a sub-population's influx into the pool. They might also correct its propagation to the other members of the overall pool, via the effect already suggested: the pressure that the pressure-cooker lid maintains. But how sensitive is such a solution to those parameter choices? Would it not be more robust to let it loose and ensure that what propagates is what we want? That requires a big enough pool and enough games sampling pairs over a long enough duration. That's a lot of "enough". I think some people may not realize the asymptotic nature of rating systems, so those "enoughs" are for them.
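The simulation idea can be tried on a toy scale. The sketch below is a hypothetical, deterministic "mean-field" Elo pool. Instead of random results, each game is scored with the expected score implied by the players' assumed true strengths. Both players use the same K, so every game is zero-sum. Ten established players are rated at their true strength. Ten newcomers play at 1800 but enter at 1000. All numbers are invented for illustration.

```python
def expected_score(rating: float, opp_rating: float) -> float:
    """Expected score against opp_rating under the logistic Elo curve."""
    return 1.0 / (1.0 + 10 ** ((opp_rating - rating) / 400.0))


def simulate(rounds, k=20.0):
    """Round-robin mean-field Elo pool; returns the final ratings."""
    # Hypothetical pool: 10 established players (true strength 1500..1950,
    # rated correctly) and 10 newcomers (true strength 1800, entered at 1000).
    true_strength = [1500 + 50 * i for i in range(10)] + [1800] * 10
    ratings = [float(t) for t in true_strength[:10]] + [1000.0] * 10
    n = len(ratings)
    for _ in range(rounds):
        for a in range(n):
            for b in range(a + 1, n):
                # Score A with the expectation implied by true strength
                # (no randomness), then apply the usual K * (S - E) update.
                score_a = expected_score(true_strength[a], true_strength[b])
                delta = k * (score_a - expected_score(ratings[a], ratings[b]))
                ratings[a] += delta
                ratings[b] -= delta  # same K for both players: zero-sum
    return ratings


final = simulate(rounds=50)
established, newcomers = final[:10], final[10:]
print(f"total points:     {sum(final):.1f} (started at 27250)")
print(f"established mean: {sum(established) / 10:.1f} (started at 1725)")
print(f"newcomer mean:    {sum(newcomers) / 10:.1f} (started at 1000)")
```

The total number of points in such a pool is fixed. The 8000 points the newcomers were "missing" on entry can therefore only be recovered from everyone else. The only configuration in which every rated expectation matches the true one has every rating 400 points below true strength. Real pools are far more complicated, with different K-factors, rating floors and a continuous influx of players, which is why serious simulations need real data.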
@A_Kireev said in #83:
> but mathematicians? This organization is not interested in the fairness of rating calculations, but in commercial benefits. And of course, the chess world has absolutely no need to protect this alien commercial benefit.

Jeff Sonas and Ken Regan are mathematicians with doctorates working for FIDE. So you are wrong that FIDE didn't do any math.
You need money to get an organization running, especially an international one like FIDE. You can't rely solely on volunteers.

I agree there are serious internal problems but you are too harsh and pessimistic.
@A_Kireev said in #83:
> First, where did the age-18 threshold for a high K-factor come from? In chess, age is measured by the number of games played, and that is exactly what should be taken into account.

The correlation between improvement and age is stronger than that between improvement and the number of games played (see e.g. my blog article from 2017: http://schaken-brabo.blogspot.com/2017/02/ervaring.html).
So you have a point that the number of games should be taken into account, but FIDE preferred to stick to age alone to keep the formulas simple. That is also a valid point, although one could say that a computer doesn't mind complexity.
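To make the age-versus-games trade-off concrete, here is a hypothetical comparison of two K rules. `k_by_age` is a simplified version of the age rule discussed above, ignoring the new-player rule. `k_by_games` is an invented games-based schedule. Its thresholds (30 and 100 games) are arbitrary placeholders chosen only for illustration.

```python
def k_by_age(age, rating):
    """Simplified age rule: K=40 under 18 and below 2300, else 20 (10 from 2400)."""
    if rating >= 2400:
        return 10
    if age < 18 and rating < 2300:
        return 40
    return 20


def k_by_games(games_played, rating):
    """Hypothetical experience rule: K shrinks as rated games accumulate."""
    if rating >= 2400:
        return 10
    if games_played < 30:
        return 40
    if games_played < 100:
        return 30
    return 20


# Hypothetical players: (age, rated games, rating)
players = {
    "experienced junior": (16, 600, 2100),
    "adult newcomer": (35, 12, 1600),
}
for name, (age, games, rating) in players.items():
    print(f"{name}: age rule K={k_by_age(age, rating)}, games rule K={k_by_games(games, rating)}")
# experienced junior: age rule K=40, games rule K=20
# adult newcomer: age rule K=20, games rule K=40
```

By the other rule's standard, each rule treats one of these two players "wrongly". Which error matters more is exactly the empirical question (correlation with age vs with games played) debated here.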
@A_Kireev said in #83:
> If so, perhaps for an opponent with a stable rating, this game should be calculated with a lowered K?

This is already happening. I myself have K = 20. Above 2400 it is even K = 10.

Anyway, as I have already written many times to many people, building a robust, future-proof FIDE rating system is a very difficult task. It is much easier to criticize than to come up with a solution. I made a proposal, but even I am not sure exactly what its impact will be. Simulations are needed, and for that you need access to complete historical FIDE rating lists. I don't have such access. Only a small group within FIDE does, so it is in their hands.
@peppie23 said in #87:
> Jeff Sonas and Ken Regan are mathematicians with doctorates working for FIDE.

Ken Regan, as far as I know, deals with computer chess, not ratings.
Jeff Sonas, as far as I know, offers some sensible solutions regarding ratings, but he does not make decisions. The decisions are made by the same officials, politicians and businessmen, and they think exclusively about profit, not the fairness of the rating system. And, of course, neither Sonas nor the officials are responsible for the failed results of the rating reforms; no one in FIDE is responsible for them at all. What a misfortune: the system, which had worked for decades without failures, fell into disarray. Money is flowing; that's the main thing.

> You need money to get an organization running, especially an international one like FIDE. You can't rely solely on volunteers.

We know that FIDE worked for many years before the expansion of the rating range, and worked well. At least one of its main tasks was carried out without failures: calculating ratings according to a well-thought-out formula.

> The correlation between improvement and age is stronger than improvement and number of games played...

In the world, you can find many correlations between all kinds of events and attributes. Such correlations are not always causal. Moreover, not all of them should be included in the rating formula.
If we had several separate ratings (for children, for the elderly, for women, for smokers, etc.), it would be another matter. But we have one rating, the same for everyone, based on tournament results. Adding extraneous parameters such as age to the formula distorts the rating. In addition, if you can add one extraneous parameter, then you can add a second and a third; surely there will be some correlation for each of them.

> This is already happening. I myself have K = 20. Above 2400 it is even K = 10.

This is not the case. Your K=20 is your personal K, assigned because your rating is below 2400. Your K stays the same no matter whom you play: a 1100-rated child beginner, a 2600 grandmaster or an opponent of roughly equal strength. In general, K=20 (or 15, as it was before) can be recognized as more or less standard for most players, because there are not many players rated 2400 and above. But K=40 is inflated and abnormal, especially when it is assigned to anyone purely on the basis of age. It would therefore be logical for both you (K=20) and a master (K=10) to have games against K=40 players counted with a reduced K, for example 5. A stable rating earned against holders of similarly stable ratings should not drop sharply just because the opponent's rating is provisional.
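The proposal above can be sketched as follows. This is a hypothetical illustration, not an existing FIDE rule. A player whose own K is below 40 uses a reduced K (5, the example value from the post) when the opponent has K=40, while the K=40 player keeps their own K. The ratings and result are invented.

```python
PROVISIONAL_K = 40
REDUCED_K = 5  # example value from the post


def expected_score(rating, opp_rating):
    """Expected score against opp_rating under the logistic Elo curve."""
    return 1.0 / (1.0 + 10 ** ((opp_rating - rating) / 400.0))


def effective_k(own_k, opponent_k):
    """Proposed rule: stable-rated players use a reduced K against K=40 opponents."""
    if own_k < PROVISIONAL_K and opponent_k == PROVISIONAL_K:
        return min(own_k, REDUCED_K)
    return own_k


def game_update(rating_a, k_a, rating_b, k_b, score_a, use_proposal=True):
    """Return (change for A, change for B) for one game."""
    e_a = expected_score(rating_a, rating_b)
    if use_proposal:
        k_a, k_b = effective_k(k_a, k_b), effective_k(k_b, k_a)
    # B's actual-minus-expected score is exactly the negative of A's.
    return k_a * (score_a - e_a), k_b * (e_a - score_a)


# Hypothetical game: a 2000-rated adult (K=20) loses to a 1700-rated junior (K=40).
current = game_update(2000, 20, 1700, 40, 0.0, use_proposal=False)
proposed = game_update(2000, 20, 1700, 40, 0.0)
print(f"current:  adult {current[0]:+.1f}, junior {current[1]:+.1f}")
print(f"proposed: adult {proposed[0]:+.1f}, junior {proposed[1]:+.1f}")
# current:  adult -17.0, junior +34.0
# proposed: adult -4.2, junior +34.0
```

In the current system each player already uses their own K, so the two changes do not cancel. The pool is therefore not zero-sum under either version; the proposal only shifts where the imbalance falls.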

> Building a robust, future-proof FIDE rating system is a very difficult task. It is much easier to criticize than to come up with a solution.

As chess players, we sometimes make erroneous, pointless moves. Having realized the mistake, we can admit it and try to correct it, for example by moving the piece back from a bad square to a good one. Yes, we lost time, and the opponent managed to make two useful moves, but the game goes on. Admitting a mistake is the first step to correcting it. If we do not admit the mistake and keep putting pieces on bad squares, following a plan we know is wrong, we will most likely lose the game.
It has long been clear that the expansion of the rating range was carried out incorrectly; it was a mistake. It can be corrected in two ways: cancel the extension entirely, or recalculate the formulas for the extended range. Some try to sit on two chairs, avoiding admitting the mistake and making do with temporary crutches or one-time rating increases. All such attempts are doomed to failure.
@A_Kireev said in #89:
> We know that FIDE worked for many years before the expansion of the rating range - and worked well.

I remember that more than a decade ago FIDE was virtually bankrupt. I don't know FIDE's internal workings, so I don't know exactly what caused this near-bankruptcy, but I do understand that it forced FIDE to look for new revenue.

I agree that new revenue shouldn't be the driver for lowering the minimum rating, but it is not easy to find other solutions either.