Ratings Are Broken

@qed said in #30:
> I'm so confused by this. When I used to play OTB in Canada, I was given a "provisional rating" until I completed 25

Typically it is fewer than 25. But the games you play with a provisional rating do affect other players' ratings, and in most national systems the fact that yours is provisional makes no difference to how their rating changes.

The problem is that young players, unlike adults, can improve really quickly. A FIDE initial rating is based on 10 games, which must include wins, and some of those 10 games will be against other young players who have barely completed their own 10 games.

The initial rating should handle this, I agree, but the evidence shows otherwise.

@Yersinia_Pestis said in #5:
>Mr Sonas - as statistician - should not spawn recommandations about an ELO or alike system.

Given that estimating the value of a random variable from noisy data is a problem of statistics, I do wonder who other than a statistician could do it. Well, anyone with a good command of applied maths will suffice, but statistics is exactly the right background.

Also, if you look at the Kaggle competition results table, there are several reference implementations: Glicko-2 placed 56th and Chessmetrics by Dr. Sonas placed 39th.

So he is a true connoisseur of the art.
@petri999 said in #31:
> Typically is less than 25
When you look at how many FIDE-rated games amateurs play, you quickly find that for many players it is a very low number. It means any rating based on few games (< 50 per year), especially for beginners, is pretty meaningless.

FIDE could wait to give ratings until players have really proven a certain level, but then a lot of players would stay unrated for many years. That choice is much worse than sticking with the inaccurate ratings.

It is not simple at all to make the right decisions here. This is clearly demonstrated by the many clueless comments here.
@petri999 said in #31:
> Given that estimating the value of a random variable from noisy data is a problem of statistics, I do wonder who other than a statistician could do it. Well, anyone with a good command of applied maths will suffice, but statistics is exactly the right background.
>
> Also, if you look at the Kaggle competition results table, there are several reference implementations: Glicko-2 placed 56th and Chessmetrics by Dr. Sonas placed 39th.
>
> So he is a true connoisseur of the art.
Jeff Sonas is most likely a strong statistician, but he is not a chess player himself. He has no FIDE rating at all. He looks at the numbers purely from a mathematical background. That is not sufficient at all here. He very often misses the links with other domains, which somebody playing inside the rating system would be much more likely to detect.
I think we therefore need someone with a combined profile of statistician and player. Such people exist, and I hope they are willing to help here.
Reading this brings back many discussions about the various ladder systems in other games, often referred to as Elo hell. It means players get stuck at a rating where they don't belong, because the noise in their rating outweighs their occasional variance.

Does this paper prove that "Elo hell" is a real thing?

en.wikipedia.org/wiki/Elo_hell
Nope, this proves a systematic error at the lower end of the ratings.

@peppie23 said in #33:

> Jeff Sonas is most likely a strong statistician but he is not a chessplayer himself.

This is a purely mathematical problem. The origins of these ratings are in Bradley-Terry models, built on a purely abstract concept. Elo/Glicko have been applied to football, tennis and several online games. There is nothing chess-specific in the whole issue as long as it is a zero-sum game. In some weird way they have even been applied to horse racing.
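
To make the Bradley-Terry link concrete, here is a minimal sketch, assuming the logistic form of the Elo expected score with the usual 400-point scale. The function names are made up for illustration, and draws are ignored, since the plain Bradley-Terry model only covers win/loss outcomes.

```python
def bradley_terry_win_prob(strength_a: float, strength_b: float) -> float:
    """Bradley-Terry: probability that A beats B, given positive strengths."""
    return strength_a / (strength_a + strength_b)


def rating_to_strength(rating: float) -> float:
    """Map an Elo-style rating to a Bradley-Terry strength (assumed 400-point scale)."""
    return 10 ** (rating / 400.0)


def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Logistic Elo expected score for player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


if __name__ == "__main__":
    # Both formulas give the same number: nothing here knows about chess.
    bt = bradley_terry_win_prob(rating_to_strength(1800), rating_to_strength(1600))
    elo = elo_expected_score(1800, 1600)
    print(f"Bradley-Terry: {bt:.4f}, Elo: {elo:.4f}")
```

The two functions agree for any pair of ratings, because dividing the Bradley-Terry numerator and denominator by the first strength gives exactly the logistic Elo expression.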

So to develop a purely mathematical construction that has no real dependency on chess, you would need to be a chess player? Given that when top players comment on the rating system and propose something, it is mostly... hilarious.

I don't think so.
@petri999 said in #35:
> This is a purely mathematical problem.

No, it is not. One small example is the impact of the 400-point rule. When people get less than 1 rating point for a win, they start refusing to play such games. Players know this, but a statistician will never be able to deduce that from his charts.
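
The arithmetic behind this can be sketched roughly as follows. This assumes the logistic Elo formula (FIDE itself uses a published conversion table, so real values differ slightly), a K-factor of 20, and a hypothetical `diff_cap` parameter standing in for the 400-point rule, which treats any rating difference above 400 as exactly 400.

```python
from typing import Optional


def expected_score(rating_diff: float) -> float:
    """Logistic Elo expected score for the player who is rating_diff points ahead."""
    return 1.0 / (1.0 + 10 ** (-rating_diff / 400.0))


def rating_gain_for_win(
    own_rating: float,
    opponent_rating: float,
    k: float = 20.0,                     # assumed K-factor, for illustration only
    diff_cap: Optional[float] = 400.0,   # None means no 400-point rule
) -> float:
    """Rating points gained for a win, with an optional cap on the rating difference."""
    diff = own_rating - opponent_rating
    if diff_cap is not None:
        # Clamp the difference to [-diff_cap, +diff_cap] before computing the expectation.
        diff = max(-diff_cap, min(diff_cap, diff))
    return k * (1.0 - expected_score(diff))


if __name__ == "__main__":
    # A 2400 player beating a 1600 player (800 points apart).
    print(f"with cap:    {rating_gain_for_win(2400, 1600):.2f}")
    print(f"without cap: {rating_gain_for_win(2400, 1600, diff_cap=None):.2f}")
```

Under these assumptions the win is worth about 1.8 points with the cap and about 0.2 points without it, which is the "less than 1 rating point for a win" situation.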

I can give more examples where the way things work internally has an impact on the ratings that you can't see by just looking at the numbers (see e.g. my latest blog article).
@peppie23 said in #36:
> No, it is not. One small example is the impact of the 400-point rule. When people get less than 1 rating point for a win, they start refusing to play such games. Players know this, but a statistician will never be able to deduce that from his char

Problems in the psyche should be treated with punishments and obligations, not maths, because if you treat them with maths the rating system will cease to work.
@petri999 said in #37:
> Problems in the psyche should be treated with punishments and obligations
This will never work. Introducing punishments and obligations will just push lots of people away. 99.999...% of us are amateurs, so we do not depend on chess financially. Personally, I select my tournaments very carefully. This is not a problem but natural human behavior (see e.g. the peaks at each hundred in lichess.org/stat/rating/distribution/blitz for the human impact on ratings, which statisticians don't want or like to see, as they have forecast a distribution without those peaks).

Anyway, a pure statistician has been at the steering wheel of the FIDE rating committee for the last decade and made a complete mess of it. Continuing with that same statistician doesn't sound like a clever plan to me at all, but FIDE is FIDE, so I wouldn't be surprised if they do anyway. People tend to forget very quickly.
@peppie23 said in #36:
> No, it is not. One small example is the impact of the 400-point rule. When people get less than 1 rating point for a win, they start refusing to play such games. Players know this, but a statistician will never be able to deduce that from his charts.

Sonas speaks about exactly that in his paper. So apparently he deduced it (or, more probably, he spoke with the chess players from the commission he worked with on this).
@Toncompte said in #39:
> Sonas speaks about exactly that in his paper. So apparently he deduced it (or, more probably, he spoke with the chess players from the commission he worked with on this).
Sonas speaks about the impact of the 400-point rule on deflation. He doesn't mention the refusals at all in his document. Besides, Sonas is the guy who pushed the most for the abolition of the 400-point rule, and now he says the exact opposite. He is also the person who said for more than a decade that we have gigantic inflation, and now he says we have gigantic deflation. I can go on... The internet is full of old articles from Sonas. I've been reading them for decades with astonishment.

No, I don't understand why we don't give somebody else a chance. Do you go back to a doctor who prescribed you the wrong medication before because you think he must be right next time?

I have nothing personal against Sonas. I want the FIDE ratings to be corrected properly this time, and I don't trust the new proposal at all.