
Science of Chess: A g-factor for chess? A psychometric scale for playing ability

@sandals said in #30:
> Factor analysis is mostly fake. In this particular case, your 4 factors depend heavily on (a) the arbitrary choice of deciding to fit 4 factors (rather than 3 or 5), and (b) the rotation strategy (arbitrarily chosen to be "promax"). You would get vastly different factors if you changed one of these arbitrary parameters. The same thing happens regularly in psychometrics; factor analysis should be viewed with extreme skepticism.

Do you have alternative approaches that you think would be more appropriate? These aren't *my* factors, remember - I'm talking about what this research group did. Certainly happy to hear your ideas for how to analyze this dataset differently.

You're right that FA (and related techniques) require decisions about dimensionality and such, but there are also ways to check for robustness and try to make those decisions in principled ways, or at least with transparency. For my part, I think "mostly fake" is too strong, esp. if we don't differentiate between the technique itself and how researchers apply it.
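
To make that concrete, here's a minimal sketch (in Python, using the factor_analyzer package) of the kind of robustness check I have in mind: fit the model at a few different dimensionalities and rotations and see how much the loadings actually move. The file name and column layout are placeholders, not the real ACT variables:

```python
import pandas as pd
from factor_analyzer import FactorAnalyzer

# One row per player, one column per subtest score
# (placeholder file, not the actual ACT data).
df = pd.read_csv("subtest_scores.csv").dropna()

for n_factors in (3, 4, 5):
    for rotation in ("promax", "varimax"):
        fa = FactorAnalyzer(n_factors=n_factors, rotation=rotation)
        fa.fit(df)
        loadings = pd.DataFrame(fa.loadings_, index=df.columns)
        print(f"\n{n_factors} factors, {rotation} rotation")
        print(loadings.round(2))
        # Proportion of variance accounted for by each factor
        print("variance proportions:", fa.get_factor_variance()[1].round(2))
```

If the loadings that carry the interpretation survive those swaps, that's at least some reassurance; if they don't, that's worth reporting too.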

Thanks for reading!
@NDpatzer said in #21:
> It would definitely be neat to take data from lichess to see how things like puzzle storm or other easy-to-administer tests do compared to the ACT.

Someone did some quick correlations comparing puzzle storm to blitz / bullet Glicko score:

www.reddit.com/r/chess/comments/w42m4r/how_does_lichess_puzzle_storm_relates_to/

The correlations were around 0.76 to 0.78, which puts them in the ballpark of the ACT correlations you mentioned. (A bit of apples and oranges, though, since one pairing is ACT score vs. OTB Elo and the other is Lichess Puzzle Storm vs. Lichess Glicko, but it's interesting that the numbers end up similar.)
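
For anyone who wants to reproduce that kind of number from their own data, it's just a Pearson correlation - a rough sketch with made-up column names:

```python
import pandas as pd
from scipy.stats import pearsonr

# One row per player; column names are placeholders for a Puzzle Storm
# high score and the blitz/bullet Glicko ratings.
df = pd.read_csv("lichess_ratings.csv").dropna()

for rating_col in ("blitz", "bullet"):
    r, p = pearsonr(df["puzzle_storm"], df[rating_col])
    print(f"puzzle storm vs {rating_col}: r = {r:.2f} (p = {p:.3g})")
```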
@NDpatzer said in #18:
> That would be pretty cool. If you haven't already, do check out what else the authors have to say about their exploratory analysis. Not the same as what you're describing, but there are more details there than I wanted to get into in the article. I bet there's plenty of room for secondary analyses of the ACT data, but I'd leave that to folks who know those modeling techniques better than I do!
>
> I'll tell you a model comparison project I haven't seen yet and would love to try out, though: I did a very crude factor analysis on Lichess Bullet, Blitz, Rapid and Puzzle ratings, but would love to expand that to include variants to see what the structure of that data is. I keep meaning to read up on data scraping to get a big collection of ratings off the site but just haven't gotten my act together yet. If you're interested, I'd be very keen to see what happens.

Do you happen to know if the data they had was made public? I recently took an IRT class and am curious if I could do some IRT-specific things with the data.

As for the factor analysis, I am quite interested in the ratings factor analysis! Where can I find your work?
@tackyshrimp said in #33:
> Do you happen to know if the data they had was made public? I recently took an IRT class and am curious if I could do some IRT-specific things with the data.
>
> As for the factor analysis, I am quite interested in the ratings factor analysis! Where can I find your work?

I think you can find their data set here: search.r-project.org/CRAN/refmans/LNIRT/html/AmsterdamChess.html

As for the very cursory stuff I did, you can find a description of it here: lichess.org/@/NDpatzer/blog/science-of-chess-how-many-kinds-of-chess-ability-are-there/JAdbMm6V

It's probably more useful to point you towards the data I used for that analysis though, which you should be able to find here: www.chessratingcomparison.com/

Also, if you know how to use the lichess API (something I'm still working on getting better at!) you can probably put together a bigger dataset with whatever you like in it - a rough sketch of what that could look like is below.
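
This is only a sketch: the endpoint and JSON layout below are my reading of the public lichess API docs (lichess.org/api), and the usernames are made up, so double-check both before relying on it.

```python
import time
import requests
import pandas as pd

# Pull public ratings for a list of usernames via the lichess API.
usernames = ["someplayer1", "someplayer2"]  # hypothetical accounts

rows = []
for name in usernames:
    resp = requests.get(f"https://lichess.org/api/user/{name}", timeout=10)
    resp.raise_for_status()
    perfs = resp.json().get("perfs", {})
    rows.append({
        "user": name,
        **{k: perfs.get(k, {}).get("rating")
           for k in ("bullet", "blitz", "rapid", "puzzle")},
    })
    time.sleep(1)  # be polite to the API

df = pd.DataFrame(rows)
print(df)
```

Hope this is useful!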
@NDpatzer , you say there are ways to check for robustness and make the parameter choices in a principled way. Maybe, but if so, psychometricians don't do it. I just checked the paper the factors come from, and the only robustness checks they made failed: the model fits similarly well with 1 factor, 3 factors, or 4 factors, with no principled choice between them. The choice of rotation strategy was not justified at all.

This is the norm in psychometrics, and it's why I say factor analysis is fake. It is more entertainment than science; if you hired a different group to rerun the experiment without looking at the exact parameters used in this paper, you'd get completely different results.
@sandals said in #35:
> @NDpatzer , you say there are ways to check for robustness and make the parameter choices in a principled way. Maybe, but if so, psychometricians don't do it. I just checked the paper the factors come from, and the only robustness checks they made failed: the model fits similarly well with 1 factor, 3 factors, or 4 factors, with no principled choice between them. The choice of rotation strategy was not justified at all.
>
> This is the norm in psychometrics, and it's why I say factor analysis is fake. It is more entertainment than science; if you hired a different group to rerun the experiment without looking at the exact parameters used in this paper, you'd get completely different results.

Ideally I would have liked to see the authors talk about how the loadings compare across different models, and the same thing applies to choices like the rotation strategy - how robust are the outcomes to different versions of the analysis? In general, I think it's more useful to be aware of limitations of a technique and best practices for applying it than to dismiss an analytical approach entirely.

It's also good to actually try the thing you suggested - get multiple groups to work with the same data and see how divergent the results are. That's been done in fMRI and EEG research as a way of trying to understand how sensitive conclusions in cognitive neuroscience studies are to different analytic choices.

I'm still curious what you think would be a better way to try and characterize the data collected here. If you don't like factor analysis, are there other approaches you think would be better for drawing conclusions about how effectively the ACT serves as an estimate of playing strength?
So what is wrong with factor analysis? Asking in general - I haven't read the ACT material yet. I was hoping it was about chess-board information of some kind being part of the raw data fed into the factor analysis. Does factor analysis include things that in ML one might call feature selection or reduction? A simple example being PCA (a linear decomposition based on the variability of the raw data dimensions being analyzed), but there are others. All those techniques are about finding the most "informative" dimensions according to some stated objective of what that means; in the case of PCA, it says that the overall variation in the blob of data is best captured by however many dimensions were assumed a priori in the PCA model to optimize the objective.

There is no BS or validity handed down from above, and no supremacy of Elo over anything. Is it the specific methods of the blog posts (which I should get around to reading) that are being objected to here, or some misinterpretation of Elo as the only real thing (from sheer habituation - it has been part of the scenery for so long, with minds so thirsty for a social-image ranking that they are ready to sell their soul to some one-dimensional value, while we all know that chess is not about how far one can throw a ball)?

Sorry, I can't help my style. But it is very easy to get used to something that started as an agreed-upon, better-than-nothing measure and, through repeated usage, give it more reality than it has - or to think the universe does not need or have more complexity than this projection, as long as it satisfies the primary drive of ranking on some ladder. While it helps, in chess, to project oneself into the objects we are toying with, since it heightens attention levels, that does not otherwise give reality to that thinking tool.

So from a third-person view of chess, I think there is no way one can reject a factor analysis, unless the analysis itself has hidden circular reasoning.

This is why I think that even if a tool seems to be working well enough, we should not be lazy and fail to ask what the premises of the tool were - the goggles, the information shoved under the carpet or not.

So this is like a theorem: condition to consequence. The association has its validity if the condition holds, but it is not about arguing whether A is true or not.

So for me, what is key is that when showing some result B, one always gives the condition A as well (and that is really, really not part of chess culture, as far as I can tell - maybe only in one's own mind when alone with the board, but at the communication level, people don't seem curious about that level of understanding). I see Stockfish used as an oracle, and expert advice taken on pure credence, replacing the ability to reason or to ask for a complete line of reasoning rather than an imitation-by-example proposal. So maybe we are not used to taking both A and B as tools.

There are exceptions to what I just said, but exceptions they are - so what I said seems to be the rule rather than the exception.
@NDpatzer, you could do a regression of Elo with respect to ACT scores, for example. (They did do that, as you wrote in your post, except it seems like they regressed on the factors instead of on the subtests? Why?)

It is also interesting to see how the ACT components correlate with each other, so you could look at the correlation matrix (of the ACT subtests with each other) and highlight some of its larger and smaller regions.

The only part I find sketchy is when the correlation matrix is approximated with a low-rank matrix, or with a sum of a low-rank matrix and a diagonal matrix (i.e. factor analysis) -- there are usually multiple ways to do this that give different results. I don't mind looking at the correlation matrix WITHOUT doing the low-rank approximation, and I don't mind doing regressions.

If you must do low-rank approximations, I am OK with PCA instead of factor analysis, since that removes all of the experimenter's choices (you don't get to choose the number of factors, you don't get to choose the rotation, etc.). You could tell us how much of the variance is explained by each PCA component and which tests load on which components. But again, I think this is usually less informative than just staring at the correlation matrix and doing regressions of the subtests against some ground-truth measure (Elo, in this case).
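
To spell out what I mean, here's a minimal sketch in Python: look at the correlation matrix directly, then run a plain PCA on standardized subtest scores and report the variance explained. The file and column names are placeholders, not the actual dataset:

```python
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# One row per player, one column per ACT subtest (placeholder names).
df = pd.read_csv("act_subtests.csv").dropna()

# 1. Just look at the correlation matrix first.
print(df.corr().round(2))

# 2. Plain PCA on standardized scores: no choice of factor count or rotation.
X = StandardScaler().fit_transform(df)
pca = PCA().fit(X)
print("variance explained:", pca.explained_variance_ratio_.round(2))

# Loadings: how strongly each subtest contributes to each component.
loadings = pd.DataFrame(pca.components_.T, index=df.columns)
print(loadings.round(2))
```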
@NDpatzer said in #21:
> The authors definitely end up saying that the choose-a-move and predict-a-move subscales are doing most of the heavy lifting. Those other tasks do end up covering some unique variance in their model fits (which is good) but it's not as much. Again, always a challenge - how many tasks (and which ones) do you add to try and account for as much variability as you can?

Overfitting seems very likely to me if most of the variance is already captured by tactics and calculation ability. I guess the idea is that you compare the predictive ability of a puzzle-only model and a multi-task model and see which performs better?
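
Something like a cross-validated comparison would settle it - a rough sketch, with hypothetical column names standing in for the subtests and the Elo outcome:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Hypothetical per-player table: an "elo" column plus one column per subtest.
df = pd.read_csv("subtest_scores.csv").dropna()
y = df["elo"]

candidate_models = {
    "choose/predict-a-move only": ["choose_a_move", "predict_a_move"],
    "all subtests": [c for c in df.columns if c != "elo"],
}

for label, cols in candidate_models.items():
    # Out-of-sample R^2 is what penalizes a model that merely overfits.
    scores = cross_val_score(LinearRegression(), df[cols], y, cv=5, scoring="r2")
    print(f"{label}: mean cross-validated R^2 = {scores.mean():.2f}")
```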

On another note, I think a task involving a blindfold king-and-pawn endgame tactics puzzle would be an excellent predictor, but you'd need something that is doable for decent players (like you or me), not just the 99th percentile.
@sandals said in #38:
> @NDpatzer, you could do a regression of Elo with respect to ACT scores, for example. (They did do that, as you wrote in your post, except it seems like they regressed on the factors instead of on the subtests? Why?)
>
> It is also interesting to see how the ACT components correlate with each other, so you could look at the correlation matrix (of the ACT subtests with each other) and highlight some of its larger and smaller regions.
>
> The only part I find sketchy is when the correlation matrix is approximated with a low-rank matrix, or with a sum of a low-rank matrix and a diagonal matrix (i.e. factor analysis) -- there are usually multiple ways to do this that give different results. I don't mind looking at the correlation matrix WITHOUT doing the low-rank approximation, and I don't mind doing regressions.
>
> If you must do low-rank approximations, I am OK with PCA instead of factor analysis, since that removes all of the experimenter's choices (you don't get to choose the number of factors, you don't get to choose the rotation, etc.). You could tell us how much of the variance is explained by each PCA component and which tests load on which components. But again, I think this is usually less informative than just staring at the correlation matrix and doing regressions of the subtests against some ground-truth measure (Elo, in this case).

There we go - I also usually prefer PCA as a first-pass strategy because I find it easier to think about dimensionality reduction than about recovering latent factors. It still requires you to make some decisions when it's time to interpret your result - at least in my discipline, there *is* often a question of how many components you choose to retain, for example, or how to interpret loadings on the lower-order PCs. And sure, if you prefer to look at a correlation matrix, I can see the merit. I mentioned this up above, but if you want to look at the correlation matrix yourself or run PCA on the data at your preferred granularity, you can go get the ACT data and play with it. Thanks for taking the time to articulate these ideas.