Elo Rating and Rasch Model

10 December 2024

Update

Everything I attempted to expose here has already been said, and analysed in more depth, by Pelánek (2016). This article only contains a short explanation of the details of Elo’s formula and a proof that both systems make predictions from the rating difference in the same way once we rewrite everything in terms of $R_B-R_A$ (used by Elo) instead of $R_A-R_B$ (used by Rasch). These two terms are built from the ratings of $A$, the player whose chances are being calculated, and $B$, their opponent or the task they are trying to complete.

Context

After brainstorming for a while on the question of measuring vocabulary knowledge, I came up with the idea of giving words an Elo rating similar to the one used in chess, and basing the results of the test on the difficulty rating of the items. When I discussed the idea with the excellent Preben Vanberg, the spreadsheet guru explained to me that the users can share the same rating scale as the items, which significantly simplified my design. Little did we know that we had just reinvented the wheel. When I told my psycholinguistics lecturer about this brilliant idea, she told me about the Rasch model. I had never heard of it before, and after a quick look at it, it did indeed seem really similar.

A Look at the Adjustment Mechanism in the Elo System

The Elo rating system can best be described as a self-fulfilling prophecy. Based on the ratings of the participants, the system gives a probability for the outcome of a match. If the prophecy is fulfilled, it means that the ratings are correct and nothing changes. Otherwise, the system gradually adjusts these ratings for the next time, and keeps doing so until it reaches a point of equilibrium, that is, until the prophecy is fulfilled.

This correction is made by constantly adding to the current rating the difference between the actual outcome and the expected result. If the expected result was correct, nothing changes: if 1 is expected (a win in chess) and 1 occurs, then the rating does not change. If the rating difference underestimated the outcome, the rating of the winner grows and that of the loser decreases. Here is another example, where the ratings are equal. Oftentimes, we want challenges where there is a 50% chance of success, so that the ratings keep evolving towards their “intrinsic value”. In chess, we pair players with approximately the same rating, so that their chances of winning are equal. If one player wins when the expected score was 50%-50%, then the winner’s rating changes by $1-0.5=+0.5$ and the loser’s by $0-0.5=-0.5$.

Still in chess, this difference is multiplied by a factor called the $K$ value, to speed up the adjustment process so that fewer matches have to be played to find the stable rating of a player. In the case of the Rasch model, the idea is exactly the same, except that half of the participants are tasks to accomplish. Those tasks are rated on the same scale as the test takers, in such a way that if test takers solve tasks that most players fail at, their rating will increase faster than if they find the right answer to simpler questions (that is, tasks that most test takers succeed at). The only difference between the two systems so far is the absence of a $K$ value in the Rasch model, but as we saw, it is not a necessity either.
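The adjustment described above can be sketched in a few lines of Python. This is only an illustration: the function name `update_ratings` and its parameters are my own assumptions, and the expected score is passed in as a plain number because its calculation has not been introduced yet.

```python
def update_ratings(rating_a, rating_b, outcome_a, expected_a, k=1.0):
    """Return the new ratings of A and B after one match.

    outcome_a is 1 for a win of A and 0 for a loss;
    expected_a is the predicted score of A, between 0 and 1.
    The name and signature are hypothetical, for illustration only.
    """
    surprise = outcome_a - expected_a   # 0 when outcome and prediction agree
    new_a = rating_a + k * surprise     # A moves by K times the surprise
    new_b = rating_b - k * surprise     # B moves by the same amount, opposite sign
    return new_a, new_b


# Equal ratings, so a 50% expectation, and A wins.
print(update_ratings(0.0, 0.0, outcome_a=1, expected_a=0.5))  # (0.5, -0.5)
```

With `k=1.0` this reproduces the $+0.5$ and $-0.5$ of the equal-ratings example. A larger `k` simply scales both steps.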

“So, where is the difference?” you ask. Behold, the end will surprise you!

Now that we know how the ratings evolve based on the expected result, we need to figure out how this expected result is calculated…

Predicting the Outcome of a Match in Chess

Here is the formula to predict the chances a player $A$ has to win against a player $B$:

$$Pr\{X_{AB} = 1\} = \frac{1}{1+10^{\frac{R_B-R_A}{400}}}$$

OK, let’s break this down:

$\{X_{AB} = 1\}$ stands for “the chances for a victory of A over B”.

The $400$ is used to spread the ratings, making them more human-readable. Without this “spreading fraction”, the Elo rating of Magnus Carlsen would barely reach $7.205$. Multiplying all the ratings by $400$ simply allows us to avoid dealing with decimals. This constant also has the advantage of making Elo ratings compatible with other rating systems that were used before it in the chess world. This spreading is also the reason why we need a $K$ value when adjusting the ratings. Without this fraction, we don’t need a $K$ value any more, or maybe one between $0$ and $1$ in order to slow the progression of the ratings and limit their volatility. If $\frac{R_B-R_A}{400} = 0$, then the chances of winning are $\frac{1}{1+10^0} = \frac{1}{1+1} = \frac{1}{2}$. If this exponent is negative ($A$ is rated higher than $B$), the chances are more than a half; if it is positive, they are less than a half.

The $10$ stands for “in order to have ten times more chances to win than to lose, the value $\frac{R_B-R_A}{400}$ must equal $-1$”. Note that this means ten chances out of eleven rather than nine out of ten: the probability of victory is $\frac{1}{1+10^{-1}} = \frac{1}{1+0.1} = \frac{1}{1.1} \approx 0.9091$ in the case where the rating of our player $A$ is $400$ points above that of player $B$. We could just as well have used $9$ instead of $10$ to get exactly $0.9$, but it must have been deemed an unnecessary optimization.
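To make the numbers above concrete, here is a minimal Python sketch of the chess formula. The function name `elo_expected` is an assumption of mine; the base $10$ and the spread of $400$ come from the formula itself.

```python
def elo_expected(r_a, r_b):
    """Chance that A beats B with the chess formula (base 10, spread 400)."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


print(elo_expected(1500, 1500))            # 0.5
print(round(elo_expected(1900, 1500), 4))  # 0.9091
print(round(elo_expected(1500, 1900), 4))  # 0.0909
```

The last two lines show the symmetry of the formula: the two players’ chances always add up to $1$.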

The truth is that any value above $1$ could do instead of $10$, and the fraction is also unnecessary. If we were to use the following formula, the results would still stand as a correct Elo rating variant:

$$Pr\{X_{AB} = 1\}=\frac{1}{1+e^{R_B-R_A}}$$

Where $e$ is Euler’s number: $2.71828...$ This means that for a game where the players’ rating difference is $1$, the player with the higher rating has a $\frac{1}{1+e^{-1}}=0.731...$ chance of winning.
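One way to convince ourselves that the base and the spread are only a choice of units is to compute the same prediction on both scales. In the sketch below (the function `expected` and its `base` and `spread` parameters are hypothetical names), chess ratings are converted to the base-$e$ scale by multiplying them by $\frac{\ln 10}{400}$, since $10^{d/400} = e^{d \ln 10 / 400}$.

```python
import math


def expected(r_a, r_b, base=10.0, spread=400.0):
    """Chance that A beats B for an arbitrary base (> 1) and spread."""
    return 1.0 / (1.0 + base ** ((r_b - r_a) / spread))


# Chess convention: A has a 200-point lead.
chess = expected(1700, 1500)

# The same two players on the base-e scale, without any spreading fraction.
factor = math.log(10) / 400
natural = expected(1700 * factor, 1500 * factor, base=math.e, spread=1.0)

print(math.isclose(chess, natural))                       # True
print(round(expected(1, 0, base=math.e, spread=1.0), 3))  # 0.731
```

Both scales give the same prediction; only the numbers printed on the players’ cards change.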

The Rasch Model

As we saw, the readjustment process is the same for the Rasch model, except for the unnecessary $K$ value. The only subtle differences lie in the way the predictions are written and, as we mentioned, in the fact that instead of one player against another, we have an examinee trying to complete a task. For convenience, we’ll call the examinee $A$, with an associated rating $R_A$, and the task $B$, with an associated rating $R_B$. We call $\{X_{AB} = 1\}$ the successful completion of the task $B$:

$$Pr\{X_{AB} = 1\} = \frac{e^{R_A-R_B}}{1+e^{R_A-R_B}}$$

Looks familiar, doesn’t it? Let’s refactor this $e^{R_A-R_B}$ a little:

Part 1

$$
\begin{aligned}
e^{R_A-R_B} &= e^{R_A} \div e^{R_B} \\
&= \frac{1}{e^{-R_A}} \times \frac{1}{e^{R_B}} \\
&= \frac{1}{e^{-R_A} \times e^{R_B}} \\
&= \frac{1}{e^{R_B-R_A}}
\end{aligned}
$$

Now we can rewrite our probability of $A$ completing the task in terms of $\frac{1}{e^{R_B-R_A}}$. The same steps with $A$ and $B$ swapped give $e^{R_B-R_A} = \frac{1}{e^{R_A-R_B}}$, which is the form we need for Elo’s formula. Let’s abstract $e^{R_A-R_B}$ as $x$, so that this expression becomes $\frac{1}{x}$, because I am tired of writing everything in $\LaTeX$.

Part 2

$$
\begin{aligned}
\frac{1}{1+\frac{1}{x}} &= \frac{1}{\frac{x}{x}+\frac{1}{x}} \\
&= \frac{1}{\frac{1+x}{x}} \\
&= \frac{x}{1+x}
\end{aligned}
$$

Summing Up

We can now conclude that

$$\frac{1}{1+e^{R_B-R_A}}$$

is (following Part 1) the same as

$$\frac{1}{1+\frac{1}{e^{R_A-R_B}}}$$

which is (following Part 2) the same as

$$\frac{e^{R_A-R_B}}{1+e^{R_A-R_B}}$$

Therefore:

$$Pr\{X_{AB} = 1\} = \frac{1}{1+e^{R_B-R_A}} = \frac{e^{R_A-R_B}}{1+e^{R_A-R_B}}$$

Wow, that was long… But as we can see, the formula for calculating the probability of the outcome in the Rasch model means exactly the same thing as the one in the Elo system!
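For readers who prefer a numerical check to algebra, the equality can also be tested on a few rating pairs. This is a small sketch, with `elo_form` and `rasch_form` being names I made up for the two ways of writing the same probability; it illustrates the identity and does not replace the proof.

```python
import math


def elo_form(r_a, r_b):
    """Elo-style expression, written in terms of R_B - R_A."""
    return 1.0 / (1.0 + math.exp(r_b - r_a))


def rasch_form(r_a, r_b):
    """Rasch-style expression, written in terms of R_A - R_B."""
    odds = math.exp(r_a - r_b)
    return odds / (1.0 + odds)


# A few arbitrary (R_A, R_B) pairs, chosen only for illustration.
for r_a, r_b in [(0.0, 0.0), (1.0, 0.0), (-2.5, 1.5), (3.0, -3.0)]:
    assert math.isclose(elo_form(r_a, r_b), rasch_form(r_a, r_b))

print("both forms agree")
```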

Who Found It First?

After finding out that the two systems share the same internal logic, one may ask: “Did either Rasch or Elo copy the other’s system?”

Both systems were invented in the same period, that is, around 1960, with Elo’s first publication on the matter in 1961 (Elo 1961) and Rasch’s in 1960 (Rasch 1980, p. 197). But on inspection of the sources, neither of them mentions the other’s work, and Elo claims he started working on the question in 1959 (Elo 1986, p. 4). Furthermore, their first publications come as solutions to problems that have their own history, tracing back many years before these systems were first implemented or published. The truth is, those systems may well have been invented independently of each other, without either Rasch or Elo ever learning about the other’s work…

As a matter of fact, Elo (ibid.) mentions that yet another similar system for chess was independently invented in 1969 by Gyorgy Karoly and Roger Cook for the New South Wales Chess Association. He must have learned about it many years later, solely because this other solution tackled the exact same problem: rating chess players.

Conclusion

The only true difference between these systems is the presence of a $K$ value in the Elo system. It can speed up, slow down, or nullify changes, as it can be tuned based on different factors, such as response time, number of evaluations, the days since the last evaluation, combinations of the previous two, etc. (Pelánek 2016). This aspect makes the Elo system a better-suited tool for evaluations in dynamic systems where we want to study progress instead of level.
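As an illustration of what tuning $K$ on the number of evaluations could look like, here is a hedged Python sketch where learners and items share one scale. The schedule in `k_value` and its constants `a` and `b` are assumptions chosen for the example, not values taken from the sources cited here, and all function names are hypothetical.

```python
import math


def predict(ability, difficulty):
    """Chance that the learner succeeds at the item (base-e formula)."""
    return 1.0 / (1.0 + math.exp(difficulty - ability))


def k_value(n_answers, a=1.0, b=0.05):
    """Hypothetical schedule: big steps at first, smaller ones as answers pile up."""
    return a / (1.0 + b * n_answers)


def answer(ability, difficulty, correct, n_learner, n_item):
    """Update both ratings after one answer, each side with its own K."""
    surprise = (1.0 if correct else 0.0) - predict(ability, difficulty)
    new_ability = ability + k_value(n_learner) * surprise
    new_difficulty = difficulty - k_value(n_item) * surprise
    return new_ability, new_difficulty


# A brand-new learner succeeds at an item that already has 200 answers.
new_ability, new_difficulty = answer(0.0, 0.0, True, n_learner=0, n_item=200)
print(new_ability)               # 0.5
print(round(new_difficulty, 3))  # -0.045
```

The new learner moves by a full half point, while the well-established item barely moves, which is the kind of behaviour we want when studying progress.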

Now, I wonder how much science or tooling from other fields scientists could successfully implement in their own domains. Take advertising and language learning, for example: what if a language learning app relied on the same algorithms as those selecting the ads we see every day, but used them to “feed” the learners with exactly the content they need to keep progressing or stay motivated? What about other fields of education?

References

Elo, A. (1961) ‘The USCF Rating System - A Scientific Achievement’, Chess Life, pp. 160–161.

Elo, A. (1986) The Rating of Chessplayers, Past and Present. Second ed. New York: Arco Publishing, Inc. Available at: https://gwern.net/doc/statistics/order/comparison/1978-elo-theratingofchessplayerspastandpresent.pdf.

Pelánek, R. (2016) ‘Applications of the Elo rating system in adaptive educational systems’, Computers & Education, 98, pp. 169–179. Available at: https://doi.org/10.1016/j.compedu.2016.03.017.

Rasch, G. (1980) Probabilistic models for some intelligence and attainment tests. Chicago: University of Chicago Press. Available at: http://archive.org/details/probabilisticmod0000rasc (Accessed: 18 December 2024).

Hedberg, S. and Nasra, S. (2023) Ability Estimation Methods: An Introduction to Item Response Theory and Elo Education Systems. Available at: https://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-214601 (Accessed: 8 December 2024).