Empirical Rabbit: Easy Server Rating

The purpose of this article is to discuss a very simple problem server rating method, so simple in fact that the user could check his rating with a pencil and paper. I am not suggesting that this method is better than the one that I recommend in More Rating, Time and Score, but it is certainly interesting. The method has four easy steps:

(1). For every problem tackled, find the time adjusted problem rating by adding the rating time adjustment given by the graph below to the problem rating.

(N.B. The x-axis of this graph is the number of seconds taken by the user tackling the problem concerned.)

(2). For every problem successfully solved, add 400 points to the time adjusted problem rating. Discard any problems for which the resulting rating is less than the user’s current rating.

(3). For every unsuccessful attempt to solve a problem, subtract 400 points from the time adjusted problem rating. Discard any problems for which the resulting rating is more than the user’s current rating.

(4). Find the average of the results that were not discarded from steps (3) and (4). This average is the new estimate for the user's rating.

For ease of pencil and paper calculation, we could use a table of values in place of the graph. I have assumed that target solution time at which the user and problem ratings are valid is 30 seconds. (N.B. The average solution time on Chess Tempo is about 30 seconds.) I have also assumed that the user’s effective rating increases by 125 points for every doubling of the time that he spends on a problem (see the earlier article).

We can better understand this calculation by looking at the extreme values:

* For a very rapid successful solution, we add a time bonus of up to 400 points to the problem rating, and also add a success bonus of 400 points.

* For a very slow successful solution, we subtract a time penalty of up to 400 points from the problem rating, and add a success bonus of 400 points. If the result is less than the user’s rating, we discard the result.

* For a very rapid failure, we add a time penalty of up to 400 points to the problem rating, and subtract a failure penalty of 400 points. The result should be less than the user’s rating if he is scoring over 50%. If the result is more than his rating, we discard this result.

* For a very slow failure, we subtract a time penalty of 400 points from the problem rating, and also subtract a failure penalty of 400 points. The result should again be less than the user’s rating if he is scoring more than 50%. If the result is less than the user’s rating, we discard the result.

The corresponding method for calculating the rating of a problem from user ratings is similar to that for calculating a user rating from problem ratings. When we rate a problem rather than a user, the time adjustment has the same magnitude but the opposite sign (see the earlier article). The success/failure adjustment for each user also has the same magnitude and the opposite sign.

The method above is essentially the same as that I recommended previously, except that a linear approximation is used for the logistic distribution. The graph below shows the logistic distribution (used by the USCF) in red, the normal distribution (used by FIDE) in blue, and a linear approximation (used by the ECF) in yellow:

The normal distribution used by FIDE is the cumulative normal distribution with standard deviation 282.84, see:

http://www.chessbase.com/newsdetail.asp?newsid=4326.

The logistic distribution used by the USCF has the equation:

1/(1+10^[-d/400]

Where d is the rating difference between the user and the problem that he tackles.

The linear approximation used by the ECF has the equation:

0.5 + d/800

(N.B. I have converted ECF points to Elo points. 50 ECF points is equivalent to 400 Elo points.)

There is clearly very little difference between the logistic and normal distributions over the range of the graph. The linear approximation is accurate up to an expected score of about 0.85, or a rating difference of 300 points. The histogram below shows the distribution of success probabilities for Chess Tactics Server (CTS):

Most of the CTS users have success rates (and therefore expected scores) between 50% and 90%. The linear approximation overestimates the expected score and underestimates the user’s rating for scores of more than 85%. However, users could probably be dissuaded from scoring over 85% by the promise of miserly ratings.

Let:

R = user’s rating.
T = the target time at which the user and problem ratings are valid.
K = number of rating points that the user gains for each doubling of his clock time.
N = number of problems in the problem set.
R_i = rating of problem i.
s_i = user’s score on problem i.
t_i = time that the user takes on problem i.

As in the previous article, the time adjusted rating difference between the user and problem i is given by:

d_i = R - R_i + K*log₂(t_i/T)

The user’s expected score on the problem set is:

Σ_i {0.5 + d_i/800}

The user’s actual score on the problem set is:

Σ_i {s_i}

Equating the user’s expected and actual scores gives:

Σ_i {0.5 + d_i/800} = Σ_i {s_i}

Σ_i {d_i} = 800 * Σ_i {s_i - 0.5}

Substituting for d_i gives:

Σ_i {R - R_i  + K*log₂(t_i/T)} = 800 * Σ_i {s_i - 0.5}

N*R = Σ_i {R_i  - K*log₂(t_i/T) + 800*(s_i - 0.5)}

= Σ_i {R_i + 800*(si - 0.5)} + Σ_i {K*log₂(t_i/T)}

Let S be the set of indices for the problems that the user solved correctly, and F be the set of indices for the problems that he solved incorrectly. We can then rewrite the first term as:

Σ_i {R_i + 800*(s_i - 0.5)} =

Σ_i∈S {R_i + 400} + Σ_i∈F {R_i  - 400}

We can rewrite the second term as:

Σ_i {K*log₂(t_i/T)} = N*K*log₂(t_GM/T)

Where t_GM is the geometric mean of the t_i values.

Alternatively, we have:

N*R = Σ_i {R_i + 800*(s_i - K*log₂(t_i/T)/800 - 0.5)}

We can interpret -K*log₂(t_i/T)/800 as a time adjustment to the user’s score on problem i.

The graph below shows the time adjustment to the score for K=125, and T=30:

On CTS, the user gets 1 point for a very rapid successful solution and 0 points for a very slow one. The adjusted score here is up to 1.5 points for a very rapid success and down to 0.5 points for a very slow success. On CTS, the user gets 0 points for a failure, irrespective of the time that he takes. The adjusted score here is 0.5 points for a very rapid failure, and down to -0.5 points for a very slow failure. If the user takes the target time, there is no time adjustment, and the user gets 1 for a success and 0 for a failure.

In the method above, I have added rules to ensure that the user cannot benefit from a fast failure, or lose from a slow success. (N.B. A slow solution time just increases the time adjusted problem rating.) The ECF goes further and ensures that a player always gains from a win, even against a beginner. In theory, it is possible to obtain an infinite ECF grade by beating a constant stream of beginners. Clearly, this adjustment is motivated more by politics than by statistics!

2 comments:

Anonymous23 January 2013 at 11:33
Having read through all of your posts - all tremendous - I'd be curious to hear your summary of what you recommend, with specific first book, as training regimen. Bain's book? Dividing up as you suggest and according to the timeframe you used in the Bain Experiment? It would be worth a post, I think, where you explain, of all you've learned, what you now consider the optimal training regime for a new chess player.

Thanks for the great work.
Geoff Fergusson23 January 2013 at 23:03
I will put a post together when I have time.

Tuesday 1 January 2013

Easy Server Rating

2 comments: