Recently, I was watching my cousin Ben play Tetris on his home computer. Being related to the Brewers, Ben has a somewhat obsessive personality, and playing Tetris had become a habit, something to do when there’s nothing else to do. One day, while discussing Tetris, Ben mentioned that all of the high scores on their computer (shared between Ben and his brother Andrew) were between 18,000 and 21,000. “Wow”, I said, “you guys must play tetris a lot”.

This got me thinking: how is it that I made an inference from the narrow range of Ben and Andrew’s high score table to a conclusion about the number of games that have been played? Being an enthusiastic Bayesian (I am trying to avoid the label ‘statistician’), I decided to try and make a probabilistic model of this reasoning. But first, I obtained the actual high score table data:

20919, 20739, 20133, 19743, 19742, 19545, 19367, 19286, 19033, 18864

Call this data D, and let m=10 be the number of scores in the high score table. My aim was to assign a probability distribution for N (the total number of games played) given D. One way of breaking this problem down into simpler parts is to use Bayes’s theorem. Please excuse the extremely compact notation that Bayesians like to use. It is better than writing things like P(X=x|Y=y) all the time.

This means that we can get our answer as long as we can assign a prior distribution p(N) describing what we know about N before taking the high score data into account, and p(D|N), the probability density for the high score table if only we knew that N games were played, as a function of N – called the likelihood. Contrary to conventional “wisdom”, assigning a prior is really easy, it is the likelihood that requires more work.

A highly useful way to proceed is to imagine that you had extra information, perhaps the value of some quantity , and you would know how to assign the likelihood if you knew . In the Tetris context, imagine we knew that that was the mean value of all of the scores (call these scores ) ever achieved on the computer. If we knew this, a good description of our knowledge about all of the scores is to assign independent exponential distributions with mean :

I will not attempt to justify this decision; that is a topic for another day. This is some progress, but it’s not what we need, which is a probability distribution for the high score table only, not the whole sequence of scores. The theory of order statistics (http://en.wikipedia.org/wiki/Order_statistics#Joint_distributions) (warning: contains frequentism) helps here. The probability density for the high scores only can be obtained by the probability that (N-m) scores land below the lowest high score, m land on the high scores, and none in between. Using a multinomial distribution (http://en.wikipedia.org/wiki/Multinomial_distribution), this is

Thus, we have our likelihood, provided we are willing to enlarge our parameter space to include . We can always marginalize it out if we want. With uniform priors for log(N) and log(), the posterior distribution is plotted below (I used Markov Chain Monte Carlo to sample the posterior):

This is a highly correlated distribution, showing that the data can be explained either by there having been a lot of games (a billion!), or the average score being high. Increasing one and decreasing the other can still explain the data, within reason. It is extremely unlikely that Tetris has been played less than 1000 times on that computer, and quite unlikely (about a 1% probability) that N is less than 10,000. If you baulk at the possibility of N > 10^8, you get this information from a source other than the high score table. Either that or this model is too simplistic (which is probably true, but this is a blog post, not a paper – yet). One highly likely possibility is that there is a “ceiling” beyond which even Tetris experts find it really hard to increase their score. This would cause the high scores to bunch up even without N being very large.

I think that the installation of Windows XP that Ben and Andrew use is quite old. 3 years of playing at an average of 10 games per day is about 10^4 games, so a fair bit of that posterior distribution is within a plausible range. Now, since I finally deleted my Windows partition, I play Gnometris, the Tetris clone for the GNOME desktop. For those interested, Tetris can also be played on GNU/Linux using wine. Gnometris is a slightly different game, so the scores aren’t directly comparable with Tetris scores, but there is strong evidence that I play gnometris less than Ben and Andrew play Tetris:

This conclusion, that N is probably between about 30 and 1,000, seems reasonable especially since I reinstalled Ubuntu when Hardy Heron came out in April. Since I am not an expert, it is also unlikely that I am running into any high score ceilings.

on June 9, 2008 at 9:19 pm |PhilWhat do you make of your posteror PDF being peaked at higher mu than Ben’s? If you were going to marginalise it out, what prior would you choose?

on June 10, 2008 at 1:19 am |Brendon BrewerHi Phil

My mu is higher than Ben’s, but that’s only because two games are different. For example, Gnometris has a wider playing area than Tetris (14 vs. 10), and the number of points per line is probably different.

The results plotted above are for uniform priors with respect to log(N) and log(mu). If the result really mattered and I wanted to incorporate more information about N, I would try to find out more about the age of the computer etc. For mu, it’s hard to think of an extra source of information except for perhaps watching a few games and including the non-high scores seen as extra data.

on June 10, 2008 at 7:30 am |lawHey I like tetris!!

What version were they playing? Gotta beat their scores. BTW I am totally not obsessive.

on June 19, 2013 at 3:55 pm |Geoffrey ZhuThis is an interesting post that I liked, However, the image links are broken.