Archive for the ‘Statistics and Metrics’ Category

Notes from Data Science Summit: ML in Production

Posted in Data Science, Machine Learning, Statistics and Metrics, Technology on July 13, 2016| 1 Comment »

I’m (Berian, that is, rather than Luke) attending the SF Data Science Summit today and tomorrow. I’m taking some rough notes as I go and want to publish them in digestible bits. One of the speakers I most enjoyed today was Carlos Guestrin (@guestrin), who gave a keynote and then a little 25-minute appendix later in the day. Here’s what I wrote down.
How to win at the races
Posted in Mathematics, Statistics and Metrics on November 4, 2012| Leave a Comment »
I’ve rambled about this before, but with the Melbourne Cup – “the race that stops a nation” – a few days away and Tom Waterhouse’s annoying face on TV too often, it’s worth repeating.
Don’t bet on the horse you think will win!
More precisely, don’t necessarily bet on the horse you think will win. Here is the only betting system that works:
- For each horse in the race, and before you look at the price offered by the bookmaker, write down what you think the probability (as a percentage) is that the horse will win. I.e. if the race were run 100 times, how many times would this horse win? You’ll have to do your homework on the field.
- For each horse, take your probability and multiply it by the bookmaker’s price. Call that the magic number (there is a short code sketch of this calculation after the list).
- If any of the horses have a magic number greater than 100, bet on the horse with the highest magic number.
- If none of the horses have a magic number greater than 100, don’t bet. Go home.
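Here is a minimal sketch of the system in code. The horses, probabilities and prices are made up for illustration, and I’m assuming decimal odds (the price is the total return on a winning $1 bet):

```python
def magic_number(win_probability_percent, decimal_price):
    """Average return, in dollars, from one hundred $1 bets at this price,
    assuming your probability estimate is correct."""
    return win_probability_percent * decimal_price

# (horse, your estimated win probability in %, bookmaker's decimal price)
field = [
    ("Horse A", 40, 2.2),    # magic number 88.0  -> no bet
    ("Horse B", 25, 3.5),    # magic number 87.5  -> no bet
    ("Horse C", 10, 12.0),   # magic number 120.0 -> value, *if* you're right
]

magic = [(name, magic_number(p, price)) for name, p, price in field]
value_bets = [(name, m) for name, m in magic if m > 100]

if value_bets:
    best = max(value_bets, key=lambda pair: pair[1])
    print("Bet on", best[0], "with magic number", best[1])
else:
    print("No magic number above 100. Don't bet. Go home.")
```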
The magic number is how much (on average) you would make if you bet $1 on the horse 100 times, so it had better be more than 100. The way that the bookmaker guarantees a profit in the long run is to ensure that no magic numbers are greater than 100. Because of the bookmaker’s slice (the overround), the odds are stacked against the average punter. You will only end up with a magic number greater than 100 if either you have made a mistake in step 1, or the bookmaker has made a mistake in his price. This leads to the following advice.
You should only bet on a horse if
a) You know more than the bookmaker, and
b) The bookmaker has significantly underestimated one of the horses.
Thus, the better the bookmaker, the more reason not to bet. And so, we come to Tom Waterhouse’s online betting business:
“I’ve got four generations of betting knowledge in my blood. … Bet with me, and that knowledge can be yours.”
This is exactly the information you need to conclude that you should never bet with Tom Waterhouse. The ad might as well say “bet with me; I know how to take your money”. You don’t want a bookmaker who knows horse racing inside-and-out, from horse racing stock, armed with all the facts, knowing all the right people. You don’t want a professional in a sharp suit surrounded by analysts at computer screens. You want an idiot. You want someone who doesn’t know which end of the horse is the front, armed with a broken abacus and basing his prices on a combination of tea-leaf-reading, a lucky 8-ball and “the vibe”. You want a bookmaker that is going out of business.
The more successful the bookmaker, the further you should stay away. The TAB was established in 1964, has over a million customers and 2,500 retail outlets, and made a profit of $534.8 million in 2011, up 14%. Translation: never bet with the TAB. Betfair’s profits were $600 million; SportingBet made $2 billion in 2009. With those resources, they’ll always know more than you. If you’ve heard of them, don’t bet with them. Go home.
Hopefully you’re getting my point. Don’t bet on sports. If you go to the races, put on a nice outfit, drink a few beers and give the money to charity. If you must bet, have a random sweepstakes with your friends. You’ll get much better odds that way.
Coincidences and the Lottery
Posted in Statistics and Metrics, Uncategorized on November 29, 2010| 3 Comments »
Coincidences happen surprisingly often. Yet they are often not meaningful, i.e. they are “just a coincidence” and do not imply that we should change our worldview. For example, suppose there are a million people in contention for a lottery, and John Smith is found to win. Before knowing this, our probability for it is:

$P(\text{John Smith wins} \mid \text{fair lottery}) = 10^{-6}.$
People often get afraid of this tiny probability, and proclaim something like “it’s not the probability of John Smith winning the lottery that is relevant, but the probability that someone wins”. However, this is anti-Bayesian nonsense. This tiny probability is, by Bayes’ rule, relevant for getting a posterior probability for the hypothesis that the lottery was fair. So how is it that we often still believe in the fair lottery (or that a coincidence is not meaningful)?
The answer is quite simple: the likelihood under the alternative hypothesis, that the lottery was rigged, is just as small:

$P(\text{John Smith wins} \mid \text{rigged lottery}) \approx 10^{-6}.$
The reason is that before we knew who won, we had no reason to single out John Smith, and had to spread the total probability (1) over a million minus one alternatives (that the lottery was rigged in favor of one of the other entrants). Using analogous reasoning, yes, coincidences have tiny probability, but they also have tiny probability given the hypothesis of a mysterious force operating, because before the coincidence happened we didn’t know which of the multitude of coincidences were going to occur.
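Writing the argument in the odds form of Bayes’ rule (a sketch in my own notation) makes the cancellation explicit:

$$\frac{P(\text{rigged} \mid \text{JS wins})}{P(\text{fair} \mid \text{JS wins})} = \frac{P(\text{JS wins} \mid \text{rigged})}{P(\text{JS wins} \mid \text{fair})} \times \frac{P(\text{rigged})}{P(\text{fair})} \approx \frac{10^{-6}}{10^{-6}} \times \frac{P(\text{rigged})}{P(\text{fair})}.$$

The two tiny likelihoods cancel, so observing John Smith’s win leaves the posterior odds on a rigged lottery essentially equal to the prior odds.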
For more on this topic, you may be interested in this paper (by myself and Matt).
Conjecture of the evening
Posted in Amusing, Creativity, Mathematics, Statistics and Metrics, tagged h-index, polygons on November 15, 2010| Leave a Comment »
Especially for Cusp, I note the following (proof left for undergraduates):
(Convex h-index conjecture) For n chronologically distinct papers, each of which cites all previous papers, the corresponding h-index is the number of non-congruent diagonals in a regular polygon with number of sides 2 greater than n.
As a corollary, academics engaging in such cheeky behaviour may be indexed with the dimension of their corresponding polygon.
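A quick numerical check of the conjecture (a sketch only; the proof remains an exercise for the undergraduates):

```python
def h_index(citations):
    """Largest h such that at least h papers have at least h citations."""
    counts = sorted(citations, reverse=True)
    h = 0
    while h < len(counts) and counts[h] >= h + 1:
        h += 1
    return h

def noncongruent_diagonals(sides):
    """Number of non-congruent diagonals of a regular polygon with `sides` sides."""
    return sides // 2 - 1

for n in range(1, 31):
    # Paper i (in chronological order) is cited by each of the n - i later papers.
    citations = [n - i for i in range(1, n + 1)]
    assert h_index(citations) == noncongruent_diagonals(n + 2)
print("Conjecture verified for n = 1 to 30")
```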
Surprising Statistic of the Day
Posted in Statistics and Metrics, tagged alcohol, crime, statistics, violence on October 16, 2010| Leave a Comment »
From the Sydney Morning Herald:
Alcohol plays a role in 50 to 60 per cent of the nearly 300,000 criminal cases that come before the state’s Local Courts each year, [New South Wales] Chief Magistrate Graeme Henson said.
That’s about twice as high as I’d have guessed. I tried to track down the source of this statistic, but the closest I could find was a report called “Alcohol related crime for each NSW Local Government Area: Numbers, proportions, rates, trends and ratios” from the NSW Bureau of Crime Statistics and Research. The report gives the percentage of “incidents of non-domestic violence related assault recorded by NSW Police” that are alcohol related as 45%.
I’d love to know what that number is for the United Kingdom, as well as for European countries like France or Germany, which seem to have an alcohol culture without as much of a binge-drinking culture. I’d expect the percentage of alcohol-related crime to be lower for the UK and even lower for most of Europe. I’ll try to track those down.
As to what should be done about the problem, I have no idea. Perhaps nothing – it may be a correlation without causation. Perhaps it’s an alpha-male thing: put too many young men in a nightclub with available women and testosterone will cause friction. The alcohol just happened to be there as well. On the other hand, the anecdotal evidence that certain people are more likely to “kick off after having a few” is well known.
A Tale of Two Entropies
Posted in logic, Statistics and Metrics on July 14, 2010| 4 Comments »
For those of us who work with degree-of-plausibility (“Bayesian”) probabilities, two situations regularly arise. The first is the need to update probabilities to take into account new information. This is usually done using Bayes’ Rule, when the information comes in the form of a proposition that is known to be true. An example of such a proposition is “The data are 3.444, 7.634, 1.227”.
More generally, information is any justified constraint on our probabilities. For example, “P(x > 3) should be 0.75” is information. If our current probability distribution $q(x)$ doesn’t satisfy the constraint, then we better change to a new distribution $p(x)$ that does. This doesn’t mean that any old $p(x)$ will do – our $q(x)$ contained hard-won information and we want to preserve that. To proceed, we choose the $p(x)$ that is as close as possible to $q(x)$, but satisfies the constraint. Various quite persuasive arguments (see here) suggest that the correct notion of closeness that we should maximise is the relative entropy:

$$S[p; q] = -\int p(x) \log \frac{p(x)}{q(x)} \, dx.$$

With no constraints, the best possible $p(x)$ is equal to $q(x)$.
Another situation that arises often is the need to simplify complex problems. For example, we might have some probability distribution that is non-Gaussian, but for some reason we only want to use Gaussians for the rest of the calculation, perhaps for presentation or computational reasons. Which Gaussian should we choose to become our $p(x)$? Many people recommend maximising the relative entropy for this also: in the literature, this is known as a variational approximation, variational Bayes, or the Bogoliubov approximation (there are also variations (pun not intended) on this theme).
There are known problems with this technique. For instance, as David MacKay notes, the resulting probability distribution is usually narrower than the original $q(x)$. This makes sense, since the variational approximation basically amounts to pretending you have information that you don’t actually have. This issue raises the question of whether there is something better that we could do.
I suggest that the correct functional to maximise in the case of approximating one distribution by another is actually the relative entropy, but with the two distributions reversed:

$$S'[p; q] = -\int q(x) \log \frac{q(x)}{p(x)} \, dx.$$
Why? Well, for one, it just works better in extreme examples I’ve concocted to magnify (a la Ed Jaynes) the differences between using $S$ and $S'$. See the figure below:
If the blue distribution represented your actual state of knowledge, but out of necessity you could only use the red or the green distribution, which would you prefer? I find it very hard to imagine an argument that would make me choose the red distribution over the green. Another argument supporting the use of this ‘reversed’ entropy is that it is equivalent to generating a large number of samples from q, and then doing a maximum likelihood fit of p to these samples. I know maximum likelihood isn’t the best, most principled thing in the world, but in the limit of a large number of samples it’s pretty hard to argue with.
A further example supporting the ‘reversed’ entropy is what happens if $q(x)$ is zero at some points. According to the regular entropy, any distribution $p(x)$ that is nonzero where $q(x)$ is zero is infinitely bad. I don’t think that’s true in the case of approximations – some leakage of probability to values we know are impossible is no catastrophe. This is manifestly different to the case where we have legitimate information – if $q(x)$ is zero somewhere then of course we want $p(x)$ to be zero there as well. If we’re updating probabilities, we’re trying to narrow down the possibilities, and resurrecting some is certainly unwarranted – but the goal in doing an approximation is different.
Maximising the reversed entropy also has some pretty neat properties. If the approximating distribution is a Gaussian, then the first and second moments should be chosen to match the moments of $q(x)$. If the original distribution is over many variables, but you want to approximate it by a distribution where the variables are all independent, just take all of the marginal distributions and product them together, and there’s your optimal approximation.
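As a small numerical sanity check of these claims (my own sketch, with a made-up target distribution), here is a brute-force fit of a single Gaussian to a two-component mixture, minimising each direction of the relative entropy in turn:

```python
import numpy as np
from scipy.stats import norm

x = np.linspace(-10.0, 10.0, 2001)
dx = x[1] - x[0]

# The "true" distribution q: a well-separated two-component Gaussian mixture.
q = 0.5 * norm.pdf(x, loc=-3.0, scale=1.0) + 0.5 * norm.pdf(x, loc=3.0, scale=1.0)

def kl(a, b):
    """Discretised KL divergence D(a || b) on the grid."""
    b = np.maximum(b, 1e-300)  # avoid log(0) where b underflows
    integrand = np.where(a > 0, a * np.log(np.maximum(a, 1e-300) / b), 0.0)
    return integrand.sum() * dx

# Brute-force search over single-Gaussian approximations p = N(mu, sigma^2).
best_var = (np.inf, None)   # minimise KL(p || q): the usual variational direction
best_rev = (np.inf, None)   # minimise KL(q || p): the "reversed" entropy
for mu in np.linspace(-5.0, 5.0, 101):
    for sigma in np.linspace(0.3, 4.0, 75):
        p = norm.pdf(x, loc=mu, scale=sigma)
        d_var, d_rev = kl(p, q), kl(q, p)
        if d_var < best_var[0]:
            best_var = (d_var, (mu, sigma))
        if d_rev < best_rev[0]:
            best_rev = (d_rev, (mu, sigma))

mean_q = (x * q).sum() * dx
sd_q = np.sqrt(((x - mean_q) ** 2 * q).sum() * dx)
print("moments of q:      mean = %5.2f, sd = %4.2f" % (mean_q, sd_q))
print("min KL(p||q) fit:  mu = %5.2f, sigma = %4.2f" % best_var[1])
print("min KL(q||p) fit:  mu = %5.2f, sigma = %4.2f" % best_rev[1])
# The KL(p||q) fit locks onto one narrow mode; the KL(q||p) fit recovers
# (approximately) the mean and standard deviation of q itself.
```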
If $S$ isn’t the best thing to use for approximations, that means that something in the derivation of $S$ applies to legitimate information but does not apply to approximations. Most of the axioms (coordinate independence, consistency for independent systems, etc.) make sense, and both entropies discussed in this post satisfy those. It is only at the very end of the derivation that the reversed entropy is ruled out, and by some pretty esoteric arguments that I admit I don’t fully understand. I think the examples I’ve presented in this post are suggestive enough that there is room here for a proof that the reversed entropy $S'$ is the thing to use for approximations. This means that maximum relative entropy is a little less than universal, but that’s okay – the optimal solutions to different problems are allowed to be different!
Homogeneity, features
Posted in Statistics and Metrics, The Universe, tagged gammaLambda, homogeneity, integral constraint on March 20, 2010| Leave a Comment »
Yesterday I read a few of the recent papers of Francesco Sylos Labini, who has pursued a distinction between the common or garden type statistical homogeneity in the Universe that one reads about in textbooks, and a stronger form (‘super-homogeneity’) in which the mass fluctuations follow a behaviour that is sub-Poisson as a function of scale. This implies a sort of anti-correlation—a lattice of points is, for instance, sub-Poisson, as the points are deliberately avoiding one another—and has consequences for the form of the two-point correlation function that look remarkably similar to those imposed by the integral constraint, but which are, in fact, quite different—the super-homogeneity condition affects the actual correlation function, while the correction usually referred to as the integral constraint affects estimators of the correlation function. I started writing a summary document on this topic for the reference of myself and others.
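Schematically (my shorthand for the distinction, not notation taken from the papers), super-homogeneity constrains the true correlation function $\xi(r)$ itself, whereas the integral constraint is a property of estimators $\hat{\xi}$ built from a finite survey volume $V$ in which the mean density is estimated from the data:

$$\int_0^\infty \xi(r)\, 4\pi r^2\, \mathrm{d}r = 0 \qquad \text{versus} \qquad \int_V \hat{\xi}(r)\, \mathrm{d}V \approx 0.$$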
After DARK’s infamous session, I hit a sweet spot in coding productivity and wrote a bunch of scripts to extract spatial features from galaxy images, along lines suggested to me a week or so ago by Andrew Zirm. These features are extracted from a matrix that encodes the frequency of adjacency between threshold intensity levels in the image. It’s the sort of thing best shown with pictures, which perhaps I can post once Andrew has decided which direction to pursue next.
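The scripts themselves aren’t reproduced here, but as a rough illustration of the kind of adjacency matrix described above (my own toy version, assuming eight intensity levels and horizontal adjacency only):

```python
import numpy as np

def level_adjacency_matrix(image, n_levels=8):
    """Quantise `image` into n_levels intensity bins and count how often each
    pair of levels occurs in horizontally adjacent pixels."""
    lo, hi = image.min(), image.max()
    levels = ((image - lo) / (hi - lo + 1e-12) * n_levels).astype(int)
    levels = np.clip(levels, 0, n_levels - 1)
    counts = np.zeros((n_levels, n_levels), dtype=int)
    left, right = levels[:, :-1].ravel(), levels[:, 1:].ravel()
    np.add.at(counts, (left, right), 1)   # accumulate adjacency frequencies
    return counts

# Toy "galaxy image": a smooth central blob plus noise.
yy, xx = np.mgrid[-32:32, -32:32]
image = np.exp(-(xx**2 + yy**2) / 200.0)
image += 0.05 * np.random.default_rng(0).normal(size=image.shape)

adjacency = level_adjacency_matrix(image)
print(adjacency)   # spatial texture features would then be derived from this matrix
```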
WMAP 7 cosmological parameter set
Posted in Statistics and Metrics, The Universe on January 29, 2010| 1 Comment »
Your Universe ca. 2010, per the WMAP+BAO+H0[1] maximum likelihood parameter set:
| Parameter | Symbol | WMAP+BAO+H0 ML | Equivalent symbol | Equivalent value |
|---|---|---|---|---|
| Hubble parameter | h | 0.702 | H0 | 70.2 km/s/Mpc |
| Dark matter density | Ωch² | 0.1120 | Ωc | 0.227 |
| Baryonic matter density | Ωbh² | 0.02246 | Ωb | 0.0455 |
| Total matter density | Ωmh² | 0.1344 | Ωm | 0.272 |
| Vacuum tension[2] | ΩΛ | 0.728 | | |
| Amplitude of curvature perturbation at k = 0.002/Mpc | Δ²R | 2.45 × 10⁻⁹ | | |
| Spectral index of density perturbations | ns | 0.961 | | |
| Size of linear density fluctuation at 8 Mpc/h | σ8 | 0.807 | | |
| Redshift of matter–radiation equality | zeq | 3196 | | |
| Age of the Universe | t0 | 13.78 Gyr | | |
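As a quick illustration of how the derived densities follow from the fit values (my own arithmetic, using the usual definitions):

$$\Omega_c = \frac{\Omega_c h^2}{h^2} = \frac{0.1120}{0.702^2} \approx 0.227, \qquad \Omega_m = \Omega_b + \Omega_c \approx 0.0455 + 0.227 \approx 0.272.$$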
Some of these parameters are fit directly from the data; the others have been derived from the fit parameters using the usual definitions. The determination of zeq is carried out using the WMAP 7-year data on its own. The two papers in which these figures are given are:
Larson et al. (2010), arXiv:1001.4635
Komatsu et al. (2010), arXiv:1001.4538
These papers contain many other numbers: in particular, for extensions to ΛCDM cosmology, such as neutrino species, non-zero spatial curvature and dark energy that is not the cosmological constant. I expect some of the parameters mentioned there and not here—particularly the fNL statistics of non-Gaussianity—to gain more public attention in the next decade as observations begin to determine the properties of the cosmological inflation that occurred in the very early Universe.
A final note: I’ve written this post only because these numbers are not written on an actual webpage—they are all in pdf or postscript files. But, it also gives me a chance to congratulate the WMAP team on their ongoing achievement.
Footnotes
1. Riess, A. et al. (2009), ApJ 699 539, arXiv:0905.0695
2. Dark energy, or, as assumed here, the cosmological constant.
Genus analogues
Posted in Mathematics, Physics, Statistics and Metrics, The Universe on January 26, 2010| 6 Comments »
N.b. This is a technical post, written to illustrate a question I believe to be interesting to some colleagues outside my particular discipline. I am acutely aware of its shortcomings as expository work, and pedagogical criticism is almost as welcome as an attempt to engage with the question at hand.
Small coincidences
Posted in Science, Statistics and Metrics, The Universe on January 16, 2010| Leave a Comment »
It’s a new decade, & I’m well rested after a week locked inside the Iberostar Resortcatraz[1], so there is no better time for a rejuvenation of the compact between blogger and blogosphere, the mathematical space of readers and writers of blogs.
But I’ll ease myself back in with a trivium amusing to perhaps one person only. As we know, I enjoy Andrew Sullivan’s writing, & one gimmick of his blog is the View From Your Window[2] snapshot series. I enjoy looking at all these different images, but am yet to bother sending anything in because of a well-known aversion to using cameras myself (Luke is much, much better at that sort of thing). But wait: here is one from Jan 8:
Awww. It’s nice to know that there are at least two people in Denmark reading Andrew’s blog. The round tower in the background is none other than the Round Tower of Tycho Brahe, who built it after he was voted off the island where his more famous observations were made. The inside is an ascending spiral—not of steps, but a smooth cobbled road, apparently so that the astronomical instruments could be carted up by horse. It’s very interesting!
I left Copenhagen two days after this photo was taken. I’ve had a nice time here in Mexico, although I’ve been mostly cut off from that taproot of Western Civilization you and I call the Internet. My flight into SFO is in a few hours—and what do I find today on Andrew’s blog?
So, I welcome the Internet back to my inertial frame. I’ll be staying at a place in these very same Berkeley Hills for the next little while and working in the Astronomy Department at UCB. May the bright colours of a new place forming its first impression on the mind provide much for me to write about.
1. I would have thrown myself into the carnivorous tortoise pen after the first day had there not been a cosmology conference to attend—and it was a very good meeting indeed. So a shout out to everyone who made it along. As per the request of the organisers, I draw everyone’s attention to it. There’ll be another one next year!
2. Andrew, Patrick & Chris have put together a book of these windows, selected by the readership from the many, many photos that have been sent in since 2005.