Posted in Mathematics on November 18, 2013
Continuing with probability theory, a quick myth-busting. I touched on this last time, but it comes up often enough to deserve its own post. Recall that rationality requires us to calculate the probability of our theory of interest T given everything we know K. We saw that it is almost always useful to split up our knowledge into data D and background B. These are just labels. In practice, the important thing is that I can calculate the probability of D given B and T, so that I can calculate the terms in Bayes’ theorem,
p(T|DB) = p(D|TB) p(T|B) / p(D|B).
Something to note: in this calculation, we assume that we know that D is true, and yet we are calculating the probability of D. Take the likelihood p(D|TB), for example: this probability is not necessarily one. So do we know D or don’t we?!
The probability is not simply “what do you reckon about D?”. Jaynes considers the construction of a reasoning robot. You feed information in one slot and, upon request, out comes the probability of any statement you care to ask it about. These probabilities are objective in the sense that any two correctly constructed robots should give the same answer, as should any perfectly rational agent. Probabilities are subjective in the sense that they are relative to what information is fed in. There are no “raw” probabilities p(D). So the probability p(D|TB) asks: what probability would the robot assign to D if we fed in only T and B?
Thus, probabilities are conditionals, and in particular the likelihood represents a counterfactual conditional: if all I knew were the background information B and the theory T, what would the probability of D be? These are exactly the questions that every maths textbook sets as exercises: given 10 tosses of a fair coin, what is the probability of exactly 8 heads? We can still ask these questions even after we’ve actually seen 8 heads in 10 coin tosses. It is not the case that the probability of some event is one once we’ve observed that event.
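The textbook exercise above can be worked directly. The following is a minimal sketch, assuming independent tosses and the standard binomial formula; the function name `binomial_probability` is my own choice for illustration.

```python
from math import comb


def binomial_probability(heads: int, tosses: int, p_heads: float = 0.5) -> float:
    """Probability of exactly `heads` heads in `tosses` independent tosses."""
    return comb(tosses, heads) * p_heads**heads * (1 - p_heads) ** (tosses - heads)


# Likelihood of the data D = "8 heads in 10 tosses", given the theory T = "the coin is fair".
likelihood = binomial_probability(8, 10)
print(f"p(8 heads in 10 tosses | fair coin) = {likelihood:.4f}")
```

The answer is the same whether or not we have already watched the coin land heads 8 times, because the calculation conditions only on T and B, not on D.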
What is true is that, if I’ve observed D, then the probability of D given everything I’ve observed is one. If you feed D into the reasoning robot, and then ask it for the probability of D, it will tell you that it is certain that D is true. Mathematically, p(D|D) = 1.
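A short derivation of this from the product rule, as a sketch: it assumes the background B is carried along explicitly and that p(D|B) is not zero.

```latex
% Product rule: p(XY|B) = p(X|YB) p(Y|B). Take X = Y = D and note that "D and D" is just D.
p(D \mid DB) = \frac{p(DD \mid B)}{p(D \mid B)} = \frac{p(D \mid B)}{p(D \mid B)} = 1
```

The likelihood p(D|TB), by contrast, does not have D on the right-hand side of the bar, so nothing forces it to equal one.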
Posted in Mathematics on November 17, 2013
I laid down the basics of probability theory and Bayes’ theorem in a previous post. Here’s the story, as a reminder. We have some background information B and some data D. We want to know: what is the probability that some theory T is true, given the background information and the data? We write this as p(T|DB), and expand it using Bayes’ theorem:
p(T|DB) = p(D|TB) p(T|B) / p(D|B).
However, this “background information” is a little vague. What puts something in the background? How much background do I need to dig up? How do we divide our knowledge into background information B and data D? Does it matter? Is it that background information is being assumed as known, while for the data, as with all measurements, we must acknowledge a degree of uncertainty?
Here are a few things about background information.
1. Tell me everything
2. The posterior views data and background equally
3. Calculate probabilities of data, with background information
4. Both background and data are taken as given
5. You should divide K cleanly
1. Tell me everything
The question is this: given everything I know K, what is the probability that some statement is true? The idea is that a rational thinker, in evaluating some statement T, will take into account everything they know. Remember that one of the desiderata of probability theory, taken as a rational approach to reasoning with uncertainty, is that information must not be arbitrarily ignored. In principle, everything we know should be in K somewhere. So tell me everything.
In practice, thankfully, irrelevant information can be ignored as it will factor out anyway (Point 5). That gives us the definition of “relevant” in probability theory: a statement is relevant if including it as given changes our probabilities.
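To see how irrelevant information factors out, here is a sketch using Bayes’ theorem. The symbol X is my own label for an extra statement added to everything else we know, K; it is not notation from the post.

```latex
% Add the extra statement X to the given information K and apply Bayes' theorem to T:
p(T \mid XK) = \frac{p(X \mid TK)\, p(T \mid K)}{p(X \mid K)}
% If the truth of T makes no difference to X, i.e. p(X \mid TK) = p(X \mid K),
% the ratio cancels and X drops out:
p(T \mid XK) = p(T \mid K)
```

In this sketch, X is irrelevant to T exactly when including it as given leaves the probability of T unchanged, which matches the definition of “relevant” above.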
2. The posterior views data and background equally
Why, then, have we decided to break up everything we know K into “data” D and “background” B? (Remember: DB means “both D and B are true”). The reason is that probabilities don’t grow on trees. If we had a black box that handed out posteriors for any information K and theory T we care to think of, then we wouldn’t need to worry about Bayes’ theorem or background and data. Remember: the whole point of probability identities is to take the probability we want and write it in terms of probabilities we have.
I recently read philosopher of science Tim Maudlin’s book Philosophy of Physics: Space and Time and thought it was marvellous, so I was expecting good things when I came to read Maudlin’s article for Aeon Magazine titled “The calibrated cosmos: Is our universe fine-tuned for the existence of life – or does it just look that way from where we’re sitting?”. I’ve got a few comments. Indented quotes below are from Maudlin’s article unless otherwise noted.
In a weekend?
Theories now suggest that the most general structural elements of the universe — the stars and planets, and the galaxies that contain them — are the products of finely calibrated laws and conditions that seem too good to be true. … The details of these sorts of calculations should be taken with a grain of salt. No one could sit down and rigorously work out an entirely new physics in a weekend.
Two quick things. “Theories” has a ring of “some tentative, fringe ideas” to the lay reader, I suspect. The theories on which one bases fine-tuning calculations are precisely the reigning theories of modern physics. These are not “entirely new physics” but the same equations (general relativity, the standard model of particle physics, stellar structure equations etc.) that have time and again predicted the results of observations, now applied to different scenarios. I think Maudlin has underestimated both the power of order-of-magnitude calculations in physics and the effort that theoretical physicists have put into fine-tuning calculations. For example, Epelbaum and his collaborators, having developed the theory and tools to use supercomputer lattice simulations to investigate the structure of the C12 nucleus, wrote a few papers (2011, 2012) to describe their methods and show how their cutting-edge model successfully reproduces observations. They then used the same methods to investigate fine-tuning (2013). My review article cites upwards of a hundred papers like this. This is not a back-of-the-envelope operation, not starting from scratch, not entirely new physics, not a weekend hobby. This is theoretical physics.
Telling your likelihood from your posterior
It can be unsettling to contemplate the unlikely nature of your own existence … Even if your parents made a deliberate decision to have a child, the odds of your particular sperm finding your particular egg are one in several billion. … after just two generations, we are up to one chance in 10^27. Carrying on in this way, your chance of existing, given the general state of the universe even a few centuries ago, was almost infinitesimally small. You and I and every other human being are the products of chance, and came into existence against very long odds.
The slogan I want to invoke here is “don’t treat a likelihood as if it were a posterior”. That’s a bit too jargon-y, so let me unpack it. The likelihood is the probability of what we know, assuming that some theory is true. The posterior is the reverse – the probability of the theory, given what we know. It is the posterior that we really want, since it reflects our situation: the theory is uncertain, the data is known. The likelihood can help us calculate the posterior (using Bayes’ theorem), but in and of itself, a small likelihood doesn’t mean anything. The calculation Maudlin alludes to above is a likelihood: what is the probability that I would exist, given that the events that led to my existence came about by chance? The reason that this small likelihood doesn’t imply that the posterior – the probability that my existence came about by chance, given that I exist – is small is that the theory has no comparable rivals. Brendon has explained this point elsewhere.
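A toy calculation makes the slogan concrete. This is only a sketch: the function `posterior`, the theory labels, and every prior and rival likelihood below are made-up illustrative numbers, not values from Maudlin’s article or from any fine-tuning calculation. Only the 10^-27 likelihood echoes the figure quoted above.

```python
def posterior(priors: dict, likelihoods: dict) -> dict:
    """Bayes' theorem over mutually exclusive, exhaustive theories.

    priors[t] plays the role of p(T|B); likelihoods[t] plays the role of p(D|TB).
    """
    joint = {t: priors[t] * likelihoods[t] for t in priors}
    evidence = sum(joint.values())  # p(D|B), by the law of total probability
    return {t: j / evidence for t, j in joint.items()}


# Hypothetical numbers: "chance" gives the data a tiny likelihood of 1e-27.
tiny = 1e-27

# Case 1: the only rival predicts the data perfectly but is wildly implausible a priori.
no_real_rival = posterior(
    priors={"chance": 1.0 - 1e-40, "rival": 1e-40},
    likelihoods={"chance": tiny, "rival": 1.0},
)

# Case 2: a rival with a comparable prior that predicts the data well.
serious_rival = posterior(
    priors={"chance": 0.5, "rival": 0.5},
    likelihoods={"chance": tiny, "rival": 1.0},
)

print("no comparable rival:", no_real_rival["chance"])
print("comparable rival:   ", serious_rival["chance"])
```

The likelihood of “chance” is identical in both cases. What changes the posterior is whether a rival theory with a non-negligible prior does better at predicting the data.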