## Probability Myth: we’ve observed X, so the probability of X is one

Continuing with probability theory, here is a quick piece of myth-busting. I touched on this last time, but it comes up often enough to deserve its own post. Recall that rationality requires us to calculate the probability of our theory of interest T given everything we know K. We saw that it is almost always useful to split up our knowledge into data D and background B. These are just labels. In practice, the important thing is that I can calculate the probability of D given B and T, so that I can calculate the terms in Bayes’ theorem, $p(T | DB) = \frac{p(D | TB) p(T | B)} {p(D | B)}$.

Something to note: in this calculation, we assume that we know that D is true, and yet we are calculating the probability of D. Take, for example, the likelihood $p(D | TB)$: this probability is not necessarily one. So do we know D or don’t we?!

The probability $p(D | TB)$ is not simply “what do you reckon about D?”. Jaynes considers the construction of a reasoning robot. You feed information in one slot and, upon request, out comes the probability of any statement you care to ask it about. These probabilities are objective in the sense that any two correctly constructed robots should give the same answer, as should any perfectly rational agent. Probabilities are subjective in the sense that they are relative to what information is fed in. There are no “raw” probabilities $p(D)$. So the probability $p(D | TB)$ asks: what probability would the robot assign to D if we fed in only T and B?

Thus, probabilities are conditionals, and in particular the likelihood represents a counterfactual conditional: if all I knew were the background information B and the theory T, what would the probability of D be? These are exactly the questions that every maths textbook sets as exercises: given 10 tosses of a fair coin, what is the probability of exactly 8 heads? We can still ask these questions even after we’ve actually seen 8 heads in 10 coin tosses. It is not the case that the probability of some event is one once we’ve observed that event.
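The textbook exercise above can be worked directly. This is a minimal sketch: the function name `binomial_likelihood` is my own label, and the only inputs are the ones stated in the question (10 tosses, 8 heads, a fair coin).

```python
from math import comb

def binomial_likelihood(heads, tosses, p_heads):
    """p(D | TB): probability of exactly `heads` heads in `tosses` tosses,
    given a theory T that fixes the chance of heads at `p_heads`."""
    return comb(tosses, heads) * p_heads**heads * (1 - p_heads)**(tosses - heads)

# Theory T: the coin is fair. Data D: exactly 8 heads in 10 tosses.
likelihood = binomial_likelihood(8, 10, 0.5)
print(likelihood)  # 45/1024, roughly 0.044
```

The answer is about 0.044 whether or not we have actually watched the coin land heads 8 times, because the calculation is fed only T and B, never D.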

What is true is that, if I’ve observed D, then the probability of D given everything I’ve observed is one. If you feed D into the reasoning robot, and then ask it for the probability of D, it will tell you that it is certain that D is true. Mathematically, $p(D | D) = 1$.
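For completeness, here is a short sketch of why this holds, using only the product rule and the fact that “D and D” is the same statement as D. I have kept the background B explicit and assumed $p(D | B) > 0$ so that the division is defined:

$$p(D | DB) = \frac{p(DD | B)}{p(D | B)} = \frac{p(D | B)}{p(D | B)} = 1$$

By contrast, nothing forces $p(D | TB)$ to equal one, because D is not among the things given.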

## Bayes’ Theorem: what is this “background information”?

I laid down the basics of probability theory and Bayes’ theorem in a previous post. Here’s the story, as a reminder. We have some background information B and some data D. We want to know: what is the probability that some theory T is true, given the background information and the data? We write this as $p(T | DB)$, and expand it using Bayes’ theorem: $p(T | DB) = \frac{p(D | TB) p(T | B)} {p(D | B)}$
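To make the formula concrete, here is a small numerical sketch. The numbers are assumptions chosen for illustration, not taken from the earlier post: T is “the coin is fair”, the only rival theory says the coin lands heads 80% of the time, the prior $p(T | B)$ is 0.5, and the data D is 8 heads in 10 tosses.

```python
from math import comb

def likelihood(heads, tosses, p_heads):
    """p(D | theory, B) for a coin whose chance of heads is p_heads."""
    return comb(tosses, heads) * p_heads**heads * (1 - p_heads)**(tosses - heads)

# Illustrative assumptions (hypothetical values, see the text above).
prior_T = 0.5                              # p(T | B): the coin is fair
prior_not_T = 1 - prior_T                  # p(not-T | B): the coin is biased
like_T = likelihood(8, 10, 0.5)            # p(D | T B)
like_not_T = likelihood(8, 10, 0.8)        # p(D | not-T B)

# Denominator p(D | B), expanded over the two rival theories.
evidence = like_T * prior_T + like_not_T * prior_not_T

# Bayes' theorem: p(T | DB) = p(D | TB) p(T | B) / p(D | B)
posterior_T = like_T * prior_T / evidence
print(posterior_T)
```

Every term on the right-hand side is a probability of D calculated *without* assuming D, which is why the split into data and background is useful in practice.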

However, this “background information” is a little vague. What puts something in the background? How much background do I need to dig up? How do we divide our knowledge into background information B and data D? Does it matter? Is it that background information is being assumed as known, while for the data, as with all measurements, we must acknowledge a degree of uncertainty?

Here are a few things to know about background information.

1. Tell me everything

2. The posterior views data and background equally

3. Calculate probabilities of data, with background information

4. Both background and data are taken as given

5. You should divide K cleanly

### 1. Tell me everything

The question is this: given everything I know $K$, what is the probability that some statement $T$ is true? The idea is that a rational thinker, in evaluating some statement T, will take into account everything they know. Remember that one of the desiderata of probability theory, taken as a rational approach to reasoning with uncertainty, is that information must not be arbitrarily ignored. In principle, everything we know should be in $K$ somewhere. So tell me everything.

In practice, thankfully, irrelevant information can be ignored as it will factor out anyway (Point 5). That gives us the definition of “relevant” in probability theory: a statement is relevant if including it as given changes our probabilities.
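This definition of relevance can be checked on a toy example. The sketch below is built on an invented joint distribution over three statements: T, a statement R constructed to depend on T, and a statement I constructed to be independent of everything else. All of the numbers are hypothetical.

```python
from itertools import product

# Hypothetical ingredients for a joint distribution over (T, R, I).
p_T = {True: 0.3, False: 0.7}
p_R_given_T = {True: {True: 0.9, False: 0.1},   # R is likely when T is true
               False: {True: 0.2, False: 0.8}}  # and unlikely when T is false
p_I = {True: 0.5, False: 0.5}                   # I ignores T and R entirely

joint = {
    (t, r, i): p_T[t] * p_R_given_T[t][r] * p_I[i]
    for t, r, i in product([True, False], repeat=3)
}

def prob(event, given=lambda world: True):
    """p(event | given), found by summing the joint over matching worlds."""
    num = sum(p for w, p in joint.items() if event(w) and given(w))
    den = sum(p for w, p in joint.items() if given(w))
    return num / den

T = lambda w: w[0]
R = lambda w: w[1]
I = lambda w: w[2]

p_t = prob(T)              # p(T | K)
p_t_given_i = prob(T, I)   # p(T | I K): unchanged, so I is irrelevant
p_t_given_r = prob(T, R)   # p(T | R K): changed, so R is relevant
print(p_t, p_t_given_i, p_t_given_r)
```

Adding I to what is given leaves the probability of T where it was, so I can be dropped without loss; adding R moves it, so R must stay in $K$.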

### 2. The posterior views data and background equally

Why, then, have we decided to break up everything we know $K$ into “data” and “background” $DB$? (Remember: $DB$ means “both D and B are true”.) The reason is that probabilities don’t grow on trees. If we had a black box that handed out posteriors $p(T | K)$ for any information K and theory T we care to think of, then we wouldn’t need to worry about Bayes’ theorem or background and data. Remember: the whole point of probability identities is to take the probability we want and write it in terms of probabilities we have.

## Reply to Maudlin: The Calibrated Cosmos

I recently read philosopher of science Tim Maudlin’s book Philosophy of Physics: Space and Time and thought it was marvellous, so I was expecting good things when I came to read Maudlin’s article for Aeon Magazine titled “The calibrated cosmos: Is our universe fine-tuned for the existence of life – or does it just look that way from where we’re sitting?”. I’ve got a few comments. Indented quotes below are from Maudlin’s article unless otherwise noted.

### In a weekend?

> Theories now suggest that the most general structural elements of the universe — the stars and planets, and the galaxies that contain them — are the products of finely calibrated laws and conditions that seem too good to be true. … The details of these sorts of calculations should be taken with a grain of salt. No one could sit down and rigorously work out an entirely new physics in a weekend.

Two quick things. “Theories” has a ring of “some tentative, fringe ideas” to the lay reader, I suspect. The theories on which one bases fine-tuning calculations are precisely the reigning theories of modern physics. These are not “entirely new physics” but the same equations (general relativity, the standard model of particle physics, stellar structure equations etc.) that have time and again predicted the results of observations, now applied to different scenarios. I think Maudlin has underestimated both the power of order-of-magnitude calculations in physics and the effort that theoretical physicists have put into fine-tuning calculations. For example, Epelbaum and his collaborators, having developed the theory and tools to use supercomputer lattice simulations to investigate the structure of the C12 nucleus, wrote a few papers (2011, 2012) to describe their methods and show how their cutting-edge model successfully reproduces observations. They then used the same methods to investigate fine-tuning (2013). My review article cites upwards of a hundred papers like this. This is not a back-of-the-envelope operation, not starting from scratch, not entirely new physics, not a weekend hobby. This is theoretical physics.