
## Bayes’ Theorem: Certainty Starts Here

Continuing my series on Bayes’ Theorem, recall that the question of any rational investigation is this: what is the probability of the theory of interest T, given everything that I know K? Thanks to Bayes’ theorem, we can take this probability $p(T | K)$ and break it into manageable pieces. In particular, we can divide K into background information B and data D. Remember that this division is just a convenience, and in particular that B and D are both assumed to be known.

Suppose one calculates $p(T | DB)$ for some theory, data and background information. Think of it as a practice problem in a textbook. This calculation, in and of itself, knows nothing of the real world. So what follows? We can think of the probability as a conditional if-then statement:

1. If DB, then the probability of T is $p(T | DB)$.

To draw a conclusion from this, we must add the premise.

2. DB.

Only then can we conclude,

3. The probability of T is $p(T | DB)$.

But wait a minute … the whole point of this exercise was to reason in the face of uncertainty. Where do we get the nerve to simply assert 2, that DB is true? Where is the inevitable uncertainty of measurement? Isn’t treating the data as certain hopelessly idealized? Shouldn’t we take into account how probable DB is? But there are no raw probabilities, so with respect to what should we calculate the probability of DB? We’re headed for an infinite regress if we keep asking for probabilities. How do we get premise 2? Are probabilities all merely hypothetical?

## Probability Myth: we’ve observed X, so the probability of X is one

Continuing with probability theory, a quick bit of myth-busting. I touched on this last time, but it comes up often enough to deserve its own post. Recall that rationality requires us to calculate the probability of our theory of interest T given everything we know K. We saw that it is almost always useful to split up our knowledge into data D and background B. These are just labels. In practice, the important thing is that I can calculate the probability of D given B and T, so that I can calculate the terms in Bayes’ theorem,

$p(T | DB) = \frac{p(D | TB) p(T | B)} {p(D | B)}$

Something to note: in this calculation, we assume that we know that D is true, and yet we are calculating the probability of D. For example, the likelihood $p(D | TB)$ is not necessarily one. So do we know D or don’t we?!

The probability $p(D | TB)$ is not simply “what do you reckon about D?”. Jaynes considers the construction of a reasoning robot. You feed information in one slot and, upon request, out comes the probability of any statement you care to ask it about. These probabilities are objective in the sense that any two correctly constructed robots should give the same answer, as should any perfectly rational agent. Probabilities are subjective in the sense that they are relative to what information is fed in. There are no “raw” probabilities $p(D)$. So the probability $p(D | TB)$ asks: what probability would the robot assign to D if we fed in only T and B?

Thus, probabilities are conditionals, and in particular the likelihood represents a counterfactual conditional: if all I knew were the background information B and the theory T, what would the probability of D be? These are exactly the questions that every maths textbook sets as exercises: given 10 tosses of a fair coin, what is the probability of exactly 8 heads? We can still ask these questions even after we’ve actually seen 8 heads in 10 coin tosses. It is not the case that the probability of some event is one once we’ve observed that event.
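The textbook question above can be worked out directly. A minimal sketch (ordinary binomial arithmetic; nothing here beyond the fair-coin setup in the text):

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials,
    each trial succeeding with probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# If all I knew were the background B and theory T ("10 tosses of a
# fair coin"), what probability would the robot assign to D
# ("exactly 8 heads")?
likelihood = binomial_pmf(8, 10, 0.5)
print(likelihood)  # 45/1024, about 0.0439
```

Observing 8 heads afterwards doesn’t change this number; it answers a counterfactual question about what the robot would say if fed only T and B.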

What is true is that, if I’ve observed D, then the probability of D given everything I’ve observed is one. If you feed D into the reasoning robot, and then ask it for the probability of D, it will tell you that it is certain that D is true. Mathematically, $p(D | D) = 1$.

## Bayes’ Theorem: what is this “background information”?

I laid down the basics of probability theory and Bayes’ theorem in a previous post. Here’s the story, as a reminder. We have some background information B and some data D. We want to know: what is the probability that some theory T is true, given the background information and the data? We write this as $p(T | DB)$, and expand it using Bayes’ theorem:

$p(T | DB) = \frac{p(D | TB) p(T | B)} {p(D | B)}$

However, this “background information” is a little vague. What puts something in the background? How much background do I need to dig up? How do we divide our knowledge into background information B and data D? Does it matter? Is it that background information is being assumed as known, while for the data, as with all measurements, we must acknowledge a degree of uncertainty?

Here are a few things to note about background information.

1. Tell me everything

2. The posterior views data and background equally

3. Calculate probabilities of data, with background information

4. Both background and data are taken as given

5. You should divide K cleanly

### 1. Tell me everything

The question is this: given everything I know $K$, what is the probability that some statement $T$ is true? The idea is that a rational thinker, in evaluating some statement T, will take into account everything they know. Remember that one of the desiderata of probability theory, taken as a rational approach to reasoning with uncertainty, is that information must not be arbitrarily ignored. In principle, everything we know should be in $K$ somewhere. So tell me everything.

In practice, thankfully, irrelevant information can be ignored as it will factor out anyway (Point 5). That gives us the definition of “relevant” in probability theory: a statement is relevant if including it as given changes our probabilities.
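This definition of relevance can be made concrete with a toy reasoning robot. A minimal sketch (the two-dice setup and the `prob` helper are my own illustrative choices, not from the post): conditioning on an irrelevant statement leaves the probability unchanged, while conditioning on a relevant one shifts it.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # background B: two fair dice

def prob(event, given=lambda o: True):
    """p(event | given, B): restrict to the worlds where `given` holds,
    then count the fraction of those in which `event` holds."""
    world = [o for o in outcomes if given(o)]
    return Fraction(sum(1 for o in world if event(o)), len(world))

A = lambda o: o[0] == 6               # "the first die shows 6"
irrelevant = lambda o: o[1] % 2 == 0  # "the second die is even"
relevant = lambda o: sum(o) >= 11     # "the total is at least 11"

print(prob(A))              # 1/6
print(prob(A, irrelevant))  # 1/6 -- unchanged, so irrelevant
print(prob(A, relevant))    # 2/3 -- changed, so relevant
```

Because the irrelevant statement leaves $p(A)$ untouched, it can be dropped from the given information without consequence, which is exactly why we may safely ignore it in practice.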

### 2. The posterior views data and background equally

Why, then, have we decided to break up everything we know $K$ into “data” and “background” $DB$? (Remember: DB means “both D and B are true”.) The reason is that probabilities don’t grow on trees. If we had a black box that handed out posteriors $p(T | K)$ for any information K and theory T we care to think of, then we wouldn’t need to worry about Bayes’ theorem or background and data. Remember: the whole point of probability identities is to take the probability we want and write it in terms of probabilities we have.

## Reply to Maudlin: The Calibrated Cosmos

I recently read philosopher of science Tim Maudlin’s book Philosophy of Physics: Space and Time and thought it was marvellous, so I was expecting good things when I came to read Maudlin’s article for Aeon Magazine titled “The calibrated cosmos: Is our universe fine-tuned for the existence of life – or does it just look that way from where we’re sitting?”. I’ve got a few comments. Indented quotes below are from Maudlin’s article unless otherwise noted.

### In a weekend?

> Theories now suggest that the most general structural elements of the universe — the stars and planets, and the galaxies that contain them — are the products of finely calibrated laws and conditions that seem too good to be true. … The details of these sorts of calculations should be taken with a grain of salt. No one could sit down and rigorously work out an entirely new physics in a weekend.

A few quick things. “Theories” has a ring of “some tentative, fringe ideas” to the lay reader, I suspect. The theories on which one bases fine-tuning calculations are precisely the reigning theories of modern physics. These are not “entirely new physics” but the same equations (general relativity, the standard model of particle physics, stellar structure equations etc.) that have time and again predicted the results of observations, now applied to different scenarios. I think Maudlin has underestimated both the power of order-of-magnitude calculations in physics, and the effort that theoretical physicists have put into fine-tuning calculations. For example, Epelbaum and his collaborators, having developed the theory and tools to use supercomputer lattice simulations to investigate the structure of the C12 nucleus, write a few papers (2011, 2012) to describe their methods and show how their cutting-edge model successfully reproduces observations. They then use the same methods to investigate fine-tuning (2013). My review article cites upwards of a hundred papers like this. This is not a back-of-the-envelope operation, not starting from scratch, not entirely new physics, not a weekend hobby. This is theoretical physics.

> It can be unsettling to contemplate the unlikely nature of your own existence … Even if your parents made a deliberate decision to have a child, the odds of your particular sperm finding your particular egg are one in several billion. … after just two generations, we are up to one chance in 10^27. Carrying on in this way, your chance of existing, given the general state of the universe even a few centuries ago, was almost infinitesimally small. You and I and every other human being are the products of chance, and came into existence against very long odds.

The slogan I want to invoke here is “don’t treat a likelihood as if it were a posterior”. That’s a bit too jargon-y. The likelihood is the probability of what we know, assuming that some theory is true. The posterior is the reverse – the probability of the theory, given what we know. It is the posterior that we really want, since it reflects our situation: the theory is uncertain, the data is known. The likelihood can help us calculate the posterior (using Bayes’ theorem), but in and of itself, a small likelihood doesn’t mean anything. The calculation Maudlin alludes to above is a likelihood: what is the probability that I would exist, given that the events that led to my existence came about by chance? The reason that this small likelihood doesn’t imply that the posterior – the probability of my existence by chance, given my existence – is small is that the theory has no comparable rivals. Brendon has explained this point elsewhere.
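The likelihood-versus-posterior point can be made numerically. A minimal sketch (the theories, priors and likelihood values below are invented for illustration, not anyone’s actual estimates): even an astronomically small likelihood yields a posterior near one, so long as every rival theory makes the data even less likely.

```python
def posteriors(priors, likelihoods):
    """Bayes' theorem over an exhaustive set of rival theories:
    p(T | D B) = p(D | T B) p(T | B) / p(D | B), where the evidence
    p(D | B) is the sum of p(D | T B) p(T | B) over all theories."""
    joint = {t: priors[t] * likelihoods[t] for t in priors}
    evidence = sum(joint.values())
    return {t: joint[t] / evidence for t in joint}

# My existence by chance has a tiny likelihood -- but the (made-up)
# rival theory makes my existence even less likely.
priors = {"chance": 0.5, "rival": 0.5}
likelihoods = {"chance": 1e-27, "rival": 1e-40}

result = posteriors(priors, likelihoods)
print(result["chance"])  # very close to 1: small likelihood, large posterior
```

The small likelihood only damages a theory relative to a rival that explains the data better; with no comparable rival, the posterior stays high.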

## 10 Nice things about Bayes’ theorem

As background for some future posts, I need to catalogue a few facts about Bayes’ theorem. This is all standard probability theory. I’ll be roughly following the discussion and notation of Jaynes.

### Probability

We start with propositions, represented by A, B, etc. These propositions are in fact true or false, though we may not know which. We assume that we can do Boolean things with these propositions, in particular:

• Conjunction: $AB$ means “both A and B are true”

• Disjunction: $A+B$ means “at least one of the propositions A, B is true”

• Negation: $\bar{A}$ means “not A” (i.e. A is false)

We want to assign probabilities to propositions to represent how likely they are to be true, given the information in other propositions. The function $p(A | B)$ assigns a probability to the proposition $A$, using only the information in $B$. Read “p” as “the probability of” and the vertical bar “ | ” as “given”, so $p(A | B)$ reads “the probability of A given B”. There are no “raw” probabilities, $p(A)$.

What is p? The older approach to probability, of Kolmogorov et al., requires that $p$ satisfy certain axioms. Roughly,

A1. $p(A | B)$ is a non-negative number.

A2. One means certain: $p(A | B) = 1$ if $B \Rightarrow A$ i.e. if A is certain, given B.

A3. Or means add: if at most one of the (countable) propositions $A_i$ is true (i.e. they are mutually exclusive), then $p(A_1 + A_2 + \ldots | B) = \sum_{i=1}^{\infty} p(A_i | B)$.

Since the publication of Cox’s Theorem, an alternative approach to probability has gained popularity. Cox (again I’m following Jaynes) proposes the following desiderata of rationality, not as arbitrary axioms but as expectations of any rational approach to reasoning in the face of uncertainty.

D1. Probabilities are represented by real numbers.

D2. Probabilities change in common sense ways. For example, if learning C makes B more likely, but doesn’t change how likely A is, then learning C should make AB more likely.

D3. If a conclusion can be reasoned out in more than one way, then every possible way must lead to the same result.

D4. Information must not be arbitrarily ignored. All given evidence must be taken into account.

D5. Identical states of knowledge (except perhaps for the labeling of the propositions) should result in identical assigned probabilities.

The great advance of Cox’s theorem was to show that one can start with these desiderata and arrive at Kolmogorov’s axioms (at least, for the finite version of A3). Thus, the traditional laws of probability can be applied to more than just frequencies.

### Conjunction and Total Probability

Beginning with the desiderata, we can derive more than Kolmogorov’s axioms. There are a number of useful probability identities. Remember that an identity is a formula that holds for any propositions we may substitute in. Here are two preliminary identities, before we get to Bayes’ theorem.

Conjunction: If $A = A_1 A_2$ then $p(A | B) = p(A_1 | A_2 B) \, p(A_2 | B)$.
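The conjunction (product) rule, $p(A_1 A_2 | B) = p(A_1 | A_2 B) \, p(A_2 | B)$, can be checked by brute-force enumeration. A minimal sketch using two cards drawn without replacement from a standard deck (my own example, not from the post):

```python
from fractions import Fraction
from itertools import permutations

deck = [(rank, suit) for rank in range(1, 14) for suit in "SHDC"]
draws = list(permutations(deck, 2))  # B: two ordered draws, no replacement

def p(event, given=lambda d: True):
    """p(event | given, B) by counting equally likely ordered draws."""
    world = [d for d in draws if given(d)]
    return Fraction(sum(1 for d in world if event(d)), len(world))

A1 = lambda d: d[0][0] == 1  # first card is an ace
A2 = lambda d: d[1][0] == 1  # second card is an ace

lhs = p(lambda d: A1(d) and A2(d))  # p(A1 A2 | B)
rhs = p(A1, given=A2) * p(A2)       # p(A1 | A2 B) p(A2 | B)
print(lhs, rhs)  # both 1/221
```

Because it is an identity, any pair of propositions substituted for A1 and A2 will satisfy it, not just these two.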

## What to Read: The Fine-Tuning of the Universe for Intelligent Life

I’ve spent a lot of time critiquing articles on the fine-tuning of the universe for intelligent life. I should really give the other side of the story. Below are some of the good ones, ranging from popular level books to technical articles. I’ve given my recommendations for popular cosmology books here.

### Books – Popular-level

• Just Six Numbers, Martin Rees – Highly recommended, with a strong focus on cosmology and astrophysics, as you’d expect from the Astronomer Royal. Rees gives a clear exposition of modern cosmology, including inflation, and ends up giving a cogent defence of the multiverse.
• The Goldilocks Enigma, Paul Davies – Davies is an excellent writer and has long been an important contributor to this field. His discussion of the physics is very good, and includes a description of the Higgs mechanism. When he strays into metaphysics, he is thorough and thoughtful, even when he is defending conclusions that I don’t agree with.
• The Cosmic Landscape: String Theory and the Illusion of Intelligent Design, Leonard Susskind – I’ve reviewed this book in detail in a previous blog post. Highly recommended. I can also recommend his many lectures on YouTube.
• Constants of Nature, John Barrow – A discussion of the physics behind the constants of nature. An excellent presentation of modern physics, cosmology and their relationship to mathematics, which includes a chapter on the anthropic principle and a discussion of the multiverse.
• Cosmology: The Science of the Universe, Edward Harrison – My favourite cosmology introduction. The entire book is worth reading, not least the sections on life in the universe and the multiverse.
• At Home in the Universe, John Wheeler – A thoughtful and wonderfully written collection of essays, some of which touch on matters anthropic.

I haven’t read Brian Greene’s book on the multiverse but I’ve read his other books and they’re excellent. Stephen Hawking discusses fine-tuning in A Brief History of Time and The Grand Design. As usual, read anything by Sean Carroll, Frank Wilczek, and Alex Vilenkin.

• The Anthropic Cosmological Principle, Barrow and Tipler – still the standard in the field. Even if you can’t follow the equations in the middle chapters, it’s still worth a read as the discussion is quite clear. Gets a bit speculative in the final chapters, but it’s fairly obvious where to apply your grain of salt.
• Universe or Multiverse (Edited by Bernard Carr) – the new standard. A great collection of papers by most of the experts in the field. Special mention goes to the papers by Weinberg, Wilczek, Aguirre, and Hogan.

### Scientific Review Articles

The field of fine-tuning grew out of the so-called “Large numbers hypothesis” of Paul Dirac, which owes a lot to Weyl and is further discussed by Eddington, Gamow and others. These discussions evolve into fine-tuning when Dicke explains them using the anthropic principle. Dicke’s method is examined and expanded in these classic papers of the field:

## Feser on Krauss

Having had my appetite for the Middle Ages whetted by Edward Grant’s excellent book A History of Natural Philosophy: From the Ancient World to the Nineteenth Century, I recently read Edward Feser’s Aquinas (A Beginner’s Guide). And, on the back of that, his book The Last Superstition. If I ever work out what a formal cause is, I might post a review.

In the meantime, I’ve quite enjoyed some of his blog posts about the philosophical claims of Lawrence Krauss. This is something I’ve blogged about a few times. His most recent post on Krauss contains this marvellous passage.

> Krauss asserts:
>
> “[N]othing is a physical concept because it’s the absence of something, and something is a physical concept.”
>
> The trouble with this, of course, is that “something” is not a physical concept. “Something” is what Scholastic philosophers call a transcendental, a notion that applies to every kind of being whatsoever, whether physical or non-physical — to tables and chairs, rocks and trees, animals and people, substances and accidents, numbers, universals, and other abstract objects, souls, angels, and God. Of course, Krauss doesn’t believe in some of these things, but that’s not to the point. Whether or not numbers, universals, souls, angels or God actually exist, none of them would be physical if they existed. But each would still be a “something” if it existed. So the concept of “something” is broader than the concept “physical,” and would remain so even if it turned out that the only things that actually exist are physical.
>
> No atheist philosopher would disagree with me about that much, because it’s really just an obvious conceptual point. But since Krauss and his fans have an extremely tenuous grasp of philosophy — or, indeed, of the obvious — I suppose it is worth adding that even if it were a matter of controversy whether “something” is a physical concept, Krauss’s “argument” here would simply have begged the question against one side of that controversy, rather than refuted it. For obviously, Krauss’s critics would not agree that “something is a physical concept.” Hence, confidently to assert this as a premise intended to convince someone who doesn’t already agree with him is just to commit a textbook fallacy of circular reasoning.

The wood floor guy analogy is pretty awesome, so be sure to have a read.