## When a Fact becomes Evidence

Another edition of “How to Use Bayes Theorem Properly 101” (links to previous posts are below). I was listening to a YouTube debate, and one of the speakers offered the following definition of “evidence”:

> Evidence is a body of objectively verifiable facts that are positively indicative of, or exclusively concordant with, one particular conclusion over any other.

They then demonstrated the many fatal flaws of this definition; for example, there is no such thing as objective verification of facts. Here, I’ll focus on another flaw. (more…)

## Carroll’s five replies to the fine-tuning argument: 1a. Aside on Naturalism

Before I get onto Carroll’s other replies to the fine-tuning argument, I need to discuss a feature of naturalism that will be relevant to what follows.

I take naturalism to be the claim that physical stuff is the only stuff. That is, the only things that exist concretely are physical things. (I say “concretely” in order to avoid the question of whether abstract things like numbers exist. Frankly, I don’t know.)

On naturalism, the ultimate laws of nature are the ultimate brute facts of reality. I’ve discussed this previously (here and here): the study of physics at any particular time can be summarised by three statements:

1. A list of the fundamental constituents of physical reality and their properties.
2. A set of mathematical equations describing how these entities change, interact and rearrange.
3. A statement about how the universe began (or some other boundary condition, if the universe has no beginning point).

In short, what is there, what does it do, and in what state did it start?

Naturalism is the claim that there is some set of statements of this kind which forms the ultimate brute fact foundation of all concrete reality. There is some scientific theory of the physical contents of the universe, and once we’ve discovered that, we’re done. All deeper questions – such as where that stuff came from, why it is that type of stuff, why it obeys laws, why those laws, or why there is anything at all – are not answerable in terms of the ultimate laws of nature, and so are simply unanswerable. They are not just in need of more research; there are literally no true facts which shed any light whatsoever on these questions. There is no logical contradiction in asserting that the universe could have obeyed a different set of laws, but nevertheless there is no reason why our laws are the ones attached to reality and the others remain mere possibilities.

(Note: if there is a multiverse, then the laws that govern our cosmic neighbourhood are not the ultimate laws of nature. The ultimate laws would govern the multiverse, too.)

### Non-informative Probabilities

In probability theory, we’ve seen hypotheses like naturalism before. They are known as “non-informative”.

In Bayesian probability theory, probabilities quantify facts about certain states of knowledge. The quantity p(A|B) represents the plausibility of the statement A, given only the information in the state of knowledge B. Probability aims to be an extension of deductive logic, such that:

“if A then B”, $A \Rightarrow B$, and $p(B|A) = 1$

are the same statement. Similarly,

“if A then not B”, $A \Rightarrow \sim B$, and $p(B|A) = 0$

are the same statement.

Between these extremes of logical implication, probability provides degrees of plausibility.

It is sometimes the case that the proposition of interest A is very well informed by B. For example, what is the probability that it will rain in the next 10 minutes, given that I am outside and can see blue skies in all directions? On other occasions, we are ignorant of some relevant information. For example, what is the probability that it will rain in the next 10 minutes, given that I’ve just woken up and can’t open the shutters in this room? Because probability describes states of knowledge, it is not necessarily derailed by a lack of information. Ignorance is just another state of knowledge, to be quantified by probabilities.

In Chapter 9 of his textbook “Probability Theory” (highly recommended), Edwin Jaynes considers a reasoning robot that is “poorly informed” about the experiment that it has been asked to analyse. The robot has been informed only that there are N possibilities for the outcome of the experiment. The poorly informed robot, with no other information to go on, should assign an equal probability to each outcome, as any other assignment would show unjustified favouritism to an arbitrarily labeled outcome. (See Jaynes Chapter 2 for a discussion of the principle of indifference.)

When no information is given about any particular outcome, all that is left is to quantify some measure of the size of the set of possible outcomes. This is not to assume some randomising selection mechanism. This is not a frequency, nor the objective chance associated with some experiment. It is simply a mathematical translation of the statement: “I don’t know which of these N outcomes will occur”. We are simply reporting our ignorance.

At the same time, the poorly informed robot can say more than just “I don’t know”, since it does know the number of possible outcomes. A poorly informed robot faced with 7 possibilities is in a different state of knowledge to one faced with 10,000 possibilities.
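As a sketch, the poorly informed robot's assignment can be written in a few lines of Python (the entropy comparison is my own illustration of the two different states of knowledge, not anything from Jaynes):

```python
from math import log2

def poorly_informed_probs(n_outcomes):
    """With only the number of outcomes known, the principle of
    indifference assigns equal probability to each outcome."""
    return [1.0 / n_outcomes] * n_outcomes

def entropy_bits(probs):
    """Shannon entropy: one measure of how much the robot doesn't know."""
    return -sum(p * log2(p) for p in probs if p > 0)

# A robot facing 7 possibilities is in a different state of
# knowledge to one facing 10,000 possibilities:
print(entropy_bits(poorly_informed_probs(7)))       # ~2.81 bits
print(entropy_bits(poorly_informed_probs(10_000)))  # ~13.29 bits
```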

A particularly thorny case is characterising ignorance over a continuous parameter, since then there are an infinite number of possibilities. When a probability distribution for a certain parameter is not informed by data but only “prior” information, it is called a “non-informative prior”. Researchers continue the search for appropriate non-informative priors for various situations; the interested reader is referred to the “Catalogue of Non-informative Priors”. (more…)

## Probably Not – A Fine-Tuned Critique of Richard Carrier (Part 1)

After a brief back and forth in a comments section, I was encouraged by Dr Carrier to read his essay “Neither Life nor the Universe Appear Intelligently Designed”. I am assured that the title of this essay will be proven “with such logical certainty” that all opposing views should be wiped off the face of the Earth.

Dr Richard Carrier is a “world-renowned author and speaker”. That quote comes from none other than the world-renowned author and speaker, Dr Richard Carrier. Fellow atheist Massimo Pigliucci says,

> The guy writes too much, is too long winded, far too obnoxious for me to be able to withstand reading him for more than a few minutes at a time.

I know the feeling. When Carrier’s essay comes to address evolution, he recommends that we “consider only actual scholars with PhD’s in some relevant field”. One wonders why, when we come to consider the particular intersection of physics, cosmology and philosophy wherein we find fine-tuning, we should consider the musings of someone with a PhD in ancient history. (A couple of articles on philosophy does not a philosopher make). Especially when Carrier has stated that there are six fundamental constants of nature, but can’t say what they are, can’t cite any physicist who believes that laughable claim, and refers to the constants of the standard model of particle physics (which every physicist counts as fundamental constants of nature) as “trivia”.

In this post, we will consider Carrier’s account of probability theory. In the next post, we will consider Carrier’s discussion of fine-tuning. The mathematical background and notation of probability theory were given in a previous post, and follow the discussion of Jaynes. (Note: probabilities can be either $p$ or $P$, and both an overbar $\bar{A}$ and tilde $\sim A$ denote negation.)

### Probability theory, a la Carrier

I’ll quote Carrier at length.

> Bayes’ theorem is an argument in formal logic that derives the probability that a claim is true from certain other probabilities about that theory and the evidence. It’s been formally proven, so no one who accepts its premises can rationally deny its conclusion. It has four premises … [namely P(h|b), P(~h|b), P(e|h.b), P(e|~h.b)]. … Once we have [those numbers], the conclusion necessarily follows according to a fixed formula. That conclusion is then by definition the probability that our claim h is true given all our evidence e and our background knowledge b.

We’re off to a dubious start. Bayes’ theorem, as the name suggests, is a theorem, not an argument, and certainly not a definition. Also, Carrier seems to be saying that P(h|b), P(~h|b), P(e|h.b), and P(e|~h.b) are the premises from which one formally proves Bayes’ theorem. This fails to understand the difference between the derivation of a theorem and the terms in an equation. Bayes’ theorem is derived from the axioms of probability theory – Kolmogorov’s axioms or Cox’s theorem are popular starting points. Any necessity in Bayes’ theorem comes from those axioms, not from the four numbers P(h|b), P(~h|b), P(e|h.b), and P(e|~h.b). (more…)

## Bayes’ Theorem: Ad Hoc-ness and Other Details

More about Bayes’ theorem; an introduction was given here. Once again, I’m not claiming any originality.

You can’t save a theory by stapling some data to it, even though this will improve its likelihood. Let’s consider an example.

Suppose, having walked into my kitchen, I know a few things. $D_1$ = There is a cake in my kitchen. $D_2$ = The cake has “Happy Birthday Luke!” on it, written in icing. $B$ = My name is Luke + Today is my birthday + whatever else I knew before walking to the kitchen.

Obviously, $D_2 \Rightarrow D_1$ i.e. $D_2$ presupposes $D_1$. Now, consider two theories of how the cake got there. $W$ = my Wife made me a birthday cake. $A$ = a cake was Accidentally delivered to my house.

Consider the likelihood of these two theories. Using the product rule, we can write:

$p(D_1 D_2 | WB) = p(D_2 | D_1 WB) \, p(D_1 | WB)$

$p(D_1 D_2 | AB) = p(D_2 | D_1 AB) \, p(D_1 | AB)$

Both theories are equally able to place a cake in my kitchen, so $p(D_1 | WB) \approx p(D_1 | AB)$. However, a cake made by my wife on my birthday is likely to have “Happy Birthday Luke!” on it, while a cake chosen essentially at random could have anything or nothing at all written on it. Thus, $p(D_2 | D_1 WB) \gg p(D_2 | D_1 AB)$. This implies that $p(D_1D_2 | WB) \gg p(D_1D_2 | AB)$ and the probability of $W$ has increased relative to $A$ since learning $D_1$ and $D_2$.
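To make this concrete, here is a toy calculation with invented numbers (all four probabilities below are assumptions for illustration; only the inequalities between them matter):

```python
# Hypothetical numbers for the cake example; only the ratios matter.
p_D1_given_W = 0.5     # p(D1|WB): wife plausibly makes a birthday cake
p_D1_given_A = 0.5     # p(D1|AB): equally able to place a cake in the kitchen
p_D2_given_D1W = 0.8   # p(D2|D1 WB): her cake very likely says "Happy Birthday Luke!"
p_D2_given_D1A = 1e-4  # p(D2|D1 AB): a random cake almost never says that

like_W = p_D2_given_D1W * p_D1_given_W  # p(D1 D2 | W B)
like_A = p_D2_given_D1A * p_D1_given_A  # p(D1 D2 | A B)

print(like_W / like_A)  # likelihood ratio strongly favours W
```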

So far, so good, and hopefully rather obvious. Let’s look at two ways to try to derail the Bayesian account.

### Details Details

Before some ad hoc-ery, consider the following objection. We know more than $D_1$ and $D_2$, one might say. We also know, $D_3$ = there is a swirly border of piped icing on the cake, with a precisely measured pattern and width.

Now, there is no reason to expect my wife to make me a cake with that exact pattern, so our likelihood takes a hit: $p(D_3 | D_1 D_2 WB) \ll 1 ~ \Rightarrow ~ p(D_1 D_2 D_3 | WB) \ll p(D_1D_2 | WB)$

Alas! Does the theory that my wife made the cake become less and less likely, the closer I look at the cake? No, because there is no reason for an accidentally delivered cake to have that pattern, either. Thus, $p(D_3 | D_1 D_2 WB) \approx p(D_3 | D_1 D_2 AB)$

And so it remains true that, $p(D_1 D_2 D_3 | WB) \gg p(D_1 D_2 D_3 | AB)$

and the wife hypothesis remains the preferred theory. This is point 5 from my “10 nice things about Bayes’ Theorem” – ambiguous information doesn’t change anything. Additional information that lowers the likelihood of a theory doesn’t necessarily make the theory less likely to be true. It depends on its effect on the rival theories.

What if we crafted another hypothesis, one that could better handle the data? Consider this theory. $A_D$ = a cake with “Happy Birthday Luke!” on it was accidentally delivered to my house.

Unlike $A$, $A_D$ can explain both $D_1$ and $D_2$. Thus, the likelihoods of $A_D$ and $W$ are about equal: $p(D_1D_2 | WB) \approx p(D_1D_2 | A_DB)$. Does the fact that I can modify my theory to give it a near perfect likelihood sabotage the Bayesian approach?

Intuitively, we would think that however unlikely it is that a cake would be accidentally delivered to my house, it is much less likely that it would be delivered to my house and have “Happy Birthday Luke!” on it. We can show this more formally, since $A_D$ is a conjunction of propositions $A_D = A A'$, where $A'$ = The cake has “Happy Birthday Luke!” on it, written in icing.

But the statement $A'$ is simply the statement $D_2$. Thus $A_D = A D_2$. Recall that, for Bayes’ Theorem, what matters is the product of the likelihood and the prior. Thus,

$p(D_1 D_2 | A_D B) \, p(A_D | B)$

$= p(D_1 D_2 | A D_2 B) \, p(A D_2 | B)$

$= p(D_1 | A D_2 B) \, p(D_2 | A B) \, p(A | B)$

$= p(D_1 D_2 | A B) \, p(A | B)$

Thus, the product of the likelihood and the prior is the same for the ad hoc theory $A_D$ and the original theory $A$. You can’t win the Bayesian game by stapling the data to your theory. Ad hoc theories, by purchasing a better likelihood at the expense of a worse prior, get you nowhere in Bayes’ theorem. It’s the postulates that matter. Bayes’ Theorem is not distracted by data smuggled into the hypothesis.
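A quick numerical check of this identity, again with invented numbers: the ad hoc theory's perfect likelihood is exactly paid for by its deflated prior.

```python
# Made-up numbers; the point is the identity, not the values.
p_A = 1e-3             # prior p(A|B): accidental delivery is rare
p_D2_given_AB = 1e-4   # p(D2|AB): a random cake saying "Happy Birthday Luke!"
p_D1_given_AD2B = 1.0  # p(D1|A D2 B): D2 presupposes D1

# Original theory A: likelihood times prior
like_A = p_D1_given_AD2B * p_D2_given_AB  # p(D1 D2 | A B)
product_A = like_A * p_A

# Ad hoc theory A_D = A D2: perfect likelihood, deflated prior
like_AD = p_D1_given_AD2B                 # p(D1 D2 | A D2 B) = p(D1 | A D2 B)
prior_AD = p_D2_given_AB * p_A            # p(A D2 | B) = p(D2|AB) p(A|B)
product_AD = like_AD * prior_AD

print(product_A == product_AD)  # stapling data to the theory gains nothing
```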

### Too strong?

While all this is nice, it does assume rather strong conditions. It requires that the theory in question explicitly includes the evidence. If we look closely at the statements that make up $T$, we will find $D$ amongst them, i.e. we can write the theory as $T = T' D$. A theory can be jerry-rigged without being this obvious. I’ll have a closer look at this in a later post.

## Bayes’ Theorem: Certainty Starts in Here

Continuing my series on Bayes’ Theorem, recall that the question of any rational investigation is this: what is the probability of the theory of interest T, given everything that I know K? Thanks to Bayes’ theorem, we can take this probability $p(T | K)$ and break it into manageable pieces. In particular, we can divide K into background information B and data D. Remember that this division is just a convenience, and in particular that B and D are both assumed to be known.

Suppose one calculates $p(T | DB)$ for some theory, data and background information. Think of it as a practice problem in a textbook. This calculation, in and of itself, knows nothing of the real world. So what follows? We can think of the probability as a conditional if-then statement:

1. If DB, then the probability of T is $p(T | DB)$.

To draw a conclusion from this, we must add the premise.

2. DB.

Only then can we conclude,

3. The probability of T is $p(T | DB)$.

But wait a minute … the whole point of this exercise was to reason in the face of uncertainty. Where do we get the nerve to simply assert 2, that DB is true? Where is the inevitable uncertainty of measurement? Isn’t treating the data as certain hopelessly idealized? Shouldn’t we take into account how probable DB is? But there are no raw probabilities, so with respect to what should we calculate the probability of DB? We’re headed for an infinite regress if we keep asking for probabilities. How do we get premise 2? Are probabilities all merely hypothetical?

## Probability Myth: we’ve observed X, so the probability of X is one

Continuing with probability theory, a quick bit of myth-busting. I touched on this last time, but it comes up often enough to deserve its own post. Recall that rationality requires us to calculate the probability of our theory of interest T given everything we know K. We saw that it is almost always useful to split up our knowledge into data D and background B. These are just labels. In practice, the important thing is that I can calculate the probabilities of D given B and T, so that I can calculate the terms in Bayes’ theorem:

$p(T | DB) = \frac{p(D | TB) \, p(T | B)}{p(D | B)}$

Something to note: in this calculation, we assume that we know that D is true, and yet we are calculating probabilities of D, such as the likelihood $p(D | TB)$, which is not necessarily one. So do we know D or don’t we?!

The probability $p(D | TB)$ is not simply “what do you reckon about D?”. Jaynes considers the construction of a reasoning robot. You feed information in one slot and, upon request, out comes the probability of any statement you care to ask it about. These probabilities are objective in the sense that any two correctly constructed robots should give the same answer, as should any perfectly rational agent. Probabilities are subjective in the sense that they are relative to what information is fed in. There are no “raw” probabilities $p(D)$. So the probability $p(D | TB)$ asks: what probability would the robot assign to D if we fed in only T and B?

Thus, probabilities are conditionals, and in particular the likelihood represents a counterfactual conditional: if all I knew were the background information B and the theory T, what would the probability of D be? These are exactly the questions that every maths textbook sets as exercises: given 10 tosses of a fair coin, what is the probability of exactly 8 heads? We can still ask these questions even after we’ve actually seen 8 heads in 10 coin tosses. It is not the case that the probability of some event is one once we’ve observed that event.

What is true is that, if I’ve observed D, then the probability of D given everything I’ve observed is one. If you feed D into the reasoning robot, and then ask it for the probability of D, it will tell you that it is certain that D is true. Mathematically, $p(D | D) = 1$.
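The coin-toss example can be run as a sanity check: the likelihood of 8 heads in 10 fair tosses keeps its binomial value even after the heads have been observed; certainty only appears when you condition on the observation itself.

```python
from math import comb

def p_heads(k, n, p=0.5):
    """Binomial probability of exactly k heads in n tosses."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# p(D | T B): what the robot assigns given only the fair-coin model,
# whether or not the tosses have already happened
print(p_heads(8, 10))  # 45/1024 ≈ 0.044, not 1

# p(D | D) = 1: feed the observation itself in, and certainty comes out
observed_eight_heads = True
p_D_given_D = 1.0 if observed_eight_heads else 0.0
print(p_D_given_D)     # 1.0
```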

## Bayes’ Theorem: what is this “background information”?

I laid down the basics of probability theory and Bayes’ theorem in a previous post. Here’s the story, as a reminder. We have some background information B and some data D. We want to know: what is the probability that some theory T is true, given the background information and the data? We write this as $p(T | DB)$, and expand it using Bayes’ theorem:

$p(T | DB) = \frac{p(D | TB) \, p(T | B)}{p(D | B)}$
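As a sketch, here is Bayes' theorem with the denominator expanded by the law of total probability; the numbers fed in are hypothetical.

```python
def posterior(p_D_given_TB, p_T_given_B, p_D_given_notTB):
    """Bayes' theorem, expanding the denominator as
    p(D|B) = p(D|TB) p(T|B) + p(D|~TB) p(~T|B)."""
    p_notT_given_B = 1.0 - p_T_given_B
    p_D_given_B = (p_D_given_TB * p_T_given_B
                   + p_D_given_notTB * p_notT_given_B)
    return p_D_given_TB * p_T_given_B / p_D_given_B

# Hypothetical numbers: a theory with prior 0.2 that predicts the data
# far better than its rival ends up more probable than not.
print(posterior(p_D_given_TB=0.9, p_T_given_B=0.2, p_D_given_notTB=0.1))
```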

However, this “background information” is a little vague. What puts something in the background? How much background do I need to dig up? How do we divide our knowledge into background information B and data D? Does it matter? Is it that background information is being assumed as known, while for the data, as with all measurements, we must acknowledge a degree of uncertainty?

Here are a few things to note about background information.

1. Tell me everything

2. The posterior views data and background equally

3. Calculate probabilities of data, with background information

4. Both background and data are taken as given

5. You should divide K cleanly

### 1. Tell me everything

The question is this: given everything I know $K$, what is the probability that some statement $T$ is true? The idea is that a rational thinker, in evaluating some statement T, will take into account everything they know. Remember that one of the desiderata of probability theory, taken as a rational approach to reasoning with uncertainty, is that information must not be arbitrarily ignored. In principle, everything we know should be in $K$ somewhere. So tell me everything.

In practice, thankfully, irrelevant information can be ignored as it will factor out anyway (Point 5). That gives us the definition of “relevant” in probability theory: a statement is relevant if including it as given changes our probabilities.

### 2. The posterior views data and background equally

Why, then, have we decided to break up everything we know $K$ into “data” and “background” $DB$? (Remember: DB means “both D and B are true”). The reason is that probabilities don’t grow on trees. If we had a black box that handed out posteriors $p(T | K)$ for any information K and theory T we care to think of, then we wouldn’t need to worry about Bayes’ theorem, or about background and data. Remember: the whole point of probability identities is to take the probability we want and write it in terms of probabilities we have. (more…)

## 10 Nice things about Bayes’ theorem

As background for some future posts, I need to catalogue a few facts about Bayes’ theorem. This is all standard probability theory. I’ll be roughly following the discussion and notation of Jaynes.

### Probability

We start with propositions, represented by A, B, etc. These propositions are in fact true or false, though we may not know which. We assume that we can do Boolean things with these propositions, in particular:

• Conjunction: $AB$ means “both A and B are true”

• Disjunction: $A+B$ means “at least one of the propositions A, B is true”

• Negation: $\bar{A}$ means “not A” (i.e. A is false)

We want to assign probabilities to propositions to represent how likely they are to be true, given the information in other propositions. The function $p(A | B)$ assigns a probability to the proposition $A$, using only the information in $B$. Read “p” as “the probability of” and the vertical bar “ | ” as “given”, so $p(A | B)$ reads “the probability of A given B”. There are no “raw” probabilities, $p(A)$.

What is p? The older approach to probability of Kolmogorov et al. requires that $p$ satisfy certain axioms. Roughly,

A1. $p(A | B)$ is a non-negative number.

A2. One means certain: $p(A | B) = 1$ if $B \Rightarrow A$ i.e. if A is certain, given B.

A3. Or means add: if at most one of the (countable) propositions $A_i$ can be true (i.e. they are mutually exclusive), then $p(A_1 + A_2 + \ldots | B) = \sum_i p(A_i | B)$.

Since the publication of Cox’s Theorem, an alternative approach to probability has gained popularity. Cox (again I’m following Jaynes) proposes the following desiderata of rationality, not as arbitrary axioms but as expectations of any rational approach to reasoning in the face of uncertainty.

D1. Probabilities are represented by real numbers.

D2. Probabilities change in common sense ways. For example, if learning C makes B more likely, but doesn’t change how likely A is, then learning C should make AB more likely.

D3. If a conclusion can be reasoned out in more than one way, then every possible way must lead to the same result.

D4. Information must not be arbitrarily ignored. All given evidence must be taken into account.

D5. Identical states of knowledge (except perhaps for the labeling of the propositions) should result in identical assigned probabilities.

The great advance of Cox’s theorem was to show that one can start with these desiderata and arrive at Kolmogorov’s axioms (at least, for the finite version of A3). Thus, the traditional laws of probability can be applied to more than just frequencies.

### Conjunction and Total Probability

Beginning with the desiderata, we can derive more than Kolmogorov’s axioms. There are a number of useful probability identities. Remember that an identity is a formula that holds for any propositions we may substitute in. Here are two preliminary identities, before we get to Bayes’ theorem.

Conjunction: If $A = A_1 A_2$ then $p(A | B) = p(A_1 | A_2 B) \, p(A_2 | B)$. (more…)
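As a preview, the product rule used in the cake example, $p(A_1 A_2 | B) = p(A_1 | A_2 B) \, p(A_2 | B)$, can be checked on a small made-up joint distribution:

```python
# Check the product rule p(A1 A2 | B) = p(A1 | A2 B) p(A2 | B)
# on a hypothetical joint distribution over two binary propositions.
joint = {  # p(A1, A2) given B; the four entries sum to 1
    (True, True): 0.30, (True, False): 0.20,
    (False, True): 0.10, (False, False): 0.40,
}

p_A1_A2 = joint[(True, True)]                      # p(A1 A2 | B)
p_A2 = joint[(True, True)] + joint[(False, True)]  # marginal p(A2 | B)
p_A1_given_A2 = p_A1_A2 / p_A2                     # conditional p(A1 | A2 B)

print(abs(p_A1_A2 - p_A1_given_A2 * p_A2) < 1e-12)  # the identity holds
```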

## How to win at the races

I’ve rambled about this before, but with the Melbourne Cup – “the race that stops a nation” – a few days away and Tom Waterhouse’s annoying face on TV too often, it’s worth repeating.

Don’t bet on the horse you think will win!

More precisely, don’t necessarily bet on the horse you think will win. Here is the only betting system that works:

1. For each horse in the race, and before you look at the price offered by the bookmaker, write what you think the probability (as a percentage) is that the horse will win. I.e. if the race was run 100 times, how many times would this horse win? You’ll have to do your homework on the field.
2. For each horse, take your probability and multiply it by the bookmaker’s price. Call that the magic number.
3. If any of the horses have a magic number greater than 100, bet on the horse with the highest magic number.
4. If none of the horses have a magic number greater than 100, don’t bet. Go home.
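The steps above can be sketched in a few lines (the horses, probabilities and prices are all invented):

```python
def magic_number(my_prob_pct, bookmaker_price):
    """Your probability (as a percentage) times the decimal price:
    the average return from betting $1 on this horse 100 times."""
    return my_prob_pct * bookmaker_price

# Hypothetical field: (horse, your probability %, bookmaker's decimal price)
field = [("Phar Lap II", 40, 2.2),   # magic number 88: no bet
         ("Slowpoke", 5, 12.0),      # magic number 60: no bet
         ("Longshot", 2, 60.0)]      # magic number 120: underpriced!

name, prob, price = max(field, key=lambda h: magic_number(h[1], h[2]))
if magic_number(prob, price) > 100:
    print(f"Bet on {name}")
else:
    print("Don't bet. Go home.")
```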

The magic number is how much (on average) you would make if you bet $1 on the horse 100 times, so it had better be more than 100. The way that the bookmaker guarantees a profit in the long run is to ensure that no magic numbers are greater than 100. Because of the bookmaker’s slice (the overround), the odds are stacked against the average punter. You will only end up with a magic number greater than 100 if either you have made a mistake at step 1, or the bookmaker has made a mistake on his price. This leads to the following advice. You should only bet on a horse if a) you know more than the bookmaker, and b) the bookmaker has significantly underestimated one of the horses. Thus, the better the bookmaker, the more reason not to bet.

And so we come to Tom Waterhouse’s online betting business: “I’ve got four generations of betting knowledge in my blood. … Bet with me, and that knowledge can be yours.” This is exactly the information you need to conclude that you should never bet with Tom Waterhouse. The ad might as well say “bet with me; I know how to take your money”. You don’t want a bookmaker who knows horse racing inside-and-out, from horse racing stock, armed with all the facts, knowing all the right people. You don’t want a professional in a sharp suit surrounded by analysts at computer screens. You want an idiot. You want someone who doesn’t know which end of the horse is the front, armed with a broken abacus and basing his prices on a combination of tea-leaf-reading, a lucky 8-ball and “the vibe”. You want a bookmaker that is going out of business. The more successful the bookmaker, the further you should stay away.

The TAB was established in 1964, has over a million customers and 2,500 retail outlets, and made a profit of $534.8 million in 2011, up 14%. Translation: never bet with the TAB. Betfair’s profits were $600 million, and SportingBet made $2 billion in 2009. With those resources, they’ll always know more than you.
If you’ve heard of them, don’t bet with them. Go home.

Hopefully you’re getting my point. Don’t bet on sports. If you go to the races, put on a nice outfit, drink a few beers and give the money to charity. If you must bet, have a random sweepstakes with your friends. You’ll get much better odds that way.

## The Bayesian Utility of Derren Brown

I had the great pleasure a few nights ago of seeing Derren Brown’s new illusionist/mentalist/hypnotist show, Svengali. It’s fantastic, and highly recommended. If you’ve seen any of Derren’s previous shows on TV, then some of the routines will be familiar. This fails to make them any less baffling. If you’re unfamiliar with his work, here’s a sample:

[embedded video]

One of the main themes of much of Brown’s work is his ability to recreate the “powers” of psychics, mind-readers and spiritualists without the pretence of supernatural intervention or paranormal activity. For example, in 2004 he performed a seance “live” on Channel 4, and in 2011 trained a member of the British public to become a faith healer.

There is an important and quite general lesson to be learned from Brown’s abilities. In the course of last night’s performance, Brown did a number of things which, if they had been performed by someone claiming psychic powers, would have seemed, if not totally convincing, then at least suggestive of such powers. I remain at a complete loss as to how Brown seems to read the minds of audience members and anticipate their seemingly free choices.

Suppose that Connie claims to be a witch – a real, proper, supernatural witch – and as proof of her powers, performs a great feat of mind-reading. Being the mathematical nerds that we are, we decide to formalise our inference that Connie is a witch (and should thus be burned). Help us, Rev. Bayes!

Let: (more…)