## Bayes’ Theorem: Ad Hoc-ness and Other Details

More about Bayes’ theorem; an introduction was given here. Once again, I’m not claiming any originality.

You can’t save a theory by stapling some data to it, even though this will improve its likelihood. Let’s consider an example.

Suppose, having walked into my kitchen, I know a few things.

$D_1$ = There is a cake in my kitchen.

$D_2$ = The cake has “Happy Birthday Luke!” on it, written in icing.

$B$ = My name is Luke + Today is my birthday + whatever else I knew before walking to the kitchen.

Obviously, $D_2 \Rightarrow D_1$, i.e. $D_2$ presupposes $D_1$. Now, consider two theories of how the cake got there.

$W$ = my Wife made me a birthday cake.

$A$ = a cake was Accidentally delivered to my house.

Consider the likelihood of these two theories. Using the product rule, we can write:

$p(D_1D_2 | WB) = p(D_2 | D_1 WB) p(D_1 | WB)$

$p(D_1D_2 | AB) = p(D_2 | D_1 AB) p(D_1 | AB)$

Both theories are equally able to place a cake in my kitchen, so $p(D_1 | WB) \approx p(D_1 | AB)$. However, a cake made by my wife on my birthday is likely to have “Happy Birthday Luke!” on it, while a cake chosen essentially at random could have anything or nothing at all written on it. Thus, $p(D_2 | D_1 WB) \gg p(D_2 | D_1 AB)$. This implies that $p(D_1D_2 | WB) \gg p(D_1D_2 | AB)$ and the probability of $W$ has increased relative to $A$ since learning $D_1$ and $D_2$.
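The comparison above can be sketched numerically. The probabilities below are illustrative assumptions (nothing in the argument depends on their exact values), chosen only to respect the qualitative claims: the two theories place a cake in the kitchen about equally well, but only the wife theory makes the inscription likely.

```python
# A numerical sketch of the cake example. All probabilities are
# illustrative assumptions, not measurements.

# Likelihood of finding *a* cake under each theory (taken as roughly equal)
p_D1_given_W = 0.9   # wife made a birthday cake -> cake in kitchen
p_D1_given_A = 0.9   # accidental delivery -> cake in kitchen

# Likelihood of the inscription, given that there is a cake
p_D2_given_D1_W = 0.8    # wife's birthday cake very likely says this
p_D2_given_D1_A = 1e-6   # a random cake almost never says this

# Likelihood of the full data D1 D2 under each theory (product rule)
like_W = p_D2_given_D1_W * p_D1_given_W
like_A = p_D2_given_D1_A * p_D1_given_A

# The likelihood ratio tells us how the odds of W vs A shift
ratio = like_W / like_A
print(f"likelihood ratio W:A = {ratio:.0f}")  # 800000
```

With these (made-up) numbers, learning $D_1$ and $D_2$ multiplies the odds of $W$ over $A$ by nearly a million.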

So far, so good, and hopefully rather obvious. Let’s look at two ways to try to derail the Bayesian account.

### Details Details

Before some ad hoc-ery, consider the following objection. We know more than $D_1$ and $D_2$, one might say. We also know,

$D_3$ = there is a swirly border of piped icing on the cake, with a precisely measured pattern and width.

Now, there is no reason to expect my wife to make me a cake with that exact pattern, so our likelihood takes a hit:

$p(D_3 | D_1 D_2 WB) \ll 1 ~ \Rightarrow ~ p(D_1 D_2 D_3 | WB) \ll p(D_1D_2 | WB)$

Alas! Does the theory that my wife made the cake become less and less likely, the closer I look at the cake? No, because there is no reason for an accidentally delivered cake to have that pattern, either. Thus,

$p(D_3 | D_1 D_2 WB) \approx p(D_3 | D_1 D_2 AB)$

And so it remains true that,

$p(D_1 D_2 D_3 | WB) \gg p(D_1 D_2 D_3 | AB)$

and the wife hypothesis remains the preferred theory. This is point 5 from my “10 nice things about Bayes’ Theorem” – ambiguous information doesn’t change anything. Additional information that lowers the likelihood of a theory doesn’t necessarily make the theory less likely to be true. It depends on its effect on the rival theories.
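The cancellation is easy to see numerically: an ambiguous detail like $D_3$ multiplies both likelihoods by the same small factor, so the likelihood ratio is untouched. Again, the numbers are illustrative assumptions.

```python
import math

# Likelihoods of D1 D2 under each theory, as in the earlier sketch
# (illustrative assumptions)
like_W = 0.8 * 0.9   # p(D1 D2 | W B)
like_A = 1e-6 * 0.9  # p(D1 D2 | A B)

# The exact border pattern D3 is very improbable, but equally so
# under both theories
p_D3 = 1e-4

ratio_before = like_W / like_A
ratio_after = (like_W * p_D3) / (like_A * p_D3)

# The common factor cancels: the ratio is unchanged
print(math.isclose(ratio_before, ratio_after))  # True
```

Each likelihood takes a hit from $D_3$, but the *relative* standing of the two theories is exactly as before.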

What if we crafted another hypothesis, one that could better handle the data? Consider this theory.

$A_D$ = a cake with “Happy Birthday Luke!” on it was accidentally delivered to my house.

Unlike $A$, $A_D$ can explain both $D_1$ and $D_2$. Thus, the likelihoods of $A_D$ and $W$ are about equal: $p(D_1D_2 | WB) \approx p(D_1D_2 | A_DB)$. Does the fact that I can modify my theory to give it a near perfect likelihood sabotage the Bayesian approach?

Intuitively, we would think that however unlikely it is that a cake would be accidentally delivered to my house, it is much less likely that it would be delivered to my house and have “Happy Birthday Luke!” on it. We can show this more formally, since $A_D$ is a conjunction of propositions $A_D = A A'$, where

$A'$ = The cake has “Happy Birthday Luke!” on it, written in icing.

But the statement $A'$ is simply the statement $D_2$. Thus $A_D = A D_2$. Recall that, for Bayes’ Theorem, what matters is the product of the likelihood and the prior. Thus,

$p(D_1 D_2 | A_D B) ~ p(A_D | B)$

$= p(D_1 D_2 | A D_2 B) ~ p(A D_2 | B)$

$= p(D_1|A D_2B) ~ p(D_2|AB) ~ p(A|B)$

$= p(D_1 D_2 | A B) ~ p(A | B)$

Thus, the product of the likelihood and the prior is the same for the ad hoc theory $A_D$ as for the original theory $A$. You can’t win the Bayesian game by stapling the data to your theory. Ad hoc theories, by purchasing a better likelihood at the expense of a worse prior, get you nowhere in Bayes’ theorem. It’s the postulates that matter. Bayes’ Theorem is not distracted by data smuggled into the hypothesis.
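The trade-off can be checked with numbers: whatever likelihood $A_D$ gains by building $D_2$ into the hypothesis, its prior loses by exactly the same factor. The probabilities below are illustrative assumptions.

```python
import math

# Illustrative assumptions for the original theory A
p_A = 1e-3              # prior p(A | B): accidental delivery is rare
p_D1_given_A = 0.9      # p(D1 | A B)
p_D2_given_D1_A = 1e-6  # p(D2 | D1 A B)

# Original theory A: modest likelihood, modest prior
like_A = p_D1_given_A * p_D2_given_D1_A  # p(D1 D2 | A B)
weight_A = like_A * p_A                  # likelihood * prior

# Ad hoc theory A_D = A D2: D2 is now certain given the theory...
like_AD = p_D1_given_A * 1.0             # p(D1 D2 | A_D B)
# ...but the prior must absorb the improbability of D2
p_AD = p_A * p_D2_given_D1_A             # p(A D2 | B)
weight_AD = like_AD * p_AD

# The products are identical: the gain and the loss cancel exactly
print(math.isclose(weight_A, weight_AD))  # True
```

The ad hoc theory’s likelihood improves by a factor of a million, and its prior worsens by the same factor; Bayes’ theorem, which weighs the product, doesn’t budge.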

### Too strong?

While all this is nice, it does assume rather strong conditions. It requires that the theory in question explicitly includes the evidence: if we look closely at the statements that make up such a theory $T$, we will find the data $D$ amongst them, i.e. we can write the theory as $T = T' D$. A theory can be jerry-rigged without being this obvious. I’ll have a closer look at this in a later post.