Another edition of “How to Use Bayes Theorem Properly 101” (links to previous posts are below). I was listening to a YouTube debate, and one of the speakers offered the following definition of “evidence”:
Evidence is a body of objectively verifiable facts, that are positively indicative of or exclusively concordant with one particular conclusion over any other.
They then demonstrated the many fatal flaws of this definition; for example, there is no such thing as objective verification of facts. Here, I’ll focus on another flaw.
Here’s a simplified version of a common scenario in science. We have 3 competing theories (A, B, C). For simplicity, assume that all other theories are ruled out (or, for homework, expand this example yourself to include more theories). On Monday, the available data implies that their (prior) probabilities are equal. By Friday, new data has arrived from two independent sources (say, separate telescopes), which we label X, Y. Being good Bayesians, we calculate the likelihood of the data on each theory: for example, if theory A were true, what is the probability that we would observe X?
Now, suppose that:
- Data X is moderately likely on A, very likely on B, and extremely unlikely on C.
- Data Y is moderately likely on A, very likely on C, and extremely unlikely on B.
Now, obviously, the most probable theory at the end of the week is A. Each of the other theories assigns a very low likelihood to one of the two data sets, which effectively rules it out. In equations, we compare the posterior probabilities of theories A and C, for example:
$$\frac{p(A|XY)}{p(C|XY)} = \frac{p(X|A)\,p(Y|A)\,p(A)}{p(X|C)\,p(Y|C)\,p(C)} = \frac{p(X|A)\,p(Y|A)}{p(X|C)\,p(Y|C)}$$
where we have suppressed the background information in our notation. Recall that we have assumed that the priors p(A) and p(C) are equal, and that X and Y are independent, so the product rule lets us write p(XY|A) = p(X|A) p(Y|A), and similarly for C.
Now, by hypothesis, the term p(X|C) is extremely small, but neither of the corresponding terms for theory A is small, so the posterior probability of A is much greater than the posterior probability of C. It’s obvious, but it’s nice to see it in the maths.
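To put numbers on this, here is a minimal sketch in Python. The likelihood values (0.5 for "moderately likely", 0.9 for "very likely", 0.001 for "extremely unlikely") are illustrative assumptions of mine, not values from any real data set, and the function name `posterior` is likewise hypothetical.

```python
# Illustrative numbers only: assumed stand-ins for "moderately likely" (0.5),
# "very likely" (0.9) and "extremely unlikely" (0.001).
priors = {"A": 1 / 3, "B": 1 / 3, "C": 1 / 3}
likelihood_X = {"A": 0.5, "B": 0.9, "C": 0.001}
likelihood_Y = {"A": 0.5, "B": 0.001, "C": 0.9}


def posterior(priors, *likelihoods):
    """Multiply each prior by the likelihood of every (independent) data set,
    then normalise so the posteriors sum to one."""
    unnormalised = {}
    for theory, prior in priors.items():
        weight = prior
        for likelihood in likelihoods:
            weight *= likelihood[theory]
        unnormalised[theory] = weight
    total = sum(unnormalised.values())
    return {theory: w / total for theory, w in unnormalised.items()}


post = posterior(priors, likelihood_X, likelihood_Y)
for theory, p in post.items():
    print(f"p({theory}|XY) = {p:.4f}")
```

With these assumed numbers, A ends the week with a posterior of about 0.99, while B and C each fall below 0.01. The exact figures depend entirely on the made-up likelihoods; the qualitative result does not.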
Well, it should be obvious. But if you use the definition of “evidence” above, you reason as follows:
- X is not positively indicative of or exclusively concordant with A over other conclusions, because it is more likely on B. So, X is not evidence for A. So we throw it out.
- Y is not positively indicative of or exclusively concordant with A over other conclusions, because it is more likely on C. So, Y is not evidence for A. So we throw it out.
- Lo and behold, we have no evidence for theory A.
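The faulty reasoning in the list above can be sketched in Python as a filter applied to each data set in isolation. The numerical likelihoods are illustrative assumptions matching the scenario's qualitative descriptions, and `counts_as_evidence_for` is a hypothetical name for one reading of the quoted criterion (data count as evidence for a theory only if they are more likely on that theory than on every rival).

```python
# Illustrative (assumed) likelihoods: 0.5 = "moderately likely",
# 0.9 = "very likely", 0.001 = "extremely unlikely".
likelihoods = {
    "X": {"A": 0.5, "B": 0.9, "C": 0.001},
    "Y": {"A": 0.5, "B": 0.001, "C": 0.9},
}


def counts_as_evidence_for(theory, likelihood):
    """One reading of the quoted definition: the data must be more likely
    on this theory than on every rival theory."""
    return all(
        likelihood[theory] > value
        for rival, value in likelihood.items()
        if rival != theory
    )


# Apply the criterion to each data set separately, as the definition demands.
evidence_for = {
    theory: [
        name
        for name, lk in likelihoods.items()
        if counts_as_evidence_for(theory, lk)
    ]
    for theory in ("A", "B", "C")
}
print(evidence_for)  # {'A': [], 'B': ['X'], 'C': ['Y']}
```

Judged one data set at a time, A is left with an empty list, even though it is the only theory that is not effectively ruled out once X and Y are combined.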
If you think that sounds like such an obvious error that you can’t imagine anyone making it relentlessly for over an hour, then see if you can sit through the aforementioned YouTube debate.
The beauty of the Bayesian approach is that we don’t need to invent criteria ahead of time to decide what is and is not “evidence”. And so, you are much less likely to invent obviously flawed criteria and then stick to them relentlessly, while protesting that you’re just doing science and using the “everyday” definition. Forget all that. Just put all your data in the posterior, do your maths right, and everything will work out. Irrelevant information will be ignored, cumulative cases will naturally grow or decay, vagueness and ad hocness will be punished, and so on. This is why so many scientists and philosophers are so enthusiastic about the Bayesian approach. I recommend it to you. After reading these posts, if you want more, Jaynes’s textbook is still the best place to start, in my opinion.