Continuing with probability theory, a quick myth-busting. I touched on this last time, but it comes up often enough to deserve its own post. Recall that rationality requires us to calculate the probability of our theory of interest T given everything we know K. We saw that it is almost always useful to split our knowledge into data D and background B. These are just labels. In practice, the important thing is that I can calculate the probabilities of D given B and T, so that I can calculate the terms in Bayes’ theorem,

p(T|DB) = p(D|TB) p(T|B) / p(D|B).
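As a quick numerical check of Bayes’ theorem (with invented numbers for illustration, not anything from the post), one can expand p(D|B) over the theory and its negation:

```python
# A minimal sketch of Bayes' theorem with made-up numbers.
# p(T|DB) = p(D|TB) * p(T|B) / p(D|B), where p(D|B) comes from
# the law of total probability over T and not-T.

p_T = 0.3             # prior p(T|B), assumed value
p_D_given_T = 0.8     # likelihood p(D|TB), assumed value
p_D_given_notT = 0.2  # likelihood p(D|~T,B), assumed value

p_D = p_D_given_T * p_T + p_D_given_notT * (1 - p_T)  # p(D|B)
p_T_given_D = p_D_given_T * p_T / p_D                 # posterior p(T|DB)

print(round(p_D, 4))          # 0.38
print(round(p_T_given_D, 4))  # 0.6316
```

Note that the posterior is well above the prior: the data D favour T here precisely because p(D|TB) exceeds p(D|~T,B).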
Something to note: in this calculation, we assume that we know that D is true, and yet we are calculating the probability of D. For example, the likelihood p(D|TB) is not necessarily one. So do we know D or don’t we?!
The probability p(D|TB) is not simply “what do you reckon about D?”. Jaynes considers the construction of a reasoning robot. You feed information in one slot and, upon request, out comes the probability of any statement you care to ask it about. These probabilities are objective in the sense that any two correctly constructed robots should give the same answer, as should any perfectly rational agent. Probabilities are subjective in the sense that they are relative to what information is fed in. There are no “raw” probabilities. So the probability p(D|TB) asks: what probability would the robot assign to D if we fed in only T and B?
Thus, all probabilities are conditional, and in particular the likelihood p(D|TB) represents a counterfactual conditional: if all I knew were the background information B and the theory T, what would the probability of D be? These are exactly the questions that every maths textbook sets as exercises: given 10 tosses of a fair coin, what is the probability of exactly 8 heads? We can still ask these questions even after we’ve actually seen 8 heads in 10 coin tosses. It is not the case that the probability of some event is one once we’ve observed that event.
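The textbook coin question above can be evaluated directly; this sketch just works out the binomial formula:

```python
from math import comb

# Probability of exactly 8 heads in 10 tosses of a fair coin:
# C(10, 8) * (1/2)^10. Actually observing 8 heads afterwards does not
# change this number, because it is conditioned only on "fair coin,
# 10 tosses", not on the observed outcome.
p = comb(10, 8) * 0.5**10
print(p)  # 0.0439453125
```

The answer is 45/1024, whether or not the tosses have already happened.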
What is true is that, if I’ve observed D, then the probability of D given everything I’ve observed is one. If you feed D into the reasoning robot, and then ask it for the probability of D, it will tell you that it is certain that D is true. Mathematically, p(D|D) = 1.
I agree with your mathematics but disagree with your interpretations.
If D is known, then it should be conditioned on, and so the probability becomes 1. Incidentally, this is how you unify Bayesian updating with MaxEnt updating; see my blog post here: https://letterstonature.wordpress.com/2008/12/29/where-do-i-stand-on-maximum-entropy/.
You can still talk about P(D|TB), which is usually not 1. However, since D is not on the right hand side, this is a prior probability (with respect to D).
I think that’s my point. We can still talk about prior probabilities with respect to things we know. We can use prior probabilities to calculate posterior probabilities. We should always condition posterior probabilities on everything we know.
I think we agree, and I am just overreacting to your title, because the statement is true in at least one sense that I think is an important one.
I should perhaps change the title to “we’ve observed X, so the probability of X is *necessarily* one.” I’m trying to counter the following mistake, known elsewhere as the “problem” of old evidence (e.g. http://plato.stanford.edu/entries/epistemology-bayesian/#ObjSimPriConRulInfOthObjBayConThe) …
Alice: The probability of the perihelion shift of Mercury on Einstein’s theory of gravity is much greater than the probability of the perihelion shift of Mercury on Newton’s theory of gravity. All things being even, we should prefer Einstein’s theory over Newton’s.
Bob: But we knew about the perihelion shift of Mercury in 1859, 50 years before Einstein. The probability of something we know is one, so the probability of the perihelion shift is one, regardless of the theory. So it gives us no reason to prefer Einstein over Newton.
That’s one of the most absurd non-problems I’ve come across!
My thoughts exactly.
[…] Something to note from the discussion above. While L is not given in the likelihoods, it is given in the posterior, as the sequence of equals signs show. Thus, the fact that L does not appear in certain terms in the equation does not mean that we are ignoring L, or reasoning as if we didn’t know L, or pretending that L doesn’t count. Put another way – just because something is known, doesn’t mean that it is taken as given in every term in our calculation of the posterior. If that confuses you, read this. […]
In the Ikeda & Jefferys piece, they also discuss this problem of old evidence (section 2.1.1). They give an example and rightly conclude that “we can learn from old data, and the inclusion of old evidence E1 does support T1 by showing (in this case) that P(T1|E1) > P(T1).”
So apparently, even if we already knew about, for example, the perihelion shift of Mercury before we knew about the theory and its prediction, we can still say that it supports one hypothesis or theory over another, and that we should therefore prefer it.
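A toy version of the Ikeda & Jefferys point, with invented numbers rather than anything from their paper: evidence E1 that was known long before T1 was proposed still raises the posterior of T1, provided T1 predicts it better than its rival does.

```python
# Toy numbers (assumed, not from Ikeda & Jefferys): two rival theories,
# and "old" evidence E1 known well before T1 was formulated.
p_T1 = 0.5           # prior p(T1), assumed
p_E1_given_T1 = 0.9  # T1 predicts E1 well (assumed)
p_E1_given_T2 = 0.3  # the rival theory predicts it poorly (assumed)

p_E1 = p_E1_given_T1 * p_T1 + p_E1_given_T2 * (1 - p_T1)
p_T1_given_E1 = p_E1_given_T1 * p_T1 / p_E1

print(round(p_T1_given_E1, 4))    # 0.75
print(p_T1_given_E1 > p_T1)       # True: old evidence still supports T1
```

Nothing in the calculation cares about *when* E1 was learned; only the counterfactual likelihoods p(E1|T1) and p(E1|T2) matter.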
So am I right in thinking that, with respect to the fine-tuning argument, the evidence is actually L (life) and F (with F not merely defined as “life-friendly”, but as “fine tuning was necessary for life”) together, even though we already knew that L was the case?
That would perhaps also refute one of Richard Carrier’s objections to the fine-tuning argument, namely that the fine-tuning argument “…would only be true for people who aren’t observers (since b then contains no observers), and since the probability of there being people who aren’t observers is zero, his calculation would be irrelevant (it would be true only for people who don’t exist, i.e., any conclusion that is conditional on “there are no observers” is of no interest to observers).” (in his footnote 23)
His objection is like saying: “the argument that leads to the conclusion that the perihelion shift of Mercury supports Einstein’s theory of gravity would only be true for people who didn’t already know about the perihelion shift of Mercury. But for people who did know about the perihelion shift of Mercury in advance, the calculation is irrelevant.” Which seems wrong.
Any thoughts?
[…] Probability Myth: we’ve observed X, so the probability of X is one […]