Appealing to Authority: A User’s Guide

(In which I make fairly uncontroversial points about evidence using controversial examples, thus providing my own red herring.)

There is a logical fallacy known as appealing to authority, which goes like this.

1. X believes Y

2. X is very clever / a well known expert / a professor / a reliable person …

3. Thus, Y is true

This is a fallacy because people can be wrong, even smart people.

However, we all believe things because someone told us so. We would make impossibly slow progress in life if we had to verify every belief for ourselves. There are people we usually trust – not blindly but because they’ve proven trustworthy.

Modern science has a rather strained relationship with authority. On the one hand, much of science’s early progress came from following the maxim: “don’t believe everything Aristotle said”. If you want to know what the natural world is like, go ask the natural world. Go into the lab, go get a telescope, go find out for yourself. Experiments shouldn’t just be repeatable. They should be repeated.

On the other hand, science has made so much progress, collected so much data, published so many papers that no one person can have checked it all out for themselves. The astronomy preprint archive posts about 100 new astronomy papers per day. We are reliant on the honesty and competence of other scientists who provide us with measurements, constants, equations, simulations. We can check some of it but not all.

The layman is in an even worse position, as they may not have the training, time and skills to verify the conclusions of science even if they wanted to. Should Jo(e) Public believe that the universe is expanding because scientists say so? Or should they build their own telescope, invent the spectrometer, locate and observe distant galaxies, measure atomic emission spectra, calculate the recession velocities of the galaxies, observe Cepheid variable stars, discover their period-magnitude relation, use this relation to measure the distance to galaxies, note the linear dependence of velocity on distance, discover general relativity, solve for an expanding spacetime, derive the linear dependence of velocity on distance, and then (and only then) conclude that the universe is expanding?

The truth, of course, is somewhere in the middle: Jo(e) Public can understand how scientists have reached the conclusion that the universe is expanding, so that this central pillar of the Big Bang theory is not pure assertion. They can follow our path on a map, as it were, even if they do not tread every step again themselves. Science ultimately appeals to observations of the natural world, even if they are usually someone else’s observations.

So when should we believe an authority? When does quoting an authority commit the aforementioned fallacy? The key is to be specific about what is being claimed.

Authority as evidence

We mustn’t confuse knowledge with certainty. We need to reason using probabilities. Bayes’ theorem teaches us how to update our probabilities when new, relevant information comes to light. Learning that a particular expert believes something is new, relevant information. It may not be decisive, but it shouldn’t be ignored.

1′. X believes Y

2′. X is in a position to assess the evidence for and against Y. X is informed, competent, experienced, has a reputation for honesty, has no conflict of interest etc.

3′. Thus, the probability of Y being true has increased.

A belief is justified or warranted if it was formed using reliable methods, which can be counted on to produce true beliefs most of the time. Recognising those methods in another person adds weight to their opinion.

Authority as establishing the burden of proof

I really shouldn’t choose such a hot-button topic to make a point like this, but it’s such a good example that I can’t resist. Take climate change. The majority of the climate science community has concluded that the evidence supports the hypothesis that human activity has led, and will lead, to substantial, detrimental changes to our planet’s climate.

Does that prove that climate change is real? No. Proving is something that mathematicians do. It does, however, set the standard for those who believe that climate change is not real. The scientific consensus is prima facie evidence of the truth of climate change. Jo(e) Public is justified, in the absence of the time and skills to investigate for themselves, in believing that climate change is more likely to be true than false. Those who wish to believe that climate change is probably not real have the burden of showing that the scientists are wrong.

This is an application of my last point – authority as evidence. The consensus of an expert, informed community tips the scales in favour of climate change. The pronouncements of scientists are not infallible, but should not be rejected without good scientific reasons. Political conservatism, conspiracy theories and a desire to be viewed as an “iconoclast” are not good reasons.

Authority and Relevance

A key idea in the previous section was relevance. When a climate scientist reaches a conclusion about the state of the Earth’s climate, they are commenting on exactly the thing that they study. There is a danger of experts being viewed as generically clever and thus authorities in any field they care to address. As usual, xkcd summarises the point beautifully.

Physicists

Another controversial example. I once saw someone on TV (possibly a news vox pop), when asked about life after death, cite as conclusive evidence the fact that Stephen Hawking doesn’t believe that there is life after death. Now, I have every respect for the prodigious talents of Professor Hawking, a scientist above whom the superlative “greatest” justifiably hovers. But none of the things that Hawking has done to gain his reputation have anything to do with life after death. He is an expert on quantum gravity, black holes, general relativity, and the cosmology of the very early universe.

Is there any evidence for life after death? Near death experiences, religious revelation, philosophical (metaphysical) arguments for the immateriality of the soul, and widespread belief in life after death around the world and throughout history are the factors usually cited. So the relevant areas of expertise are medicine, especially neuroscience, as well as a familiarity with the claims of witnesses, the psychology of human beings (e.g. fear of death), philosophy, comparative religion, etc. Life after death is usually taken to be incompatible with philosophical materialism, so philosophical arguments for materialism are also relevant. Prof. Hawking is not an expert in any of these areas. There are, I assume, plenty of doctors, neuroscientists, psychologists and philosophers with the relevant expertise who discount near death experiences and the afterlife. If you want to cite an authority, cite them. Hawking’s opinion in this area is worthy of consideration, of course, but not authoritative.

Beware of 8 out of 10 Authorities

The previous point applies a fortiori to surveys of experts. For example, the fact that a larger-than-average percentage of scientists do not believe in God would seem to explain itself. But there is a possible selection effect. It’s the same selection effect that one may suspect is behind the fact that, while (roughly) 80% of philosophers are atheists, this drops to just 20% of those philosophers who specialise in philosophy of religion.

It’s the same old correlation vs. causation story. Did a random sample enter both fields, and thereafter have their views moulded by their respective subject matter? Or did a prior belief in God lead some to be philosophers of religion, while a lack of belief led others to become scientists? A survey of 1,646 scientists by Rice University sociologist Elaine Howard Ecklund led her to conclude that (from here):

Ecklund concludes from her research that most scientists do not become irreligious as a consequence of their becoming scientists. “Rather, their reasons for unbelief mirror the circumstances in which other Americans find themselves: they were not raised in a religious home; they have had bad experiences with religion; they disapprove of God or see God as too changeable.” The disproportionately high percentage of nonbelievers among scientists (as compared to the general population) would appear to be the result of self-selection: the irreligious seem more likely to become scientists in the first place.

I’m not a sociologist, so I can’t critique Ecklund’s work. The point is that individual biases don’t necessarily average themselves out over a population of experts, so appealing to lots of authorities isn’t necessarily an improvement.

Authority as a counterexample to the accusation of ignorance

The defender of the Kalam cosmological argument for the existence of God claims that the universe has a beginning. One argument goes as follows:

Premise 1. An actual infinite cannot exist in reality.

Premise 2. An infinite temporal regress of events is an actual infinite.

Premise 3. Therefore, an infinite temporal regress of events cannot exist.

Reasoning with actual infinites requires knowledge of mathematics, specifically transfinite arithmetic. The argument’s best known defender is William Lane Craig. Craig is not a mathematician, and so one might wonder whether he is sufficiently familiar with the relevant mathematics. We all know of arguments that reveal more about the ignorance of the arguer than about the subject at hand.

There are two ways for Craig to counter the accusation that a greater knowledge of mathematics would lead one to reject Premise 1. The long way is to demonstrate his own proficiency in transfinite arithmetic. That would take a while. He discusses the topic at length in his book “The Kalam Cosmological Argument” if you want to take that route. There is a shortcut, however. Craig could provide an example of someone whose mathematical credentials are unquestioned and who affirms Premise 1. Craig can do this, and so usually does so in shortened presentations like debates. The authority is David Hilbert, who was one of the greatest mathematicians of the 20th century and who argued that “the actual infinite is nowhere to be found in reality”.

This is a valid appeal to authority, so long as we are clear on what is being claimed. Obviously, we cannot claim that Premise 1 is true because Hilbert thought so. But we can counter the accusation that anyone who believes Premise 1 is ignorant of mathematics and doesn’t understand the idea of infinity.

(Note well: I’m not defending premise 1. I’m planning a series on the cosmological arguments, so stay tuned. I’m not convinced by “Hilbert’s hotel is metaphysically absurd” style arguments. And, as Jeff Shallit has pointed out, mathematical knowledge of transfinite arithmetic is a necessary but not sufficient qualification as we are dealing with the applicability of mathematics to reality, which is physics. The accusation that “greater knowledge of physics / cosmology / relativity would lead one to reject Premise 1” can also be countered: George Ellis. The main utility of these authorities, I contend, is to take our attention away from the claimers and focus attention on the claims. We won’t get stuck in a useless debate about whether Craig really understands maths.)

Authority and hostile witnesses

A particularly useful form of appealing to authority is the use of a hostile witness. In this context, a hostile witness is one who attests to a fact in the teeth of their own biases. If someone who hates the defendant’s guts nevertheless corroborates his alibi, then this has greater weight as evidence than such corroboration from a friend of the defendant. It works in reverse as well: if the defendant’s loving wife testifies that he was out of the house between 10pm and 1am, knowing that this was the time of the murder, then this is weighty evidence. She has every reason to give him an alibi, and so the most likely reason for her statement is that it is true.

This is one of the reasons that the study of history is not paralysed by the inevitable bias of those who write history. Is the New Testament useless as a historical record, because its writers were followers of Jesus? Not necessarily, because this bias can be used in our favour. If the gospel writers admit something about Jesus that was an embarrassment to them, then we have good reason to believe that they did not invent this story to suit their own ends.

For example, Mark 13:32 has Jesus saying “But about that day or hour no one knows, not even the angels in heaven, nor the Son, but only the Father”. If we can establish that the early Christians believed that Jesus was divine, then there is a strong bias against inventing a saying of Jesus that says that he didn’t know something. It’s not proof, but it is evidence.

As before, we must be clear about the claim that the authority is supporting. A hostile witness can be used to counter the claim that only those biased in favour of a position believe its claims. I’ve come across this in the context of the fine-tuning of the universe for intelligent life. This subject is so popular with theists that I often encounter the claim that it is the product of Christian apologetics, and believed only by “religious types”. Actually, much has been written on fine-tuning in peer-reviewed scientific journals, and I can give (and have given) many examples of non-theist physicists who affirm that life-permitting universes are rare in the set of possible universes.

But don’t take my word for it …

A Tale of Two Entropies

For those of us who work with degree-of-plausibility (“Bayesian”) probabilities, two situations regularly arise. The first is the need to update probabilities to take into account new information. This is usually done using Bayes’ Rule, when the information comes in the form of a proposition that is known to be true. An example of such a proposition is “The data are 3.444, 7.634, 1.227”.

More generally, information is any justified constraint on our probabilities. For example, “P(x > 3) should be 0.75” is information. If our current probability distribution $q(x)$ doesn’t satisfy the constraint, then we had better change to a new distribution $p(x)$ that does. This doesn’t mean that any old $p(x)$ will do – our $q(x)$ contained hard-won information and we want to preserve that. To proceed, we choose the $p(x)$ that is as close as possible to $q(x)$, but satisfies the constraint. Various quite persuasive arguments (see here) suggest that the correct notion of closeness that we should maximise is the relative entropy:

$H(p; q) = -\int p(x) \log \frac{p(x)}{q(x)} dx$

With no constraints, the best possible $p(x)$ is equal to $q(x)$.
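As a toy illustration (my own example, with made-up numbers, not from the arguments linked above): for a constraint that fixes the probability of a region, maximising the relative entropy simply rescales $q(x)$ inside and outside that region, because the Lagrange multipliers are constant within each region.

```python
import numpy as np

# Toy discrete version of the update: q is a distribution over x = 1..6,
# and the constraint is that P(x > 3) should be 0.75.
x = np.arange(1, 7)
q = np.array([0.1, 0.2, 0.3, 0.2, 0.1, 0.1])

mask = x > 3
target = 0.75

# Maximum relative entropy solution: rescale q separately inside and
# outside the constrained region, preserving q's shape within each.
p = q.copy()
p[mask] *= target / q[mask].sum()
p[~mask] *= (1 - target) / q[~mask].sum()

print(p)  # satisfies the constraint while staying as close to q as possible
```

Within each region the ratio $p(x)/q(x)$ is constant: the constraint is imposed, and everything else about $q(x)$ is preserved.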

Another situation that arises often is the need to simplify complex problems. For example, we might have some probability distribution $q(x)$ that is non-Gaussian, but for some reason we only want to use Gaussians for the rest of the calculation, perhaps for presentation or computational reasons. Which Gaussian should we choose to become our $p(x)$? Many people recommend maximising the relative entropy for this also: in the literature, this is known as a variational approximation, variational Bayes, or the Bogoliubov approximation (there are also variations (pun not intended) on this theme).

There are known problems with this technique. For instance, as David MacKay notes, the resulting probability distribution $p(x)$ is usually narrower than the original $q(x)$. This makes sense, since the variational approximation basically amounts to pretending you have information that you don’t actually have. This issue raises the question of whether there is something better that we could do.

I suggest that the correct functional to maximise in the case of approximating one distribution by another is actually the relative entropy, but with the two distributions reversed:

$H(q; p) = -\int q(x) \log \frac{q(x)}{p(x)} dx$

Why? Well, for one, it just works better in extreme examples I’ve concocted to magnify (a la Ed Jaynes) the differences between using $H(p; q)$ and $H(q; p)$. See the figure below:

If the blue distribution represented your actual state of knowledge, but out of necessity you could only use the red or the green distribution, which would you prefer? I find it very hard to imagine an argument that would make me choose the red distribution over the green. Another argument supporting the use of this ‘reversed’ entropy is that it is equivalent to generating a large number of samples from q, and then doing a maximum likelihood fit of p to these samples. I know maximum likelihood isn’t the best, most principled thing in the world, but in the limit of a large number of samples it’s pretty hard to argue with.

A further example supporting the ‘reversed’ entropy is what happens if $q(x)$ is zero at some points. According to the regular entropy, any distribution $p(x)$ that is nonzero where $q(x)$ is zero is infinitely bad. I don’t think that’s true, in the case of approximations – some leakage of probability to values we know are impossible is no catastrophe. This is manifestly different to the case where we have legitimate information – if $q(x)$ is zero somewhere then of course we want to have $p(x)$ zero there as well. If we’re updating probabilities, we’re trying to narrow down the possibilities, and resurrecting some is certainly unwarranted – but the goal in doing an approximation is different.

Maximising the reversed entropy also has some pretty neat properties. If the approximating distribution is a Gaussian, then the first and second moments should be chosen to match the moments of $q(x)$. If the original distribution is over many variables, but you want to approximate it by a distribution where the variables are all independent, just take all of the marginal distributions and product them together, and there’s your optimal approximation.
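A quick numerical check of the moment-matching property (my own toy example): maximising the reversed entropy $H(q; p)$ over Gaussians is equivalent to a maximum likelihood Gaussian fit to samples from $q$, which simply matches the mean and variance.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# q(x): an equal mixture of N(-2, 1) and N(2, 1).
# Its true mean is 0 and its true variance is 1 + 2**2 = 5.
samples = np.where(rng.random(n) < 0.5,
                   rng.normal(-2.0, 1.0, n),
                   rng.normal(2.0, 1.0, n))

# The Gaussian maximising the reversed entropy H(q; p) matches the
# first two moments of q -- equivalently, the maximum likelihood fit.
mu, var = samples.mean(), samples.var()
print(mu, var)  # close to 0 and 5
```

The resulting broad Gaussian covers both modes of the mixture; a variational fit maximising $H(p; q)$ would typically latch onto a single narrow mode instead.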

If $H(p; q)$ isn’t the best thing to use for approximations, that means that something in the derivation of $H(p; q)$ applies to legitimate information but does not apply to approximations. Most of the axioms (coordinate independence, consistency for independent systems, etc) make sense, and both entropies discussed in this post satisfy those. It is only at the very end of the derivation that the reversed entropy is ruled out, and by some pretty esoteric arguments that I admit I don’t fully understand. I think the examples I’ve presented in this post are suggestive enough that there is room here for a proof that the reversed entropy $H(q; p)$ is the thing to use for approximations. This means that maximum relative entropy is a little less than universal, but that’s okay – the optimal solutions to different problems are allowed to be different!

What is Tuesday Boy telling us?

This has been killing me all week. It’s a probability problem known as the “Tuesday boy” problem. I’ll simplify the problem by reducing the possibility space.

Alice has two children. What is the probability that she has two boys given that:

a) at least one of her children is a boy?

b) at least one of her children is a boy, and at least one of her children is left-handed?

c) at least one of her children is a boy, and he is left-handed?

Assume that left- and right-handedness are equally likely, and independent of sex.
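The possibility space is small enough to enumerate by brute force (a sketch, treating each child as an independent, equally likely combination of sex and handedness):

```python
from fractions import Fraction
from itertools import product

# Each child is (sex, handedness); all four combinations equally likely.
children = list(product("BG", "LR"))
pairs = list(product(children, repeat=2))  # 16 equally likely families

def prob_two_boys(condition):
    """P(two boys | condition), by enumeration over equally likely pairs."""
    given = [p for p in pairs if condition(p)]
    boys = [p for p in given if p[0][0] == "B" and p[1][0] == "B"]
    return Fraction(len(boys), len(given))

a = prob_two_boys(lambda p: "B" in (p[0][0], p[1][0]))
b = prob_two_boys(lambda p: "B" in (p[0][0], p[1][0])
                  and "L" in (p[0][1], p[1][1]))
c = prob_two_boys(lambda p: ("B", "L") in p)

print(a, b, c)  # 1/3 1/3 3/7
```

Note the jump between (b) and (c): attaching the handedness to the boy himself, rather than to the family, changes the answer.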

Of Credentials and Confusion: A Fine-Tuned Critique of Hector Avalos

I’ve just about finished my series of responses to various views on the fine-tuning of the universe for intelligent life that I have encountered. Here I will respond to the work of Hector Avalos, who is professor of Religious Studies at Iowa State University. In 1998, he wrote an article for Mercury Magazine entitled “Heavenly Conflicts: the Bible and Astronomy.” While most of the article pertains to the cosmology of the Bible and its (shock horror) apparent contradiction with modern cosmology, he spends five paragraphs near the end discussing the anthropic principle. He writes:

Attempts to relate the Bible to astronomy are often intertwined with the search for the meaning and purpose of human life. In particular, discussions by John A. Wheeler, John Barrow and other cosmologists concerning the so-called anthropic principle – the idea that the physical constants of the universe are finely tuned for human existence – have attracted interest. The anthropic principle would assert, for example, that if the charge of the electron were other than what it is or the weights of the proton and neutron were different, then human existence would not be. But do these precise quantities necessarily indicate that human beings were part of some intelligent purpose?

The primary assumption of the anthropic principle, which is really a new version of the older “divine design” or teleological argument, seems to be that the “quantity of intelligent purpose” for an entity is directly proportional to the quantity of physico-chemical conditions necessary to create that entity. But the same line of reasoning leads to odd conclusions about many non-human entities.

… let’s use the symbol P to designate the entire set of physico-chemical conditions necessary to produce a human being … Making a computer requires not only all the pre-existing conditions that enable humans to exist but also human beings themselves. In more symbolic terms, making a computer requires P + human beings, whereas only P is needed to make human beings. By the same logic, garbage cans and toxic pollution produced by human beings would be more purposed than human beings. So measuring the divine purpose of an entity by the number of pre-existing conditions required to make that entity is futile.

This response to the fine-tuning of the universe is confused on many levels. (more…)

Where do I stand on maximum entropy?

My title is taken from a similarly titled article by the physicist Ed Jaynes, whose work influenced me greatly. It refers to a controversial idea of epistemological probability theory: the method of maximum entropy, that was popularised and (arguably) invented by Jaynes. This principle states that, when choosing probabilities on a discrete hypothesis space, subject to constraints on the probabilities (e.g. a certain expectation value is specified), you should distribute the probability as uniformly as possible by the criterion of Shannon entropy.

It was soon realised that this is not, as Jaynes hoped, a method of assigning probabilities from scratch. With no constraints apart from normalisation, you get a uniform distribution, which is lurking in the background as a “prior” that is assumed by MaxEnt. The uniform distribution might be justified by another argument (e.g. invariance under relabelling of the hypotheses), but the point remains: Maximum Entropy updates probabilities from a previous distribution, it doesn’t generate them from scratch (I will use the term “ME” to refer to MaxEnt applied in this updating fashion). This puts the principle on the same turf as another, much more well-accepted method for updating probabilities: Bayes’s theorem. The main difference seems to be that ME updates given constraints on the probabilities, and Bayes updates on new data.

Of course, when there are two methods that claim to do the same, or similar, things, disagreements can occur. There is a large, confusing literature on the relationship and possible conflicts between Bayes’s theorem and Maximum Entropy. I don’t recommend reading it. Actually, it gets worse: there are at least three different but vaguely related ideas that are called maximum entropy in the literature! The most common conflict is demonstrated in this short discussion by David MacKay. The problem posed is the classic one, along these lines (although MacKay presents it slightly differently): given that a biased die averaged 4.5 on a large number of tosses, assign probabilities for the next toss, x. This problem can seemingly be solved by Bayesian Inference, or by MaxEnt with a constraint on the expected value of x: E(x) = 4.5. These two approaches give different answers!
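For concreteness, here is a sketch of the MaxEnt answer to the die problem (the standard calculation, in my own code): the constrained maximum has the exponential form $p_i \propto e^{\lambda i}$, with $\lambda$ chosen so that the mean comes out to 4.5.

```python
import numpy as np

x = np.arange(1, 7)

def maxent_mean(lam):
    """Mean of the distribution with p_i proportional to exp(lam * i)."""
    w = np.exp(lam * x)
    return (x * w).sum() / w.sum()

# The mean is increasing in lam, so solve maxent_mean(lam) = 4.5 by bisection.
lo, hi = -10.0, 10.0
for _ in range(100):
    mid = (lo + hi) / 2.0
    if maxent_mean(mid) < 4.5:
        lo = mid
    else:
        hi = mid

p = np.exp(lo * x)
p /= p.sum()
print(p.round(4))  # probabilities increase with the face value; mean is 4.5
```

The Bayesian treatment of the same numbers yields a different distribution for the next toss, which is exactly the conflict MacKay discusses.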

Given the success of Bayes, I was confused and frustrated that nobody could clearly explain this old MaxEnt business, and whether it was still worth studying. All of this was on my mind when I attended the ISBA conference earlier this year. So, aided by free champagne, I sought out some opinions. John Skilling, an elder statesman (sorry John!) of the MaxEnt crowd, seems to have all but given up on the idea. Iain Murray, a recent PhD graduate in machine learning, dismissed MaxEnt’s claim to fundamental status, saying that it was just a curious way of deriving the exponential families. He also reminded me that Radford Neal rejects MaxEnt. These are all people whose opinions I respect highly. But in the end of this story I end up disagreeing with them.

How did this come about? At ISBA, I tracked down the only person who mentioned maximum entropy on their poster – Adom Giffin, from the USA, and had a long discussion/debate, essentially boiling down to the same issues raised by MacKay in the previous link: MaxEnt and Bayes can both be used for this problem, and are quite capable of giving different answers. I was still confused, and after returning to Sydney I thought about it some more and looked up some articles by Giffin and his colleague, Ariel Caticha. These can be found here and here. After devouring these I came to agree with their compatibilist position. The rationale given for ME is quite simple: prior information is valuable and we shouldn’t arbitrarily discard it. Suppose we start with some probability distribution q(x), and then learn that, actually, our probabilities should satisfy some constraint that q(x) doesn’t satisfy. We need to choose a new distribution p(x) that satisfies the new constraint – but we also want to keep the valuable information contained in q(x). If you seek a general method for doing this that satisfies a few obvious axioms, ME is it – you choose your p(x) such that it is as close to q(x) as possible (i.e. maximum relative entropy, minimum Kullback-Leibler distance) while satisfying the new constraint.

This seems to present a philosophical problem: where do we get constraints on probabilities from, if not from data? It is easy to imagine updating your probabilities using ME if some deity (or exam question writer) provides a command “thou shalt have an expectation value of 4.5”, but in real research problems, information never comes in this form. An experimental average is not an expectation value. I emailed Ariel Caticha with a suggested analogy for understanding this situation, which he agreed with (apparently a rare phenomenon in this field). The analogy is that, in classical mechanics, all systems can be specified by a Hamiltonian, and the equations of motion are obtained by differentiating the Hamiltonian in various ways. But hang on a second – what about a damped pendulum? What about a forced pendulum? I remember studying those in physics, and they are not Hamiltonian systems! But we understand why. Our model was designed to include only the coordinates and momenta that we are interested in – the ones about the pendulum – and not those describing the rest of the universe; these are dubbed “external to the system”, and their effects summarised by the damping coefficient or the driving force f(t). However, our use of a model of this kind does not stop us from believing that energy is actually conserved, if only our model included these extra coordinates and momenta. It also doesn’t mean we can be arbitrary in choosing the damping coefficient and the driving force f(t) – these ought to be true summaries of the relevant information about the environment.

Similarly, in inference, one should write down every possibility imaginable, and delete the ones that are inconsistent with all of our experiences (data). This would correspond to coming up with a really big model and then using Bayes’s theorem given all the data you can think of. However, this is impossible in practice, so for pragmatic reasons we summarise some of the data by a constraint on the probabilities we should use on a smaller hypothesis space, in much the same way that, in physics, we reduce the whole rest of the universe to just a single damping coefficient, or a driving force term. That is where constraints on probabilities come from – summaries of relevant data that we deem to be “external to the system” of interest. Once we have them, we need to process them to update our probabilities, and ME is the right tool for this job.

The simplest way to reduce some data to a constraint on probabilities is if the statement “I got data D” is in your hypothesis space, as it is in the normal Bayesian setup. Applying the syllogism “I got data D ==> my probabilities should satisfy P(D)=1”, then applying ME, leads directly to the conventional Bayesian result – as demonstrated by Giffin and Caticha. Thus, Bayesian Inference isn’t about accumulating data in order to overwhelm the prior information, as it is often presented. It is just the opposite – we are really trying to preserve as much prior information as possible!

This leaves one remaining loose end – that pesky biased die problem, or the analogous one discussed by MacKay. Which answer is correct? In my opinion, both are correct but deal with different prior states of knowledge (the ultimate Bayesian’s cop-out ;-)). If we actually knew that it was a repeated experiment, the Bayesian set-up of a “uniform prior over unknown probabilities”, and then conditioning on the observed mean, is correct. The hypothesis space here is the space of possible values for the 6 “true probabilities” of the die, combined (as a product space) with the space of possible sequences of rolls, {1,1,1,…,1}, {1,1,1,…,2}, …, {6,6,6,…,6}. Note that not all of these sequences of tosses are equally likely. If we condition on a 1 for the first toss, this raises our probability for the 2nd toss being a 1 as well. This is relevant prior information that should, and does, affect the result. This is the source of the disagreement between MaxEnt and Bayes.
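This Bayesian set-up can be sketched by simulation (my own toy version; I approximate “the average of many tosses was 4.5” by conditioning the true mean to lie near 4.5, which is what the observed average concentrates on for a long run of tosses):

```python
import numpy as np

rng = np.random.default_rng(0)
faces = np.arange(1, 7)

# Uniform prior over the six "true probabilities": Dirichlet(1,...,1).
thetas = rng.dirichlet(np.ones(6), size=200_000)

# For a large number of tosses the observed average concentrates near
# E(x | theta), so keep only the thetas whose mean is close to 4.5.
means = thetas @ faces
keep = np.abs(means - 4.5) < 0.01

# Posterior predictive probabilities for the next toss.
predictive = thetas[keep].mean(axis=0)
print(predictive.round(3))
```

The resulting predictive distribution differs from the MaxEnt exponential form, as expected: the two calculations encode different prior states of knowledge.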

If we didn’t know this whole setup about the die, merely that there were 6^N possibilities, with N large, this model would be inappropriate. We would have uniform probabilities for the sequence of tosses (this corresponds to the “poorly informed robot” from Jaynes’s book). In this scenario MaxEnt with E(x) = 4.5 completely agrees with Bayesian Inference. It is this case where MaxEnt is appropriate because we really do possess no information other than the specified average value.

This concludes my narrative of my journey from confusion to some level of understanding of this issue. At the moment, I am working on some ideas related to ME that can help clear up some difficulties in conventional Bayesian Inference. Particularly, there’s been a flare-up of controversy, basically over Lindley’s Paradox in cosmology, that I believe ME can go some way to resolving.

I’d like to leave you with a quote from neuroscientist V. S. Ramachandran, that gave me the confidence to reveal my heretical thoughts on this matter.

“I tell my students, when you go to these meetings, see what direction everyone is headed, so you can go in the opposite direction. Don’t polish the brass on the bandwagon.” – V. S. Ramachandran

I love a good paradox – especially ones I can’t see a resolution to. I’ve run into this one a few times and wanted to look into it further.

The story is as follows. A teacher is worried that her class isn’t working consistently through the term, choosing instead to “cram” on the night before an exam. So she announces to the class that there will be a quiz on the work they will cover this week. It will be a surprise quiz, sometime next week. The students, not knowing which day the quiz is on, will not have the option of staying up the night before. They must work consistently so that they are always prepared for the quiz.

One enterprising student, however, quickly realises that the quiz cannot be on Friday. His reasoning is sound – suppose a student hasn't worked consistently through the week. It is Thursday night, and the quiz hasn't happened yet. Then the quiz must be on Friday, so the student knows to stay up all night and cram, which defeats the purpose of the surprise quiz. Thus, the quiz won't happen on Friday.

So far, so good. But suppose that it is Wednesday night, and the quiz hasn’t happened yet. The quiz must be on Thursday or Friday. But we just saw that it won’t be on Friday. Thus it must be on Thursday. So the student should stay up all night cramming. Which defeats the purpose of the surprise quiz. Thus, the quiz won’t happen on Thursday.

You can guess what happens next. On Tuesday night, the student would know that the quiz was on Wednesday. On Monday night, the student would know the quiz was on Tuesday. Thus, the student knows that the quiz is on Monday, so it is obviously not a surprise quiz. The student concludes that there is no such thing as a surprise quiz.

On Tuesday, the teacher hands out the surprise quiz. The student is, frankly, surprised.

Any mathematicians out there will recognise this as an induction. Mathematical induction is a form of proof that works like this:

We are attempting to prove a set of statements U(n), where n = 1,2,3 … For example, U(n) could be the mathematical formula:

U(n): 1 + 2 + 3 + … + n = (n² + n)/2

Since we have an infinite number of statements, we can't check them one by one. Instead, we line them up like dominoes, and then push the first one over. More precisely, we prove the following:

Step 1 (the inductive step): If U(n) is true for any particular value of n (say, n = k), then it is true for the next value of n (i.e. n = k + 1).
Step 2 (the base case): U(n) is true for n = 1, i.e. U(1) is true.

Step 2 says U(1) is true. Thus, by Step 1, U(2) must be true, because U(1) “knocks it over”. But then, by Step 1, U(3) must be true, because U(2) knocks it over. And so on.
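The formula U(n) itself is easy to spot-check by brute force. This throwaway Python snippet (my own, just to complement the induction) verifies it for the first thousand values of n:

```python
# Brute-force check of U(n): 1 + 2 + 3 + ... + n = (n^2 + n)/2
for n in range(1, 1001):
    assert sum(range(1, n + 1)) == (n * n + n) // 2

print("U(n) holds for n = 1, ..., 1000")
```

Of course, no finite check is a proof for all n – that is exactly what the induction buys us.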

Returning to the surprise quiz paradox, a number of discussions of the paradox on the net (e.g. here and here) suggest that any attempt to put the paradox in the form of an induction will fail, because the term "surprise" cannot be given a precise, mathematical meaning.

However, such a formulation has been given here, in one of the comments. The problem is set out as follows:

Premise 1: On exactly one day out of the next n days, there will be a quiz.
Premise 2 (the “surprise” requirement): On the evening of day k (given that the quiz didn’t happen on days 1,2,…, k), there does not exist a proof that the quiz will be on day k+1.

The claim, then, is that these premises are inconsistent. This is shown as follows.

If the quiz were to occur on day n, then on the evening of day n – 1, there would be a simple proof that the quiz will occur on day n:
Proof:

• The quiz must occur on one of the days 1,2,3 … n (premise 1)
• The quiz did not occur on days 1,2,3 … n-1 (by assumption)
• Thus, the quiz must occur on day n.

Since premise 2 forbids such a proof, the quiz cannot occur on day n.

Now the induction shows itself. If premise 1 holds, but premise 2 forbids the quiz from happening on day n, then we can formulate a new premise:

Premise 1a: On exactly one day out of the next n – 1 days, there will be a quiz.

But premise 1a, together with premise 2, can be used to formulate this premise:

Premise 1b: On exactly one day out of the next n – 2 days, there will be a quiz.

And so on. Thus, we reach the conclusion that the announcement of a surprise quiz to the class is self-defeating.
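The backward elimination is mechanical enough to write down as code. Here is a small sketch (my own illustration, not from the linked discussion) that applies premises 1, 1a, 1b, … in turn for an n-day week: at each step, the latest remaining candidate day is ruled out, because on the evening before it the quiz would be provable.

```python
def eliminate_days(n):
    """Apply the backward-induction argument to an n-day week.

    Returns the days ruled out, in the order the argument rules them out.
    """
    candidates = list(range(1, n + 1))  # days the quiz could still fall on
    ruled_out = []
    while candidates:
        # The latest remaining day is provable the evening before
        # (premise 2), so premise 1 shrinks to the earlier days.
        ruled_out.append(candidates.pop())
    return ruled_out

print(eliminate_days(5))  # [5, 4, 3, 2, 1]: every day gets eliminated
```

The loop terminates with no candidate days left – which is precisely the student's conclusion that the announcement is self-defeating, and precisely what the teacher's Tuesday quiz refutes.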

(At this point I am reminded of a passage in John Barrow’s book, “Impossibility”, where he presents a similar argument along the lines that you can only predict the future if you keep the prediction to yourself. That is a topic for another post.)

For now, I will not attempt to resolve the paradox, because I'm not sure what the resolution is. I'm told that many philosophers have written papers on this very paradox in its various forms (other versions include a man sentenced to hang). I'll simply leave you with a list of references for further investigation, and invite you to provide your own solutions.