This has been killing me all week. It’s a probability problem known as the “Tuesday boy” problem. I’ll simplify the problem by reducing the possibility space.
Alice has two children. What is the probability that she has two boys given that:
a) at least one of her children is a boy?
b) at least one of her children is a boy; and at least one of her children is left handed?
c) at least one of her children is a boy, and he is left handed?
Assume that left/right handedness are equally likely.
Let’s look at a). A naive first answer might reason as follows: we know that one child is a boy. The probability of the other child being a boy is a half. Thus the probability of two boys is a half. And this is wrong, because it doesn’t count the number of possibilities correctly. There are not two relevant possibilities: boy and girl. There are, in fact, three when you take the order of the children into account: girl-boy, boy-girl, boy-boy. So the answer to a) is one third.
Our first reaction to b) might be that, as we are being asked about gender, the left-handedness of one of the children is irrelevant. And, in this case, we are correct. To see why, let’s write out the possibilities: let B/G = boy/girl, L/R = left/right handed. There are 9 ways to have at least one boy and at least one left-hander:
(BL)(BL) ; (BL)(BR) ; (BR)(BL) ; (GL)(BL) ; (GL)(BR) ; (GR)(BL) ; (BL)(GL) ; (BL)(GR) ; (BR)(GL).
Of these, the first three have two boys, giving a probability of a third, as before. This is a general feature of probabilities: adding useless information doesn’t change your probabilities.
It seems that the exact same reasoning would apply to c). Knowing that the boy is left-handed shouldn’t change anything. But let’s have a look at those possibilities …
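(BL)(BL) ; (BL)(BR) ; (BR)(BL) ; (GL)(BL) ; [(GL)(BR)] ; (GR)(BL) ; (BL)(GL) ; (BL)(GR) ; [(BR)(GL)]. (1)
The two bracketed families contain no left-handed boy, so they are ruled out. Seven possibilities remain, and three of them have two boys.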
Thus the correct answer to c) is 3/7.
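The three answers can be checked by brute-force counting. The following is a minimal Python sketch, not part of the original post; the helper names (`p_two_boys`, `has_boy` and so on) are made up for illustration, and it assumes, as the post does, that sex and handedness are independent and equally likely.

```python
from fractions import Fraction
from itertools import product

# Each child is a (sex, hand) pair; all 4 x 4 = 16 ordered pairs are assumed equally likely.
CHILDREN = list(product("BG", "LR"))
FAMILIES = list(product(CHILDREN, repeat=2))

def p_two_boys(condition):
    """P(both children are boys | condition), by counting equally likely families."""
    kept = [f for f in FAMILIES if condition(f)]
    both = [f for f in kept if all(sex == "B" for sex, _ in f)]
    return Fraction(len(both), len(kept))

def has_boy(f):
    return any(sex == "B" for sex, _ in f)

def has_lefty(f):
    return any(hand == "L" for _, hand in f)

def has_left_handed_boy(f):
    return any(child == ("B", "L") for child in f)

answer_a = p_two_boys(has_boy)
answer_b = p_two_boys(lambda f: has_boy(f) and has_lefty(f))
answer_c = p_two_boys(has_left_handed_boy)
print(answer_a, answer_b, answer_c)  # 1/3 1/3 3/7
```

The only difference between b) and c) is whether the boy and the left-hander have to be the same child.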
Huh?! We can reverse our previously given axiom: if the probabilities have changed, then we must have been given useful information. But how is the fact that the boy is left-handed relevant to the probability that the other child is a boy? In fact, the more information given about the one child that is a boy, the more likely it is that the other child is a boy. For example, if we’re told that “one child is a boy, born on Tuesday”, then the probability of two boys is 13/27. If we’re told he was born on the 1st of October, then it’s 729/1459 ≈ 0.4997. The probability approaches a half.
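These numbers can be reproduced with the same kind of counting. In the Python sketch below (an illustration, not the post's own method), the extra fact about the boy is modelled as an attribute with n equally likely values, independent of sex; the function name and the choice of value 0 as "the special one" are arbitrary. Leap years are ignored for the birthday case.

```python
from fractions import Fraction
from itertools import product

def p_two_boys_given_special_boy(n):
    """P(two boys | at least one child is a boy with attribute value 0),
    where the attribute takes n equally likely values (an assumption)."""
    children = list(product("BG", range(n)))
    kept = both = 0
    for first, second in product(children, repeat=2):
        if ("B", 0) in (first, second):
            kept += 1
            both += first[0] == "B" and second[0] == "B"
    return Fraction(both, kept)

for n in (2, 7, 365):
    print(n, p_two_boys_given_special_boy(n))
# 2 3/7
# 7 13/27
# 365 729/1459
```

With n = 1 (no extra information at all) the same function returns 1/3.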
This is bonkers, isn’t it? Suppose Alice is making you guess the gender of her other child. A bully, unbeknownst to Alice, is making you guess “boy”. You have a 1/3 chance of being correct. You have a stroke of genius. “Is your boy left-handed?”, you ask. It doesn’t matter what the answer is – once you know, your chances of being correct rise to 3/7. The more questions you ask, the better your chances. But how? How is this information relevant?
The resolution of this paradox comes by looking more closely at the possibilities left behind in (1) above. Let’s get together a huge group of two-child families. Send “girl-girl” (GG) families home – one quarter depart. The BB families make up a third of the remainder. Now, send home any family that doesn’t have a left-hander. Once again, one quarter leave. BB still makes up a third. Now send home any family that doesn’t have a left-handed boy. All the BB families that are still present are safe. But some of the GB and BG families must leave – those where the boy is right handed.
The lesson is as follows: the information that the boy is left-handed is useful information because it removes the scenarios where Alice has one right-handed boy and one left-handed girl, but doesn’t remove the possibility that Alice has one right handed boy and one left-handed boy. The more restrictions are placed on the boy, the less likely that a family with only one boy will be able to fulfill those restrictions.
The more pressing issue is that our intuitions are often mistaken when it comes to probability.
So, why can’t you argue as follows?
Once you know that the boy is left handed, the probability goes to 3/7. Conversely, if you know that he is right handed, the probability also goes to 3/7. You know that he is either left handed or right handed (assuming that nobody is ambidextrous – a follow-on assumption from the even distribution of left/right handedness), so the probability of the second child being a boy will be 3/7 regardless. If you then substitute handedness with birthdate combined with hair colour (or whatever), you can get the number arbitrarily close to 1/2.
This is similar to (some might even say a rewording of) your guessing game scenario, when you say “It doesn’t matter what the answer is – once you know, your chances of being correct rise”.
In order to resolve the paradox, I claim that it’s not enough to simply present a different way of looking at the problem, you have to show why the original way was wrong. I can certainly believe that my above argument might be flawed, but I can’t for the life of me see it.
Incidentally, I was at a trivia night a couple of years ago, and we won by a single point. One question that we got right that nobody else did was (equivalent to) your question a) above. It was glorious! If I remember correctly, however, the wording of the question was such that the probability-guesser could actually see the child in question. They could therefore see that he has (say) brown hair, blue eyes, freckles and a gammy leg. This information would have driven the probability as close to 1/2 as makes no difference, so I guess we were wrong and everyone else was right!
This has been bugging me all week too since I read about it in New Scientist. I think the paragraph in your posting ..
“You have a stroke of genius. “Is your boy left-handed?”, you ask. It doesn’t matter what the answer is …”
.. does not help to clarify the issue because it is a slightly different problem. I think this is easier to see if you consider the question “Was your boy born on 29th Feb?” instead.
The original problem is then
P(BB | 1B born on 29th Feb),
whereas the new question has two possible outcomes:
i. P(BB | 1B and born on 29th Feb);
or
ii. P(BB | 1B and NOT born on 29th Feb).
These are obviously different things in this case. I think you may have missed this point because your question has two possible answers with equal probability.
Useful to see your comments on this question though because I am starting to get a feel for the resolution to this apparent paradox – I think it comes down to considering what is the probability that someone would ask you the question in the first place. You are more likely to be asked the question “What is the probability that Alice has two boys given that at least one of her two children is a boy, and he was born on 29th Feb” if Alice has two boys, thus doubling the chances that at least one of her children has the rather rare characteristic “boy born on 29th Feb” in the first place.
-Pierre
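The two-outcome point can be made concrete for the handedness version. The Python sketch below is an illustration only: it reads the guessing-game question as “Do you have a left-handed boy?” (an assumption, since the original wording is ambiguous when there are two boys) and starts from the families with at least one boy.

```python
from fractions import Fraction
from itertools import product

CHILDREN = list(product("BG", "LR"))
# Families with at least one boy: 12 of the 16 equally likely ordered pairs.
with_boy = [f for f in product(CHILDREN, repeat=2) if any(s == "B" for s, _ in f)]

def is_bb(f):
    return all(s == "B" for s, _ in f)

yes = [f for f in with_boy if ("B", "L") in f]      # "yes, I have a left-handed boy"
no = [f for f in with_boy if ("B", "L") not in f]   # "no"

p_yes = Fraction(len(yes), len(with_boy))
p_bb_yes = Fraction(sum(map(is_bb, yes)), len(yes))
p_bb_no = Fraction(sum(map(is_bb, no)), len(no))
p_bb = p_yes * p_bb_yes + (1 - p_yes) * p_bb_no
print(p_bb_yes, p_bb_no, p_bb)  # 3/7 1/5 1/3
```

Under this reading a “yes” raises the probability to 3/7 but a “no” lowers it to 1/5, and averaged over both answers it is still 1/3, so asking the question does not help the guesser overall.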
I’ve become more dissatisfied with my response since writing the post… back to the drawing board, perhaps …
I haven’t worked through this in detail, but I worry about the wording of proposition (c). I think this may be the source of the trouble.
“at least one of her children is a boy, and he is left handed”
Who is “he”, if N=2?
I take the following to be an equivalent wording of the problem:
Alice has two children. One of them is a left handed boy. What is the probability that the other child is also a boy?
This whole ‘Tuesday Boy’ problem arises from muddling the difference between combinations (without regard to order/sequence) and permutations (where a different order defines a different state and probability). The possible combinations of 2 children are: 2 boys, 2 girls, or one of each. If there is at least one boy the combinations are 2 boys or one of each and the probability of 2 boys is 1/2. If the analysis expresses boy+girl and girl+boy as different states (which they are only when specified as permutations, i.e. with respect to order) then the boy+boy states must also be expressed as 2 different states, which they too are with respect to order, i.e. boy(a)+boy(b) and boy(b)+boy(a). This is the subtlety which is widely overlooked and leads to erroneous probabilities of 1/3, 13/27 etc. When these permutations are included as they must be, the probabilities become 1/2, 14/28 etc. Probability theory requires the comparison of homogeneous functions and cannot be applied to a mixed set of combinations and permutations.
I’m surprised that Brendon did not comment that essentially the same problem was discussed at very great length at SSSF some time ago. Probably it was either before his time, or he has blocked out a painful memory.
There were six very long threads in all, but all but the last of them have gone to SSSF Heaven (or more likely SSSF Hell). Here’s the last one:
http://www2b.abc.net.au/science/k2/stn/archives/archive32/newposts/145/topic145735.shtm
The original question there was:
If a woman has exactly two children:
If the elder one is a boy, what is the probability that both are boys?
If at least one is a boy, what is the probability that both are boys?
I am convinced that the question is ambiguous; most people disagree.
(BTW, why “Tuesday Boy”?)
I think that Luke was actually there, but then wavered and became less certain.
Contrary to the many views I took when struggling to understand the problem, it is not paradoxical, nor does it violate the laws of mathematics.
Furthermore, it does not involve a conflation of permutations and combinations.
Debate seems to be caused by two things:
Firstly, people construe the problem variously. Since different readers adopt different frames, the result they argue for differs.
Secondly, 13 / 27 *is* an answer to *one* framing of the problem, but it is not intuitive. This causes debate.
Let me begin by framing the problem twice, each time in terms of a procedure used to solve it.
First frame: Imagine that I am specifically looking for a parent who has two children, one of whom is a boy born on a Tuesday. I begin with a roomful of parents. I ask the following questions.
(Q1) “Stand up if you have two children”
Say that 196 people stand up, and that each of them has a unique pair of children, C1 and C2, drawn from {B,G} x {Mo,Tu,We,Th,Fr,Sa,Su}. (7×2)^2 = 196
(Q2) Then I say “Remain standing if and only if one of them is a boy”
All parents of two girls sit down. This is exactly 1/4 of the total, so 49 sit down and 147 remain standing.
(Q3) Then I say “Remain standing if and only if you have a boy who was born on a Tuesday”
Now 7 of the 49 parents with only a C1 boy and 7 of the 49 parents with only a C2 boy remain standing, as do 13 of the 49 parents whose C1 and C2 are both boys. So 27 remain standing.
(Q4) Last question: “How many of you have two boys?”
Expected answer: 13 / 27
The step in this process that jars with our expectations is (Q3). Surely, we think, there should be 7/49 parents with two boys remaining? This would imply a probability of two boys as 1/3, the same as it was before we introduced the ‘Tuesday’ criterion.
But 13/27 is correct. Consider each question as reducing the possible sets of events (hence starting with 196 people). After (Q2) we have:
C1   C2   No. of people
B    B    49
B    G    49
G    B    49
So, at this stage, the probability of two boys is 49 / 147, or 1/3.
But Condition (Q3) then removes *more* possibilities from GB and BG than it does from BB. To see this, consider the fact that anyone with two boys has almost twice the chance of having ‘a boy born on a Tuesday’ as a person with only one boy. Put this way, it seems trivially obvious!
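The head counts in this first frame can be reproduced by listing all 196 families. This is a small Python sketch added for illustration; the variable names are invented and the day labels are arbitrary.

```python
from itertools import product

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
CHILDREN = list(product("BG", DAYS))          # 14 kinds of child
families = list(product(CHILDREN, repeat=2))  # 196 ordered pairs (C1, C2)

# (Q2) keep families with at least one boy
after_q2 = [f for f in families if any(sex == "B" for sex, _ in f)]
# (Q3) keep families with a boy born on a Tuesday
after_q3 = [f for f in after_q2 if ("B", "Tue") in f]
# (Q4) of those, the families with two boys
two_boys = [f for f in after_q3 if all(sex == "B" for sex, _ in f)]

print(len(families), len(after_q2), len(after_q3), len(two_boys))  # 196 147 27 13
```

The 13 comes from 7 + 7 two-boy families with a Tuesday boy in either position, minus the one family counted twice because both boys were born on a Tuesday.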
Second frame: Imagine that I am walking the street and I stop a random person.
(Q1) “How many children do you have?”
(A1) “Two”.
(Q2) “Picture one of your children. Is it a boy?”
(A2) “Yes it is.”
(Q3) “Excellent. And which day of the week was he born on?”
(A3) “A Tuesday”
(Q4) “And your other child, is it a boy also?”
(A4) “Why yes it is.”
(Q5) “Aha! The probability of you answering ‘yes’ to that last question was 1/3”
(A5) “Fascinating.”
Why is the answer different? Because ‘Tuesday’ was not a criterion used to slim down a pool of possible situations. If (A1) had not been ‘Two’, or (A2) had been ‘No’, the questioner would have moved on, but (Q3) was an incidental question whose answer did not determine how the questioner would proceed.
What does this tell us about probability?
That the probabilities depend on the process used to frame the question. We see this all the time in science. Somebody collects data, spots a pattern, invents a hypothesis and submits the pattern as statistical evidence justifying it. But the process is back-to-front, and the statistics unsound.
My struggle with this problem was that the mathematics gives me answer (1) but intuitively I felt that it is actually question (2) which was being asked.
The answer depends on the frame used by the answerer. Much of the debate of this problem seems to stem from the fact that there is some ambiguity in most presentations of the problem.
Will: that was a very good analysis, and almost entirely correct. Unfortunately, the answer to your second framing is 1/2, not 1/3.
Since you asked (Q2) “Picture one of your children. Is it a boy?”, some of the fathers of a boy and a girl could be picturing their daughter. In fact, we have to assume half of them do. So, while 3/4 of fathers *could* answer (A2) “Yes it is,” only 1/2 of them actually will. With the 1/4 families that have two boys always answering “yes,” that means the probability is (1/4)/(1/2)=1/2, not (1/4)/(3/4)=1/3.
To get 1/3, you have to replace (Q2) “Picture one of your children. Is it a boy?” with (Q2′) “Picture both of your children. Is it true that at least one is a boy?” That way, all of the 1-boy families will answer “yes.”
The difference in these formulations is whether you assume the specific information “at least one of her children is a boy” or “at least one of her children is a boy born on Tuesday” was *required* as part of the selection process and could be true of either child, or if non-specific information in the form “at least one of her children is a {boy,girl}” or “at least one of her children is a {boy,girl} born on {Mo,Tu,We,Th,Fr,Sa,Su}” was observed of a specific child *after* the selection was accomplished. These correspond to your two framings.
When it is required, the answers are 1/3 and 13/27, respectively. It changes because once you have parents of at least one boy still standing, they are in the proportion 2:1 for 1-boy and 2-boy families, respectively. But when you ask about an additional piece of information that applies to a boy with probability P, the 1-boy families will qualify with a probability of P, while the 2-boy ones qualify with a probability of 2P-P^2. So a greater fraction of those with 2 boys remain standing.
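The 2:1 weighting argument can be written out directly. The function below is a hypothetical helper, sketched in Python with exact fractions; `p` stands for the probability P that the extra fact is true of any given boy.

```python
from fractions import Fraction

def p_two_boys_required(p):
    """P(two boys) when the extra fact, true of a given boy with probability p,
    is required of at least one boy in the family."""
    one_boy = 2 * p                # 1-boy families: twice as many, qualify with probability p
    two_boys = 1 * (2 * p - p**2)  # 2-boy families: qualify with probability 2p - p^2
    return two_boys / (one_boy + two_boys)

print(p_two_boys_required(Fraction(1)))     # 1/3  (no extra fact beyond "boy")
print(p_two_boys_required(Fraction(1, 7)))  # 13/27 (born on Tuesday)
```

Simplifying the ratio gives (2 - P)/(4 - P), which tends to 1/2 as P becomes small.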
When it is observed of one child, there is always a chance the information given in this puzzle is true of the “other” child (notice that there is no “other” child in the first framing), but you don’t learn it. The answer is always 1/2, regardless of any additional information.
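One way to model the “observed” case is sketched below in Python. It assumes the parent pictures either child with probability 1/2 and then reports that child's sex and birth day; that selection rule is an assumption about the process, not something the puzzle states.

```python
from fractions import Fraction
from itertools import product

DAYS = range(7)                       # day 1 stands for Tuesday (arbitrary labelling)
CHILDREN = list(product("BG", DAYS))

num = den = Fraction(0)
for family in product(CHILDREN, repeat=2):
    for pictured in (0, 1):           # the parent pictures either child with probability 1/2
        weight = Fraction(1, 2)
        if family[pictured] == ("B", 1):          # the observed fact: a boy born on Tuesday
            den += weight
            if family[1 - pictured][0] == "B":    # the other child is also a boy
                num += weight

p_observed = num / den
print(p_observed)  # 1/2
```

Because the fact is about the pictured child only, the other child is unconstrained and the result does not depend on how many days (or other attribute values) there are.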
The “Tuesday” answer given by lukebarnes seems unintuitive because the “required” scenario is an unintuitive interpretation of being told “one was born on a Tuesday.” There is no reason to suspect that his Alice was selected because she had a child born on Tuesday. But that means it is also improper to assume she was selected because she had a boy.
Will Hardman,
I agree with JeffJo that the answer to your second framing is 1/2 not 1/3.
But go back to your first framing. Change Question 3 to:-
“Remain standing if and only if that boy was born on a Tuesday.”
You would then have those with a C1 boy (7) and those with a C2 boy (7) standing, along with the one set of parents with two boys born on Tuesday. A total of 15 parents standing.
12 sets of parents look confused, however. They say “We can’t be sure that the boy you refer to was born on a Tuesday. We all have two boys, only one of which was born on a Tuesday. So what do we do – stand or sit?”
What do you answer?
Thanks to both of you for your responses. I agree completely that the real answer to my second formulation of the problem was 1/2 and not 1/3. It’s a testament to the insidious nature of the question that I could spend so long structuring a solution and *still* get it wrong. Either that or I’m a lot simpler than I like to think I am…
In my opinion JeffJo’s analysis hits to the heart of this problem. I actually wrote several paragraphs along the same line but cut them to stop my post becoming a thesis.
There’s something really key about the notion of ‘required’ and ‘incidental’ criteria used to slim down the probability space.
That was at the heart of the two formulations: in the first, ‘Tuesday’ was a required criterion for all parents who remained standing; in the second it was an incidental aspect of somebody’s reply.
To address Ron Osmand’s comment: I think that your question nicely highlights the fatal sting of ambiguity in discussing probability. What the questioner asked was not clear, and so (s)he could tell the confused parents either to stand or to sit, depending upon what (s)he had in mind.
Of all mathematical puzzles I’ve come across recently this is by far my favourite. I think that there’s a really deep lesson about probability in here and I’m now very interested to see where else I can find analogues of the situations.
Regards,
Will
@Ron: the fact that the people look confused shows that you are making an inappropriate interpretation. But the answer to your question is that the choice between “remain standing if Fact F is true of either one child” or “remain standing if Fact F is true of the child you pictured” has to be the same for both facts, since both are phrased the same way at the top of the blog. There can be no justification for treating them differently, even if you like the results better that way.
@Will: A similar, real-life application is the Principle of Restricted Choice in Bridge. If you don’t know the location of two equivalent cards, when an opponent plays one of them, it reduces the chance that opponent has the other.
JeffJo,
You may have a point, but I don’t understand your argument yet.
You say the choice “has to be the same for both facts”
Which two facts are these?
And what does “either one child” mean?
All you need to do to understand it is to try. There are two facts presented about one child: it’s a boy, and it was born on Tuesday. Both are presented the same way.
In the simple problem, if “boy” is required for the family to be in the sample space (i.e., if only fathers of at least one boy were allowed to speak on the problem at G4G, and he had to make the example about the boy), the probability is 1/3. If it is observed (i.e., if any father was allowed to speak, and it was possible to say “One is a girl” if he had a boy and a girl, but this one just happened to say “boy”), then the probability is 1/2.
In the “Born On Day” variation, if the second fact (in the example, “born on Tuesday”) is treated the same way as the first fact (“boy”), the answers are 13/27 and 1/2, respectively. If you insist it can’t change, your interpretation has to be that the facts (BOTH of them) were observed, not required. Because there is no justification FROM THE STATEMENTS ALONE for treating the two facts differently.
It is only if you choose one interpretation for the first fact (“boy,” required) and a different one for the second (“Born on Tuesday,” observed) that you can get “1/3” for both. But no logical person would do that, as the two facts are presented the same way.
JeffJo,
Thank you for the expanded explanation. I now understand the point you are making.
Having understood it, I disagree with it.
I can see why you and others feel that the answer to the “simple question” is ½ and that the answer to the “Tuesday Boy” question is also ½.
My issue with the Foshee answers of 1/3 and 13/27 respectively is founded on the fact that they imply that the knowledge of a boy’s day of birth affects the probability of his belonging to a two child family. This is not just counter-intuitive, but borders on astrology.
Your set of answers (both 1/2) is non-astrological in this sense, and therefore probably has a head start on the Foshee set of answers.
Having said that, let us look at the Foshee answers, and accept for the moment that 1/3 is the correct answer to the simple question. In English, this says that if a parent has a two child family and at least one of those is a boy, then the probability of having two boys is 1/3.
If we now add the fact that one of the boys was born on a Tuesday, then the probability of the parent having a two boy family rises to 13/27. The first thing to say is that the P of a two boy family is also 13/27 if the boy in question is born on any other day of the week, so WHICHEVER day of the week he is born on the P of a two boy family is still 13/27.
And yet we know that the P for a two boy family if the day of birth is unspecified (as per the simple question) is 1/3. Reason to doubt the method of calculation, one might conclude?
There are other ways of dividing the time of the boy’s birth. You could divide it in two, by saying that he was either born between midnight and noon, or he was born between noon and midnight, and then the overall P of a two boy family would be 3/7.
Divide his possible time of birth by 3, 4, 5, and 6 and you will get four other different answers to the P of a two boy family. But you will have used the same method of calculation for each one, and this is the same method that Foshee used for the “simple question”.
So – one method of calculation and many different answers to the same basic question.
Now let me try to explain the correct method of calculation.
When you state that “at least one of the children is a boy”, you are really saying that ONE of the two birth events that led to the formation of the two child family resulted in a male birth. The result of the other birth event is undeclared.
Now take the position after Will Hardman’s Question 2 above. You have the three groups of 49 families BB BG and GB all standing up.
You then say “Select the birth event that resulted in that boy being born” The BG group choose (as they must) the first event and the GB group choose (as they must) the second event (the first and second events not necessarily signifying chronological order).
But the BB group then must CHOOSE between the first and second events, on the grounds that the single boy to whom they refer CANNOT occupy BOTH birth events.
Then you say “Remain standing if the boy was born on Tuesday”. 7 parents will remain standing from each group, giving a denominator of 21. Of these 7 are in the BB grouping. (The other seven boys born on a Tuesday being in the birth event NOT chosen by the parents)
This gives a fraction of 7/21 or 1/3.
Using this method will always give an answer of 1/3 for any sub-division of the boy’s birth time, and is therefore more robust than the Foshee calculation method, which, as I demonstrated, is flawed because it gives different answers to the same question.
I understand your argument, JeffJo, but in the end it is, just like mine, only in English, and not a mathematical proof.
I prefer my argument, even though you are going to tell me it is wrong.
Ron, you keep saying “the knowledge of a boy’s day of birth affects the probability of his belonging to a two child family,” and that is incorrect. It is REQUIRING a family to have a boy born on a date that affects the probability YOU WILL FIND A FAMILY THAT MEETS THE REQUIREMENT. A two-boy family is about twice as likely to meet it as a one-boy family, so the probabilities change.
Also, I refuse to “accept for the moment that 1/3 is the correct answer to the simple question,” because that is just as much “astrology” as the 13/27 answer. You refuse to recognize that, so you keep trying to make one kind of “astrology” agree with another. And until you recognize that fact, there is no point in discussing it with you.
The “correct method of calculation” is to define a random process, not a set of information. The question “what is the probability Gary Foshee has two boys” is vacuous. He either does, or he doesn’t; there can be no probability associated with a single outcome. Probability is a property of an event in a random process, and Gary Foshee’s family is not an event. Go look it up, if you need to.
To get that random process, you have to define how you got the information, and it has to be the same for both facts.
JeffJo,
Well – thanks very much for that accurate, informative, well-mannered and open-minded contribution.
The answer is 1/2. It’s 1/2 if the boy is born on a Tuesday, if he’s left handed, if he has a rare genetic disorder only present in 0.0000001% of the population… it’s always 1/2.
Thinking otherwise by improperly constructing and manipulating the sample spaces is nothing more than a variant of committing the gambler’s fallacy of thinking independent events influence each other’s odds of occurrence.
Since the entire thing rests on using the exact same approach that gives us an answer of 1/3 instead of 1/2 in the simplest version of the puzzle (I have two children, one is a boy…) we can demonstrate the problem there.
Everyone can construct the initial search space of possible families for which the statement “I have two children” with no further information provided is true:
G(1)G(2)=25%
B(1)G(2)=25%
G(1)B(2)=25%
B(1)B(2)=25%
…and nobody argues over it (yes there’s a reason I numbered them. No the numbers don’t have any real world meaning… they do not refer to age or anything else. they just make it easier to keep track of things) . But people tend not to think it through fully when they then assign all of them the value of 25%. They get stuck in a mindset that those are the probabilities of each because, well, each of them is one out of 4 possible outcomes.
WRONG.
Those are the probabilities of each because we have assumed that the independent odds of occurrence of any individual child being either a B or a G are 50%. So G(1)G(2) = 0.5 x 0.5 = 25%, etc…
That the odds of any random child in the population being either a boy or a girl are 50/50 is the basis of constructing the initial probabilities. That value is a given, you don’t get to futz with it without justification as you then begin to introduce and account for new information.
Now, when we are told that at least one of the children is a boy that is the same as saying that at least one B probability in any given two child sample is now at 100%, not 50%.
That means the corresponding G probability for that child goes to 0% (which is how, mathematically, the GG option in the search space gets crossed out… multiplication by 0)
Now, we don’t know *which* child that is. People who say the answer is 1/3 say that’s why… they’re wrong. Irrelevant. Since we don’t know which it is, it is possible it is either one… and we are faced with two possible scenarios:
1. The child that is definitely a boy is the “first” (“first” meaning anything you want… youngest, on the left, shorter, don’t care) child… in which case:
B(1) = 100%
G(1) = 0%
B(2) = 50%
G(2) = 50%
G(1)G(2) = 0 x 0.5 = 0%
B(1)G(2) = 1 x 0.5 = 50%
G(1)B(2) = 0 x 0.5 = 0%
B(1)B(2) = 1 x 0.5 = 50%
OR
2. The child that is definitely a boy is the “second” one:
B(1) = 50%
G(1) = 50%
B(2) = 100%
G(2) = 0%
G(1)G(2) = 0.5 x 0 = 0%
B(1)G(2) = 0.5 x 0 = 0%
G(1)B(2) = 0.5 x 1 = 50%
B(1)B(2) = 0.5 x 1 = 50%
There are two equally possible outcomes in scenario 1… and two in scenario 2… with both scenarios also being equally possible relative to each other. Neither scenario contains a GG option so GG is now eliminated as a possible outcome. Each of the BG and GB outcomes are represented in one possible scenario but NOT the other, and BB is a possibility in either scenario. Making the final search space:
G(1)G(2)=0%
B(1)G(2)=25%
G(1)B(2)=25%
B(1)B(2)=50%
And the odds of a BB combo 1/2.
If you want to argue that it should be:
G(1)G(2)=0%
B(1)G(2)=33%
G(1)B(2)=33%
B(1)B(2)=33%
Then you need to provide values for the probability of individual children being either a boy or a girl that multiply out to those answers for the 2 child combinations. If you do that you will find that, once you have declared that one child is definitely a boy, you are then claiming that any given random child in the remaining sample space must be TWICE AS LIKELY to be a girl as a boy, which is unjustifiable. It is no different in any way from saying “If I flip one heads the other coin is more likely to be a tails”. No. Gambler’s Fallacy. We have been given no information that altered birth rates, and saying the odds of individual children in the population being girls INCREASED as a result of information that only eliminated girls from that population is absurd.
The answer is 1/2 and will always be 1/2 no matter what information you tell us about the child that is a boy… unless that information DIRECTLY bears on the probability of his having a sibling of a specific gender. Like “one child is a boy whose sibling plays with barbies”… then you could argue the odds of the sibling being female jumped up. Otherwise, no. The odds of the other sibling being a boy or girl is 50/50 because the odds of ANY random child in the population being a boy or a girl is 50/50.
Hi Grant,
Here’s a matlab code that creates 100,000 two child families, counts the number with two boys and then divides it by the number with at least one boy:
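% Simulate n two-child families; for each child, rand < 0.5 is labelled "girl".
n = 100000;
gg = 0; gb = 0; bg = 0; bb = 0;
for i = 1:n
    x = rand(2,1) < 0.5;        % x(k) is true if child k is a girl
    if (x(1) && x(2))
        gg = gg + 1;
    elseif (x(1) && ~x(2))
        gb = gb + 1;
    elseif (~x(1) && x(2))
        bg = bg + 1;
    else
        bb = bb + 1;
    end
end
n1b = gb + bg + bb;             % families with at least one boy
fbb = bb / n1b;                 % fraction of those with two boys
% printf is Octave syntax; in MATLAB use fprintf instead.
printf('total families: '); n
printf('number with at least one boy: '); n1b
printf('fraction of those with at least one boy who have two boys: '); fbb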
I get the output:
total families: n = 100000
number with at least one boy: n1b = 75207
fraction of those with at least one boy who have two boys: fbb = 0.332
That’s a lot closer to a third than a half …
Yes luke, I know that if you create 1000 two child families there will be twice as many mixed gender pairs as 2 boy pairs. Not the point.
That search space is constructed using the information “I have two children” and with the baseline assumption that any given child is equally likely to be male or female. The point is that that ceases to be an appropriate search space when you are given the statement “I have two children, one of them is a boy” instead of simply the statement “I have two children”.
To get these values:
GG = 0%
GB=33%
BG=33%
BB=33%
…requires us to change our baseline assumption about the frequency of males and females in the population when we have been given no information that justifies taking such action. The proper search space generated from the statement “I have two children, one of them is a boy” is the one I described in my last post with the set of two possible scenarios.
In order to have an answer of GG=0% *one* of the Gs MUST be set to 0% for all possible outcomes, it is the only way to get that answer. That is accomplished by the statement that one child is certainly (100%) a boy. Having done that however, the only way to arrive at 33% values for the other three combinations is to skew the male/female birthrates. Since the statement “one is a boy” does not alter the biology of the human race to double the frequency of female births the answer you arrive at by using the search space manipulation you employ in that matlab program is not appropriate. It is intuitively correct, but is proven wrong when you dig into what that answer requires.
Grant C,
I think your calculations are correct, but that they are correct ONLY for a situation where you REVEAL the outcome of one of the birth events.
If I told you that my neighbour had two children, at least one of which was a boy (I obviously know the sex of both children, but you don’t) then, from your point of view, the equally possible outcomes are:-
B1G2
B1B2
G1B2
At this stage both G1 and G2 are still possible, so neither can be classed as a 0% probability.
The point is that you don’t KNOW whether the boy referred to IS “first or second” until I tell you.
When I do, the reveal has taken place and the possible outcomes are then:-
B1G2 or B1B2 (Boy “first” and G1 possibility killed off)
OR
G1B2 or B1B2 (Boy “second” and G2 possibility killed off)
This (AFTER the reveal) is where the 50% probability of BB becomes true, and not before.
I find the “reveal” concept slightly baffling myself, and I accept that I could easily be wrong, but these are my current thoughts.
@Grant:
You are confusing “prior” probability with “posterior” probability. That is, you are confusing predicting how likely something is before it happens with judging how likely it was after it happened, based on incomplete knowledge of what already occurred.
For example, if I roll a red die and a black die, the prior probability that the red die will land on 6 is 1/6. But if I tell you that I already rolled them, and that the sum of the two is 11, the posterior probability that the red die already landed on 6 is 1/2. A condition that is placed jointly on the pair limits the properties the individuals can have.
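The dice example can be checked by listing all 36 outcomes. This is only a small sketch in Python; the variable names are mine, and it assumes two fair six-sided dice.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (red, black) outcomes of two fair dice.
rolls = list(product(range(1, 7), repeat=2))

# Prior: no condition has been placed on the pair yet.
prior = Fraction(sum(1 for red, _ in rolls if red == 6), len(rolls))

# Posterior: keep only the outcomes consistent with "the sum is 11".
sum_is_11 = [(red, black) for red, black in rolls if red + black == 11]
posterior = Fraction(sum(1 for red, _ in sum_is_11 if red == 6), len(sum_is_11))

print(prior, posterior)  # 1/6 1/2
```

The condition "sum is 11" is a property of the pair, yet it leaves only (5, 6) and (6, 5), which is what changes the chance for the red die.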
Nobody is committing the “gambler’s fallacy” here. Nobody is necessarily thinking of a single child, and claiming the properties of that single child influence the gender of the other. If you treat “One of them is a boy born on Tuesday” as a property of the pair, just like “sum is 11” is a property of the pair of dice, then that fact limits what properties the two individuals can have.
The question, though, is whether the original statement is meant to be a property of the pair, or of an individual. Either interpretation is possible, so you cannot say that the probability is “always 1/2.” Those who answer “1/3” are ignoring the difference, but so are you. You need to discuss why one interpretation should be preferred over the other.
@Luke:
But it is just as wrong to merely count the ways a result could have happened in a vacuum. You have to create a sample space for the ORIGINAL process, and divide it into two sets of events. By ORIGINAL, I mean the selection of a family with no conditions applied at all, and then allow for every possibility. Even those you know didn’t happen, or that may be based on a hidden random factor.
To see why, look at another famous paradox: the Game Show Problem. You are offered the choice of three doors. A new car is behind one, but goats are behind the other two. You choose Door #1, but before opening that door the host opens another (which he always does) to show a goat (which he knew was there). Say it is Door #3. He offers to let you switch to the remaining door. Should you?
If you count cases just like you did in your program, you will need three totals: C1, C2, and C3 for the three possible places the car could be. Each will be about the same number. You will throw out C3 at the end, just like you threw out GG, and say the probabilities for the two remaining places are C1/(C1+C2) and C2/(C1+C2), respectively. Both will be about 1/2, and there is no benefit to switching.
And that’s wrong. Even though the host didn’t open Door #2, your simulation needs to keep track of the cases where he could have. So you need six variables, not three: C1H2, C1H3, C2H2, C2H3, C3H2, and C3H3. If you do not take the host’s motives into account, each of these will again be about the same number, and you get the same results: the probabilities (knowing the host chose #3 and it didn’t have the car) are C1H3/(C1H3+C2H3) and C2H3/(C1H3+C2H3), which are still 1/2.
But we must assume that the host won’t open the door with the car. So while C1H2+C1H3, C2H2+C2H3, and C3H2+C3H3 are all the same number, C2H2=C3H3=0, so C1H2=C1H3=C2H3/2=C3H2/2. Now when you calculate C1H3/(C1H3+C2H3) and C2H3/(C1H3+C2H3), you get 1/3 and 2/3, respectively, and you should switch. The important thing to note here is that to get the right answer, you have to consider everything that could have happened. Including those that don’t agree with the condition in the problem statement that the host opened Door #3, and that he revealed a goat.
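Here is a rough sketch of the six totals, computed exactly rather than by simulation. The function names and the `avoids_car` flag are made up for illustration; it assumes the player always picks Door #1 and that a host with a free choice opens #2 or #3 with equal probability.

```python
from fractions import Fraction

def host_weights(car, avoids_car):
    """Chance the host opens Door 2 or Door 3, given where the car is.
    Assumes the player has picked Door 1, so the host never opens it."""
    if not avoids_car:
        # Host ignores the car and opens 2 or 3 at random.
        return {2: Fraction(1, 2), 3: Fraction(1, 2)}
    allowed = [door for door in (2, 3) if door != car]
    return {door: Fraction(1, len(allowed)) if door in allowed else Fraction(0)
            for door in (2, 3)}

def six_totals(avoids_car):
    """Probabilities for C1H2 ... C3H3 (car location, door the host opens)."""
    totals = {}
    for car in (1, 2, 3):
        for door, weight in host_weights(car, avoids_car).items():
            totals[f"C{car}H{door}"] = Fraction(1, 3) * weight
    return totals

def stay_and_switch(totals):
    """Chances for staying and switching, given the host opened Door 3
    and it did not hide the car."""
    denom = totals["C1H3"] + totals["C2H3"]
    return totals["C1H3"] / denom, totals["C2H3"] / denom

print(stay_and_switch(six_totals(avoids_car=False)))  # both 1/2
print(stay_and_switch(six_totals(avoids_car=True)))   # 1/3 and 2/3
```

The only difference between the two calls is what the host would have done in the cases that did not happen, and that alone moves the answer from 1/2 to 2/3 for switching.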
The question is, how do you do the same in the Two Child Problem? To decide that, you need to ask yourself two questions that concern the things you know didn’t happen: What would a father of GG say, and could a father of BG or GB say the same thing? If you answer the second question “yes,” you have to keep track of six totals, not four: BB_BOY, BG_BOY, BG_OTHER, GB_BOY, GB_OTHER, and GG_OTHER. The answer is then BB_BOY/(BB_BOY+BG_BOY+GB_BOY), which will be 1/2.
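The same bookkeeping for the Two Child Problem might look like the sketch below. The parameter `q_mixed_says_boy` is my own assumption, not part of the problem statement: it is the chance that the father of a mixed family says “one is a boy” rather than something else. A father of BB is assumed to always say it, and a father of GG never.

```python
from fractions import Fraction

def two_child_totals(q_mixed_says_boy):
    """Six totals: family type crossed with what the father says.
    q_mixed_says_boy is a hypothetical parameter for mixed families."""
    quarter = Fraction(1, 4)  # each of BB, BG, GB, GG is equally likely
    q = Fraction(q_mixed_says_boy)
    return {
        "BB_BOY": quarter,
        "BG_BOY": quarter * q,
        "BG_OTHER": quarter * (1 - q),
        "GB_BOY": quarter * q,
        "GB_OTHER": quarter * (1 - q),
        "GG_OTHER": quarter,
    }

def p_two_boys(totals):
    """Chance of two boys, given the father said 'one is a boy'."""
    said_boy = totals["BB_BOY"] + totals["BG_BOY"] + totals["GB_BOY"]
    return totals["BB_BOY"] / said_boy

print(p_two_boys(two_child_totals(Fraction(1, 2))))  # 1/2
print(p_two_boys(two_child_totals(1)))               # 1/3
```

With q = 1/2 (a mixed-family father is as likely to mention the girl) the answer is 1/2. With q = 1 (he mentions a boy whenever he can) BG_OTHER and GB_OTHER vanish and you are back to 1/3.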
@Ron
“Reveal” is not the critical fact. It is whether the father is talking about a property of a specific child, or a property of the pair. Revealing the child is one way to ascertain that it is a specific child, but not the only way.
And our previous disagreements are because you think, in “one is a boy born on Tuesday,” that “boy” is a property of the pair while “born on Tuesday” is a property of a specific child. Since both facts come from the same statement, they must be referring to a child or the pair the same way. I agree that it should be about a specific child, but if you allow one to be about the pair, both are.
Wow – this thread is *still* going on!
Grant C:
In your most recent post you say that Luke Barnes’ computational proof is incorrect because the baseline assumption ‘ceases to be an appropriate search space when you are given the statement “I have two children, one of them is a boy” instead of simply the statement “I have two children”’
The statement of the problem that you address in both of your posts (which were, by the way, very clearly written) is somewhat terse and ambiguous. The possible solutions depend upon the process by which the sample space is constructed, and the successive stipulations are applied. In particular, whether the ‘Tuesday’ criterion is ‘required’ or ‘incidental’ matters a great deal. (JeffJo summarises this quite succinctly in his July 16th post)
JeffJo,
You say ““Reveal” is not the critical fact. It is whether the father is talking about a property of specific child, or a property of the pair.”
In my scenario, the father doesn’t say a word. But let’s assume that I (his neighbour) know that the boy has a specific property and I tell my friend of it.
Could you explain why that could be critical in my scenario?
As I have said before, I’m really not confident about the “reveal” concept, and I am interested in how knowing about a “specific property” alters things.
Ron, “reveal” is not a critical fact IN AND OF ITSELF. Or “saying” anything, for that matter. All that matters is if the information is about a specific child, and revealing that child is one way to verify that. Just not the only way. It is determined by whatever agent passed the information along, which is why I keep trying to go back to the same problem statement where a father “tells” it.
Anyway, in probability, you have to be as concerned with what could have happened, but didn’t, as with what did happen. The reason is that you have to make all of the possibilities add up to 100%. You need to create the full sample space, which includes families of two girls here.
In your scenario, the question is ambiguous and cannot be answered unless your neighbor knows why you chose to tell him “one is a boy” or “one is a boy born on Tuesday” about my family. Because he needs that information to know what could have happened, but didn’t.
If you picked one child, by any method, and filled in the blanks appropriately in “one is a ___” or “one is a ___ born on ___,” then the probabilities are both 1/2. The point here is that “could have but didn’t” includes picking a child (boy or girl) born on Friday who has a brother born on Tuesday. We have to allow for that possibility, that the family has a matching boy but you didn’t mention him, even if we know it didn’t happen.
If the information “boy” or “boy born on a Tuesday” was picked first, and you then picked a family to tell your neighbor about because that family has a matching child, then the “things that didn’t happen” can’t include a family with a matching boy. Then, the probabilities are 1/3 and 13/27. It changes from 1/3 to 13/27 because a two-boy family is about twice as likely to have a Tuesday Boy as a one-boy family. This isn’t “astrology,” it is a side effect of looking for a family with a Tuesday Boy. And it is the same side effect that makes the answer 1/3 instead of 1/2 when you only mention that he is a boy.
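Both procedures can be checked by brute force over the 196 equally likely (gender, weekday) families. This is just a sketch: the names `fact_first` and `child_first` are my own labels for the two procedures, day 0 arbitrarily stands for Tuesday, and genders and weekdays are assumed equally likely and independent.

```python
from fractions import Fraction
from itertools import product

DAYS = range(7)  # assumption: day 0 stands for Tuesday
CHILDREN = [(sex, day) for sex in "BG" for day in DAYS]
FAMILIES = list(product(CHILDREN, repeat=2))  # 14 * 14 = 196 families

def both_boys(family):
    return all(sex == "B" for sex, _ in family)

def fact_first(matches):
    """Fact chosen first: keep every family with at least one matching child."""
    kept = [f for f in FAMILIES if any(matches(child) for child in f)]
    return Fraction(sum(both_boys(f) for f in kept), len(kept))

def child_first(matches):
    """Child chosen first: each (family, child) pair is equally likely,
    and we keep the pairs where the chosen child happens to match."""
    kept = [f for f in FAMILIES for i in (0, 1) if matches(f[i])]
    return Fraction(sum(both_boys(f) for f in kept), len(kept))

def is_boy(child):
    return child[0] == "B"

def is_tuesday_boy(child):
    return child == ("B", 0)

print(fact_first(is_boy), fact_first(is_tuesday_boy))    # 1/3 13/27
print(child_first(is_boy), child_first(is_tuesday_boy))  # 1/2 1/2
```

In `child_first` a two-boy family with exactly one Tuesday boy is kept only for the one choice of child that matches, which is exactly the “could have but didn’t” case.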
JeffJo,
That’s all very interesting, but you are answering a different question.
I was hoping that you could actually answer my question and explain why knowing a property of the boy in question can impact probability.
And you ignore how my answer addresses your question. There is no boy whose probability is affected. It is the pair.
To state it another way: if Fact F is about a specific child, no information about him affects his sibling. The chance the sibling is a boy is 1/2.
If the problem statement is about a pair and Fact F includes “boy,” the probability that the pair includes two boys is (1-p)/(2-p), where p is the probability that a random child satisfies Fact F. Thus, when F = “boy,” p = 1/2, and the probability is 1/3. When F = “boy born on Tuesday,” p = 1/14, and the probability is 13/27.
Your problem is that you keep mixing the applicability of the fact between “specific child” and “pair.”
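A quick check of the (1-p)/(2-p) formula with exact fractions. The function name is mine; the last line assumes 365 equally likely birthdays, which gives the 729/1459 figure quoted in the original post.

```python
from fractions import Fraction

def p_two_boys(p):
    """(1 - p) / (2 - p), where p is the chance that a random child
    satisfies Fact F, and F is a fact about the pair that includes 'boy'."""
    p = Fraction(p)
    return (1 - p) / (2 - p)

print(p_two_boys(Fraction(1, 2)))    # F = "boy": 1/3
print(p_two_boys(Fraction(1, 14)))   # F = "boy born on Tuesday": 13/27
print(p_two_boys(Fraction(1, 730)))  # F = "boy born on a given date": 729/1459
```

As p shrinks, the value climbs toward 1/2 but never reaches it.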
Beam me up Scotty.