The lady tasting tea – how hypothesis testing was born

My younger sister has said many times that whenever she suggests something, I always say no. Once, when she and my cousin were arguing over which of two places to travel to and only needed my vote to break the tie, I said, “I have no preference!” That’s how they ended up arguing for one more hour, and we traveled to … our dreams that weekend.
Then I told her that I do research in statistics, and that my behavior is inspired by a principle from the field called hypothesis testing. So she shouldn’t expect more from me, because I treat everything she asks as a “null hypothesis”: something that is born to be rejected. To help her understand what I meant, I explained hypothesis testing by telling her the story of the lady tasting tea.

Once upon a time, at a tea party, a lady claimed that she could tell whether milk or tea had been added first to a cup. Some people at the table perhaps doubted that. Luckily, one of the gentlemen sitting at the table was Ronald Fisher, a very famous statistician. He suggested the following little experiment to check whether she really had that ability or whether her predictions were just random guesses that happened to be correct.

Fisher proposed giving her eight cups of tea in a random order. Four of these cups have tea added first and then milk. The remaining four cups have milk added first and then tea. Then he could determine if she was boasting or not based on the number of cups she correctly identified.

Here the null hypothesis is that the lady has no ability to distinguish whether milk was added first or tea was added first. This means she was guessing at random.
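
To get a feel for what “guessing at random” looks like, here is a minimal simulation sketch. It assumes the lady picks four of the eight cups uniformly at random as her “milk first” cups. The function name `simulate_guessing`, the number of trials, and the seed are my own illustrative choices, not part of the original experiment.

```python
import random
from collections import Counter


def simulate_guessing(n_trials=100_000, seed=0):
    """Estimate how often a pure guesser gets 0..4 milk-first cups right."""
    rng = random.Random(seed)
    # "m" = milk added first, "t" = tea added first. The serving order does
    # not matter here because the guesser's picks are random anyway.
    cups = ["m"] * 4 + ["t"] * 4
    tally = Counter()
    for _ in range(n_trials):
        picks = rng.sample(range(8), 4)  # four cups she labels "milk first"
        successes = sum(cups[i] == "m" for i in picks)
        tally[successes] += 1
    return {k: tally[k] / n_trials for k in range(5)}


freqs = simulate_guessing()
for k, f in freqs.items():
    print(f"{k} correct: {f:.4f}")
```

Under these assumptions, a perfect score of four correct cups should show up only rarely, which is the intuition the exact counting that follows makes precise.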

The test statistic can be the number of successes in selecting the four cups that have milk added first. Since there are eight cups, she will pick out the 4 cups that she thinks had milk added first. So there are
\binom{8}{4} = \frac{8!}{4!(8-4)!} = 70
possible combinations of the four selected cups.
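
As a quick check of this count, here is a small sketch using Python's standard library. The variable names are mine, chosen for illustration.

```python
from math import comb, factorial

n_cups = 8      # cups on the table
n_selected = 4  # cups she labels "milk first"

# Built-in binomial coefficient
total_selections = comb(n_cups, n_selected)

# The same value written out with the factorial formula
by_formula = factorial(n_cups) // (
    factorial(n_selected) * factorial(n_cups - n_selected)
)

print(total_selections, by_formula)  # prints: 70 70
```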

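Table 1: Number of possible selections for each success count

| Success count | Selected combinations | Number of combinations |
|---|---|---|
| 0 | tttt | 1 |
| 1 | tttm, ttmt, tmtt, mttt | 16 |
| 2 | ttmm, tmtm, tmmt, mtmt, mmtt, mttm | 36 |
| 3 | tmmm, mtmm, mmtm, mmmt | 16 |
| 4 | mmmm | 1 |
| Total | | 70 |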

Let t denote a cup where tea is added first and m denote a cup where milk is added first. Then the number of successful identifications she can make is between 0 and 4. We have the following analysis:

  • If the number of successes is 0, then her selected combination is “tttt,” and there is only one such combination.
  • If the number of successes is 1, then she selected only 1 correct cup, which could occur in \binom{4}{1} = 4 ways. In addition, she selected 3 incorrect cups, which could occur in \binom{4}{3} = 4 ways. Therefore, a selection of one correct cup and three incorrect cups could occur in 4 \times 4 = 16 ways.
  • If the number of successes is 2, then she selected 2 correct cups, which could occur in \binom{4}{2} = 6 ways. In addition, she selected 2 incorrect cups, which could occur in \binom{4}{2} = 6 ways. Therefore, a selection of two correct cups and two incorrect cups could occur in 6 \times 6 = 36 ways.
  • If the number of successes is 3, then she selected 3 correct cups, which could occur in \binom{4}{3} = 4 ways. In addition, she selected 1 incorrect cup, which could occur in \binom{4}{1} = 4 ways. Therefore, a selection of three correct cups and one incorrect cup could occur in 4 \times 4 = 16 ways.
  • If the number of successes is 4, then she selected four correct cups, which could occur in only one way, “mmmm.”

A summary is given in Table 1. Note that we have 70 possible combinations.
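
The counting argument above can be reproduced in a few lines. This is only a sketch: the constant names and the helper `count_selections` are hypothetical names I picked for illustration.

```python
from math import comb

MILK_FIRST = 4  # cups with milk added first
TEA_FIRST = 4   # cups with tea added first
PICKS = 4       # cups she labels "milk first"


def count_selections(successes):
    """Ways to pick `successes` correct cups and the rest incorrect ones."""
    return comb(MILK_FIRST, successes) * comb(TEA_FIRST, PICKS - successes)


counts = {k: count_selections(k) for k in range(PICKS + 1)}

for k, c in counts.items():
    print(f"{k} successes: {c} selections")
print("total:", sum(counts.values()))
```

The five counts add up to the same 70 selections obtained from the binomial coefficient.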

In order to reject the null hypothesis that the lady has no ability to tell whether milk or tea was added first, I need to identify the region of outlying events, termed the “critical region” or “rejection region.” But to determine that region, I first need to decide how unusual an outcome must be under the null hypothesis to fall into it. Heuristically, we can consider something that happens no more than 5% of the time to be abnormal. Since this 5% represents how rare an event must be for me to consider it abnormal, I’ll call it the significance level, denoted \alpha . In this case \alpha = 0.05, or 5%.

So which selected combinations are in this critical region? The “mmmm” combination, in which every identification is correct, surely falls in this region because its probability under the null hypothesis is only \frac{1}{70} \approx 0.014 < 0.05 . However, a selection with exactly three successes does not fall in this region, because the probability of getting at least three successes is \frac{16+1}{70} \approx 0.243 > 0.05 .
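
Here is a hedged sketch of that comparison, using exact fractions so that no rounding sneaks in. The helper `tail_probability` and the constant names are my own; the 5% threshold is the significance level chosen above.

```python
from fractions import Fraction
from math import comb

ALPHA = Fraction(5, 100)  # significance level of 5%
TOTAL = comb(8, 4)        # all ways to pick 4 cups out of 8


def tail_probability(k):
    """P(at least k successes) if the lady is guessing at random."""
    return sum(
        Fraction(comb(4, j) * comb(4, 4 - j), TOTAL) for j in range(k, 5)
    )


# Success counts whose tail probability is small enough to reject
critical_values = [k for k in range(5) if tail_probability(k) <= ALPHA]

for k in range(4, -1, -1):
    p = tail_probability(k)
    verdict = "reject" if p <= ALPHA else "do not reject"
    print(f"at least {k} successes: {p} (about {float(p):.3f}) -> {verdict}")
```

With these numbers, only a perfect score of four successes lands in the critical region.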

Thus, we can reject the null hypothesis only if the lady correctly identifies all four cups that had milk added first. As the story goes, the lady was able to identify all eight cups correctly, which means she picked out all four milk-first cups.

