
Bonferroni Correction – What It Is and Why It Matters

The Bonferroni correction is a statistical method used to reduce the risk of Type I errors (false positives) when you run multiple hypothesis tests. Every time you test a hypothesis, there’s a chance you’ll incorrectly reject a true null hypothesis. When you run many tests, those chances add up. The Bonferroni method reins this in by making the criterion for significance stricter.

It works by testing each hypothesis at a significance level of
\alpha_{\text{adjusted}} = \frac{\alpha}{m}
where \alpha is your desired overall error rate (often 0.05) and m is the number of tests.

🎯 Why You Need It

When you run multiple tests, the family-wise error rate (FWER)—the probability of getting at least one false positive—rises quickly. Statology emphasizes that even if each test individually has a 5% false-positive rate, running many tests inflates the chance of a false discovery. The Bonferroni correction keeps the overall error rate under control.
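To see how fast this grows, here’s a minimal sketch in Python (assuming the tests are independent, so the family-wise error rate is 1 - (1 - \alpha)^m):

```python
# Family-wise error rate for m independent tests, each run at alpha = 0.05.
# FWER = 1 - (1 - alpha)^m grows quickly with m.
alpha = 0.05

for m in (1, 5, 10, 20, 50):
    fwer = 1 - (1 - alpha) ** m
    print(f"m = {m:>2}: FWER = {fwer:.3f}")

# m =  1: FWER = 0.050
# m =  5: FWER = 0.226
# m = 10: FWER = 0.401
# m = 20: FWER = 0.642
# m = 50: FWER = 0.923
```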

🧠 How It Works (Simple Example)

Suppose you want to maintain an overall \alpha = 0.05 and you’re running 10 tests.

Then each test must meet
p < 0.05 / 10 = 0.005

Only results with p < 0.005 are considered statistically significant.
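As a quick sketch in Python (the p-values below are made up purely for illustration):

```python
# Ten hypothetical p-values (illustrative numbers, not real data).
p_values = [0.001, 0.020, 0.004, 0.300, 0.049,
            0.700, 0.0008, 0.060, 0.150, 0.900]

alpha = 0.05
threshold = alpha / len(p_values)  # Bonferroni-adjusted level: 0.05 / 10 = 0.005

significant = [p for p in p_values if p < threshold]
print(f"threshold = {threshold:.4f}")          # threshold = 0.0050
print(f"significant p-values: {significant}")  # [0.001, 0.004, 0.0008]
```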

📌 When to Use It

  • When you care about avoiding false positives more than missing true effects.
  • When the number of tests is moderate.
  • Common in ANOVA post-hoc comparisons, as noted by StatTrek.

🧩 Intuition

Think of it like this:
If you roll a die once, the chance of rolling a 6 is small.
If you roll it 20 times, the chance of at least one 6 is much higher.
The Bonferroni correction says: “If you’re rolling many dice, be stricter about what counts as surprising.”
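(For a fair die, that works out to 1 - (5/6)^{20} \approx 0.97, so 20 rolls will almost always turn up at least one 6.)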


Examples of the Bonferroni Correction

🧪 Example 1: Simple p‑value adjustment

Suppose you run 5 independent hypothesis tests and want to keep your overall significance level at \alpha = 0.05 .

Bonferroni says:
\alpha_{\text{adjusted}} = \frac{0.05}{5} = 0.01

So each test must have p < 0.01 to be considered significant.
This matches the general rule described in Wikipedia: each hypothesis is tested at \alpha/m .
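Why this controls the overall error rate follows from the union bound: the chance that at least one of the 5 tests produces a false positive is at most 5 \times 0.01 = 0.05, and this holds whether or not the tests are independent.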

📊 Example 2: ANOVA with multiple pairwise comparisons

You run a one‑way ANOVA with 4 groups.
Number of pairwise comparisons:
m = \frac{4 \cdot 3}{2} = 6

If your overall \alpha = 0.05 , then
\alpha_{\text{adjusted}} = \frac{0.05}{6} \approx 0.0083

So each pairwise comparison must meet p < 0.0083.
This aligns with the explanation that multiple tests inflate the family‑wise error rate.
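Here’s a minimal sketch of the pairwise step in Python with SciPy; the four groups below are simulated, so the numbers are purely illustrative:

```python
from itertools import combinations

import numpy as np
from scipy import stats

# Simulated data for 4 groups (illustrative only).
rng = np.random.default_rng(0)
groups = {
    "A": rng.normal(10.0, 2.0, 30),
    "B": rng.normal(10.5, 2.0, 30),
    "C": rng.normal(12.0, 2.0, 30),
    "D": rng.normal(10.2, 2.0, 30),
}

pairs = list(combinations(groups, 2))  # 6 pairwise comparisons
alpha_adjusted = 0.05 / len(pairs)     # about 0.0083

for a, b in pairs:
    t_stat, p = stats.ttest_ind(groups[a], groups[b])
    verdict = "significant" if p < alpha_adjusted else "not significant"
    print(f"{a} vs {b}: p = {p:.4f} -> {verdict}")
```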

🧬 Example 3: Gene expression study (many tests)

A biologist tests 100 genes to see which ones are differentially expressed.

Desired overall error rate: \alpha = 0.05

Bonferroni threshold:
\alpha_{\text{adjusted}} = \frac{0.05}{100} = 0.0005

Only genes with p < 0.0005 are considered significant.
This illustrates how the method becomes very conservative when m is large, as noted in multiple sources.
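A sketch of the thresholding step, using simulated p-values in place of a real expression dataset (the gene p-values are made up):

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated p-values for 100 genes: mostly uniform "null" genes,
# plus a few genes given very small p-values for illustration.
p_values = rng.uniform(0.0, 1.0, 100)
p_values[:3] = [1e-6, 2e-4, 4e-4]

alpha = 0.05
threshold = alpha / len(p_values)  # 0.05 / 100 = 0.0005

hits = np.flatnonzero(p_values < threshold)
print(f"Bonferroni threshold: {threshold:.4f}")
print(f"Genes declared significant: {hits.tolist()}")
```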

🧪 Example 4: A/B testing with multiple variants

A product team tests 4 different button colors against a control.
That’s 4 comparisons.

If they want to keep the overall false‑positive rate at 5%:
\alpha_{\text{adjusted}} = \frac{0.05}{4} = 0.0125

So each variant must show p < 0.0125 to be considered a real improvement.
This matches the A/B testing context described by Amplitude.
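A sketch of the variant-vs-control comparisons in Python with statsmodels; the conversion counts and sample sizes are invented for illustration:

```python
from statsmodels.stats.proportion import proportions_ztest

alpha_adjusted = 0.05 / 4  # 4 variant-vs-control comparisons -> 0.0125

# Invented conversion data: (conversions, visitors).
control = (200, 4000)
variants = {
    "red":    (230, 4000),
    "green":  (255, 4000),
    "blue":   (210, 4000),
    "orange": (245, 4000),
}

for name, (conversions, visitors) in variants.items():
    _, p = proportions_ztest(
        count=[conversions, control[0]], nobs=[visitors, control[1]]
    )
    verdict = "significant" if p < alpha_adjusted else "not significant"
    print(f"{name}: p = {p:.4f} -> {verdict}")
```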

🧠 Example 5: Adjusting p‑values instead of α

Instead of adjusting α, you can multiply each p‑value by the number of tests (capping the result at 1) and compare the adjusted p‑values to the original α:

If you have 3 tests with p‑values:

  • 0.01
  • 0.03
  • 0.20

Bonferroni‑adjusted p‑values:

  • 0.01 \times 3 = 0.03
  • 0.03 \times 3 = 0.09
  • 0.20 \times 3 = 0.60

Only the first remains below 0.05 → only the first test is significant.
This approach is consistent with the general definition of the correction.
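In Python, statsmodels’ multipletests helper applies the same multiply-by-m rule (clipping adjusted p-values at 1):

```python
from statsmodels.stats.multitest import multipletests

p_values = [0.01, 0.03, 0.20]

# method="bonferroni" multiplies each p-value by the number of tests, capped at 1.
reject, p_adjusted, _, _ = multipletests(p_values, alpha=0.05, method="bonferroni")

for p, p_adj, r in zip(p_values, p_adjusted, reject):
    print(f"p = {p:.2f}  adjusted = {p_adj:.2f}  reject: {r}")

# p = 0.01  adjusted = 0.03  reject: True
# p = 0.03  adjusted = 0.09  reject: False
# p = 0.20  adjusted = 0.60  reject: False
```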

