Application of Bayes’ Theorem in spam detection & medical diagnosis

Example 1: Spam Detection

Let’s say that, historically, 20% of emails are spam, so P(S) = 0.2, and the probability that an email is not spam is P(NS) = 0.8.

Suppose the word “free” appears in 80% of spam emails, so P(W|S) = 0.8, and in 5% of non-spam emails, so P(W|NS) = 0.05.

Now, if we receive an email containing the word “free”, we want to know the probability that it’s spam.

Using Bayes’ Theorem:

P(S|W) = \frac{P(W|S) \times P(S)}{P(W)}

We need to calculate P(W), the overall probability of observing the word “free”:

P(W) = P(W|S) \times P(S) + P(W|NS) \times P(NS)

P(W) = (0.8 \times 0.2) + (0.05 \times 0.8) = 0.16 + 0.04 = 0.2

Now, plugging into Bayes’ Theorem:

P(S|W) = \frac{0.8 \times 0.2}{0.2} = 0.8

So, given that the email contains the word “free”, there’s an 80% chance it’s spam.
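The same calculation can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `posterior` and its parameter names are made up for this example and do not come from any library.

```python
def posterior(prior, likelihood, likelihood_given_not):
    """Return P(H|E) given P(H), P(E|H) and P(E|not H), using Bayes' Theorem.

    Hypothetical helper written for this example only.
    """
    # Law of total probability: P(E) = P(E|H) P(H) + P(E|not H) P(not H)
    evidence = likelihood * prior + likelihood_given_not * (1 - prior)
    # Bayes' Theorem: P(H|E) = P(E|H) P(H) / P(E)
    return likelihood * prior / evidence


# Numbers from the spam example: P(S) = 0.2, P(W|S) = 0.8, P(W|NS) = 0.05
p_spam_given_free = posterior(prior=0.2, likelihood=0.8, likelihood_given_not=0.05)
print(round(p_spam_given_free, 4))  # 0.8
```

Here P(NS) is computed as 1 - P(S), which assumes every email is either spam or not spam.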

Example 2: Medical Diagnosis

Suppose the prevalence of a rare disease is 0.01, so P(D) = 0.01 and P(\neg D) = 0.99.

Let’s say the sensitivity of the test (probability of a positive result given the patient has the disease) is 0.95, P(T|D) = 0.95, and the specificity (probability of a negative result given the patient doesn’t have the disease) is 0.90, P(\neg T|\neg D) = 0.90. The probability of a false positive is therefore P(T|\neg D) = 1 - 0.90 = 0.10.

If a patient tests positive, we want to find the probability they actually have the disease.

Using Bayes’ Theorem:

P(D|T) = \frac{P(T|D) \times P(D)}{P(T)}

We need to calculate P(T), the overall probability of testing positive:

P(T) = P(T|D) \times P(D) + P(T|\neg D) \times P(\neg D)

P(T) = (0.95 \times 0.01) + (0.10 \times 0.99) = 0.0095 + 0.099 = 0.1085

Now, plugging into Bayes’ Theorem:

P(D|T) = \frac{0.95 \times 0.01}{0.1085} \approx 0.0876

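So, if the patient tests positive, there’s approximately an 8.76% chance they actually have the disease. The probability is this low because the disease is rare: false positives among the 99% of people without the disease (0.099) far outweigh true positives (0.0095).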
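As a rough check, the diagnosis example can be reproduced in Python. The function name `positive_predictive_value` and its parameters are hypothetical, chosen for this sketch; the extra prevalence values in the loop are illustrative and not part of the example above.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Return P(D|T): the probability of disease given a positive test.

    Hypothetical helper written for this example only.
    """
    # A false positive is a positive result in a patient without the disease.
    false_positive_rate = 1 - specificity
    # P(T) = P(T|D) P(D) + P(T|not D) P(not D)
    p_positive = sensitivity * prevalence + false_positive_rate * (1 - prevalence)
    # Bayes' Theorem: P(D|T) = P(T|D) P(D) / P(T)
    return sensitivity * prevalence / p_positive


# Numbers from the example: prevalence 0.01, sensitivity 0.95, specificity 0.90
ppv = positive_predictive_value(prevalence=0.01, sensitivity=0.95, specificity=0.90)
print(round(ppv, 4))  # 0.0876

# Illustrative only: the same test applied to populations with higher prevalence.
for prevalence in (0.01, 0.10, 0.50):
    print(prevalence, round(positive_predictive_value(prevalence, 0.95, 0.90), 4))
```

Changing only the prevalence shows how strongly the result depends on the prior P(D), even though the test itself is unchanged.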

