Chapter 5: Bayes’ Theorem and Independence

In the previous chapter, we explored conditional probability – how the probability of an event changes given that another event has occurred. Now, we’ll delve into one of the most powerful and widely applicable results stemming from conditional probability: Bayes’ Theorem. This theorem provides a formal way to update our beliefs (probabilities) in light of new evidence. We will also formally define and explore the concept of independence between events, a crucial idea for simplifying probability calculations.

Learning Objectives:

  • Understand the derivation and interpretation of Bayes’ Theorem.

  • Distinguish between prior and posterior probabilities.

  • Apply Bayes’ Theorem to solve problems, particularly diagnostic testing scenarios.

  • Define and test for the independence of events.

  • Understand the concept of conditional independence.

  • Implement Bayesian updates and independence checks using Python simulations.

1. Bayes’ Theorem: Derivation and Interpretation

Bayes’ Theorem provides a way to “reverse” conditional probabilities. If we know \(P(B|A)\), Bayes’ Theorem helps us find \(P(A|B)\). It’s named after Reverend Thomas Bayes (1701-1761), who first provided an equation that allows new evidence to update beliefs.

Derivation:

Recall the definition of conditional probability:

  1. \(P(A|B) = \frac{P(A \cap B)}{P(B)}\), provided \(P(B) > 0\).

  2. \(P(B|A) = \frac{P(B \cap A)}{P(A)}\), provided \(P(A) > 0\).

Since \(P(A \cap B) = P(B \cap A)\), we can rearrange these equations:

  1. \(P(A \cap B) = P(A|B) P(B)\)

  2. \(P(B \cap A) = P(B|A) P(A)\)

Setting them equal gives:

\(P(A|B) P(B) = P(B|A) P(A)\)

Dividing by \(P(B)\) (assuming \(P(B) > 0\)), we get Bayes’ Theorem:

\[P(A|B) = \frac{P(B|A) P(A)}{P(B)}\]

Interpretation:

Let’s think of A as an event or hypothesis we are interested in (e.g., “a patient has a specific disease,” “a coin is biased”) and B as new evidence or data observed (e.g., “the patient tested positive,” “we observed 8 heads in 10 flips”).

  • \(P(A)\): Prior Probability. Our initial belief about the probability of A before seeing the evidence B.

  • \(P(B|A)\): Likelihood. The probability of observing the evidence B given that our hypothesis A is true.

  • \(P(B)\): Probability of Evidence. The overall probability of observing the evidence B, regardless of whether A is true or not. This often requires using the Law of Total Probability (from Chapter 4): \(P(B) = P(B|A)P(A) + P(B|A^c)P(A^c)\).

  • \(P(A|B)\): Posterior Probability. Our updated belief about the probability of A after observing the evidence B.

Bayes’ Theorem tells us how to update our prior belief \(P(A)\) to a posterior belief \(P(A|B)\) based on the likelihood of the evidence \(P(B|A)\) and the overall probability of the evidence \(P(B)\).
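The four quantities above combine in a few lines of code. The following is a minimal sketch rather than a library routine: the function name `bayes_posterior` and its parameter names are illustrative choices, the numbers in the usage line are made up, and the two-hypothesis case (A versus \(A^c\)) is assumed so that \(P(B)\) comes from the Law of Total Probability.

```python
def bayes_posterior(prior, likelihood, likelihood_if_false):
    """Return P(A|B) for a hypothesis A and evidence B.

    prior               -- P(A)
    likelihood          -- P(B|A)
    likelihood_if_false -- P(B|A^c)
    """
    # P(B) via the Law of Total Probability over A and A^c
    evidence = likelihood * prior + likelihood_if_false * (1 - prior)
    if evidence == 0:
        raise ValueError("P(B) is zero, so P(A|B) is undefined")
    return likelihood * prior / evidence


# Illustrative numbers only: P(A) = 0.5, P(B|A) = 0.8, P(B|A^c) = 0.4
print(round(bayes_posterior(0.5, 0.8, 0.4), 4))  # 0.6667
```

Because the evidence here is twice as likely under A as under \(A^c\), the belief in A rises from 0.5 to about 0.67.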

2. Updating Beliefs: Prior and Posterior Probabilities

The core idea of Bayesian thinking is updating beliefs. We start with a prior belief, gather data (evidence), and update our belief to a posterior. This posterior can then become the prior for the next piece of evidence.

Example: Imagine you have a website and you’re testing a new ad banner.

  • Hypothesis (A): The new ad banner is effective (e.g., has a click-through rate > 5%).

  • Prior (\(P(A)\)): Based on previous ad campaigns, you might initially believe there’s a 30% chance the new ad is effective. So, \(P(A) = 0.30\).

  • Evidence (B): You observe a visitor’s browsing history (e.g., they previously visited related product pages).

  • Likelihood (\(P(B|A)\)): The probability that a visitor has this browsing history given the ad is effective. Perhaps effective ads are better targeted, so this might be high, say \(P(B|A) = 0.70\).

  • Likelihood (\(P(B|A^c)\)): The probability that a visitor has this browsing history given the ad is not effective. This might be lower, say \(P(B|A^c) = 0.20\).

  • Probability of Evidence (\(P(B)\)): Using the Law of Total Probability: \(P(B) = P(B|A)P(A) + P(B|A^c)P(A^c) = (0.70)(0.30) + (0.20)(1 - 0.30) = 0.21 + 0.14 = 0.35\)

  • Posterior (\(P(A|B)\)): Now apply Bayes’ Theorem: \(P(A|B) = \frac{P(B|A) P(A)}{P(B)} = \frac{(0.70)(0.30)}{0.35} = \frac{0.21}{0.35} = 0.60\)

After observing the visitor’s browsing history, your belief that the ad is effective increases from 30% (prior) to 60% (posterior).
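The remark that a posterior can serve as the next prior can be illustrated with the same numbers. The sketch below assumes, purely for illustration, that a second visitor shows the same kind of browsing history and that the two observations are conditionally independent given whether the ad is effective. The helper name `update` is hypothetical.

```python
def update(prior, p_evidence_if_true, p_evidence_if_false):
    """One Bayesian update: return P(A | evidence)."""
    numerator = p_evidence_if_true * prior
    evidence = numerator + p_evidence_if_false * (1 - prior)
    return numerator / evidence


belief = 0.30  # prior P(A) from the example
for visitor in (1, 2):  # the second observation is an added assumption
    belief = update(belief, 0.70, 0.20)
    print(f"After visitor {visitor}: P(A | evidence so far) = {belief:.2f}")
```

The first pass reproduces the 60% computed above. Under the stated assumptions, the second pass starts from 0.60 as its prior and raises the belief to 84%.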

3. Applications: The Diagnostic Test Example

One of the most classic and intuitive applications of Bayes’ Theorem is in interpreting the results of medical diagnostic tests.

Scenario:

  • A certain disease affects 1% of the population. (Prevalence)

  • A test for the disease is described as 95% accurate, which here means both of the following:

    • If a person has the disease, the test correctly identifies it 95% of the time. (Sensitivity)

    • If a person does not have the disease, the test correctly identifies it 95% of the time. (Specificity)

Question: If a randomly selected person tests positive, what is the probability they actually have the disease?

Let’s define the events:

  • \(D\): The person has the disease.

  • \(D^c\): The person does not have the disease.

  • \(Pos\): The person tests positive.

  • \(Neg\): The person tests negative.

What we know:

  • \(P(D) = 0.01\) (Prior probability of having the disease - Prevalence)

  • \(P(D^c) = 1 - P(D) = 0.99\)

  • \(P(Pos|D) = 0.95\) (Probability of testing positive given you have the disease - Sensitivity)

  • \(P(Neg|D) = 1 - P(Pos|D) = 0.05\) (False Negative Rate)

  • \(P(Neg|D^c) = 0.95\) (Probability of testing negative given you don’t have the disease - Specificity)

  • \(P(Pos|D^c) = 1 - P(Neg|D^c) = 0.05\) (False Positive Rate)

What we want to find: \(P(D|Pos)\) (The probability of having the disease given a positive test result).

Apply Bayes’ Theorem:

\(P(D|Pos) = \frac{P(Pos|D) P(D)}{P(Pos)}\)

We need to find \(P(Pos)\). Use the Law of Total Probability:

\[\begin{aligned} P(\text{Pos}) &= P(\text{Pos}|D)P(D) + P(\text{Pos}|D^c)P(D^c) \\ &= (0.95)(0.01) + (0.05)(0.99) \\ &= 0.0095 + 0.0495 \\ &= 0.0590 \end{aligned}\]

Now substitute into Bayes’ Theorem: \(P(D|Pos) = \frac{(0.95)(0.01)}{0.0590} = \frac{0.0095}{0.0590} \approx 0.161\)

Interpretation: Even with a positive test result from a 95% accurate test, the probability of actually having the disease is only about 16.1%! This seems counter-intuitive but highlights the strong influence of the low prior probability (prevalence) of the disease. Most positive tests come from the large group of healthy people who receive a false positive, rather than the small group of sick people who receive a true positive.
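To see how strongly the prior drives this result, the same calculation can be repeated for several prevalence values. This is a rough sketch: only the 1% prevalence and the 95% sensitivity and specificity come from the scenario, while the other prevalence values and the function name `p_disease_given_positive` are hypothetical.

```python
def p_disease_given_positive(prevalence, sensitivity=0.95, specificity=0.95):
    """P(D|Pos) from Bayes' Theorem with the Law of Total Probability."""
    true_pos = sensitivity * prevalence               # P(Pos|D) P(D)
    false_pos = (1 - specificity) * (1 - prevalence)  # P(Pos|D^c) P(D^c)
    return true_pos / (true_pos + false_pos)


# Only the 1% prevalence comes from the scenario; the other values are hypothetical.
for prevalence in (0.001, 0.01, 0.10, 0.50):
    posterior = p_disease_given_positive(prevalence)
    print(f"prevalence {prevalence:>5.1%} -> P(D|Pos) = {posterior:.3f}")
```

With the test characteristics held fixed, the posterior climbs steadily as the disease becomes more common, which is why the same positive result means different things in different populations.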

4. Independence of Events

Two events A and B are said to be independent if the occurrence (or non-occurrence) of one event does not affect the probability of the other event occurring.

In other words, knowing whether one event happened tells you nothing about whether the other will happen. Their probabilities are not linked.

4.1. Formal Definition

Formally, two events A and B are independent if and only if: \(P(A \cap B) = P(A) P(B)\)

4.2. Alternative Definition (using conditional probability)

If \(P(B) > 0\), A and B are independent if and only if:

\(P(A|B) = P(A)\)

Similarly, if \(P(A) > 0\), independence means:

\(P(B|A) = P(B)\)

This definition aligns with the intuition: knowing B occurred doesn’t change the probability of A.

Important Note: Do not confuse independence with mutual exclusivity.

  • Mutually exclusive events cannot happen together (\(A \cap B = \emptyset\), so \(P(A \cap B) = 0\)).

  • Independent events can happen together, but one doesn’t affect the other’s probability.

If two events A and B have non-zero probabilities, they cannot be both mutually exclusive and independent. Mutual exclusivity would give \(P(A \cap B) = 0\), while independence would give \(P(A \cap B) = P(A)P(B) > 0\), which is a contradiction.
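The distinction can be checked by direct enumeration on a small sample space. The sketch below uses a single roll of a fair die. The events and the helper names `prob` and `independent` are chosen here for illustration and do not appear elsewhere in the chapter.

```python
from fractions import Fraction

outcomes = set(range(1, 7))  # one roll of a fair six-sided die


def prob(event):
    """Probability of an event (a set of outcomes) under equally likely outcomes."""
    return Fraction(len(event & outcomes), len(outcomes))


def independent(a, b):
    """Check the product rule P(A and B) == P(A) P(B) exactly."""
    return prob(a & b) == prob(a) * prob(b)


even = {2, 4, 6}
odd = {1, 3, 5}
at_most_two = {1, 2}

# Independent but not mutually exclusive: they share the outcome 2.
print(independent(even, at_most_two), even & at_most_two)  # True {2}
# Mutually exclusive but not independent: P(even and odd) = 0, yet P(even)P(odd) = 1/4.
print(independent(even, odd), even & odd)                  # False set()
```

Using `Fraction` keeps the comparison exact, so the equality test is not affected by floating-point rounding.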

5. Conditional Independence

Sometimes, two events A and B might not be independent overall, but they become independent given some other event C. This is called conditional independence.

Formal Definition:

Events A and B are conditionally independent given event C (where \(P(C) > 0\)) if:

\(P(A \cap B | C) = P(A|C) P(B|C)\)

Alternative Definition: If \(P(B|C) > 0\), conditional independence means:

\(P(A | B \cap C) = P(A|C)\)

Knowing B occurred provides no additional information about A if we already know C occurred.

Example: Consider two different coins, one fair (Coin F) and one biased to land heads 75% of the time (Coin B).

  • Let \(H_1\) be the event “the first flip is Heads”.

  • Let \(H_2\) be the event “the second flip is Heads”.

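Are \(H_1\) and \(H_2\) independent? It depends on whether we know which coin we are flipping. If we know the coin, the two flips are independent: the first result tells us nothing new about the second. Now suppose one of the coins is picked at random and we do not know which one we hold. A head on the first flip makes it more likely that we hold Coin B, which in turn raises the probability of a head on the second flip. So \(H_1\) and \(H_2\) are conditionally independent given the coin, but not independent overall.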
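The two-coin example can be worked through exactly. The sketch below adds one assumption that the example does not state: each coin is picked with probability 1/2. It also assumes that, for a given coin, successive flips are independent.

```python
from fractions import Fraction

# Assumption (not stated in the example): each coin is picked with probability 1/2.
p_coin = {"F": Fraction(1, 2), "B": Fraction(1, 2)}
p_heads = {"F": Fraction(1, 2), "B": Fraction(3, 4)}  # P(Heads | coin)

# Given the coin, the flips are independent, so P(H1 and H2 | coin) = P(Heads | coin)^2.
p_h1 = sum(p_coin[c] * p_heads[c] for c in p_coin)
p_h2 = p_h1  # both flips use the same coin, so the marginals match
p_h1_and_h2 = sum(p_coin[c] * p_heads[c] ** 2 for c in p_coin)

print("P(H1) * P(H2) =", p_h1 * p_h2)         # 25/64
print("P(H1 and H2)  =", p_h1_and_h2)         # 13/32
print("P(H2 | H1)    =", p_h1_and_h2 / p_h1)  # 13/20
```

Since 13/32 = 26/64 is not equal to 25/64, the flips are not independent when the coin is unknown. Seeing a head first raises the chance of a second head from 5/8 to 13/20.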

Intuition: Fuel efficiency might depend on tire pressure and engine size. These two factors might seem correlated overall (cars with bigger engines might tend to have specific tire pressure recommendations). However, given a specific car model, the effect of tire pressure on fuel efficiency might be independent of the effect of engine size (assuming the model already fixes the engine size).

6. Hands-on Exercises

Exercise 5.1: Implementing Bayes’ Theorem for Disease Test

Let’s verify the disease test calculation using Python. Define variables for the prior probability, sensitivity, and specificity, then implement the calculation for \(P(D|Pos)\).

# Parameters
p_disease = 0.01        # P(D) - Prior probability (prevalence)
p_pos_given_disease = 0.95 # P(Pos|D) - Sensitivity
p_neg_given_no_disease = 0.95 # P(Neg|D^c) - Specificity

# Derived probabilities
p_no_disease = 1 - p_disease                 # P(D^c)
p_pos_given_no_disease = 1 - p_neg_given_no_disease # P(Pos|D^c) - False Positive Rate

# Calculate P(Pos) using Law of Total Probability
p_pos = (p_pos_given_disease * p_disease) + (p_pos_given_no_disease * p_no_disease)

# Calculate P(D|Pos) using Bayes' Theorem
p_disease_given_pos = (p_pos_given_disease * p_disease) / p_pos

print(f"Prior P(Disease): {p_disease:.4f}")
print(f"Sensitivity P(Pos|Disease): {p_pos_given_disease:.4f}")
print(f"Specificity P(Neg|No Disease): {p_neg_given_no_disease:.4f}")
print(f"False Positive Rate P(Pos|No Disease): {p_pos_given_no_disease:.4f}")
print("-" * 30)
print(f"Overall P(Pos): {p_pos:.4f}")
print(f"Posterior P(Disease|Pos): {p_disease_given_pos:.4f}")

# What if the test is *negative*? Calculate P(Disease | Neg)
# P(Neg) = P(Neg|D)P(D) + P(Neg|D^c)P(D^c)
p_neg_given_disease = 1 - p_pos_given_disease # P(Neg|D) - False Negative Rate
p_neg = (p_neg_given_disease * p_disease) + (p_neg_given_no_disease * p_no_disease)

# P(D|Neg) = P(Neg|D)P(D) / P(Neg)
p_disease_given_neg = (p_neg_given_disease * p_disease) / p_neg

print("-" * 30)
print(f"Overall P(Neg): {p_neg:.4f}")
print(f"Posterior P(Disease|Neg): {p_disease_given_neg:.4f}")
print(f"Posterior P(No Disease|Neg) = {1 - p_disease_given_neg:.4f}")

Prior P(Disease): 0.0100
Sensitivity P(Pos|Disease): 0.9500
Specificity P(Neg|No Disease): 0.9500
False Positive Rate P(Pos|No Disease): 0.0500
------------------------------
Overall P(Pos): 0.0590
Posterior P(Disease|Pos): 0.1610
------------------------------
Overall P(Neg): 0.9410
Posterior P(Disease|Neg): 0.0005
Posterior P(No Disease|Neg) = 0.9995

Exercise 5.2: Simulating Bayesian Updates

Let’s simulate the disease test scenario to build intuition. We’ll create a population reflecting the disease prevalence, simulate their test results based on sensitivity/specificity, and then calculate the conditional probability directly from the simulated data.

import numpy as np
import pandas as pd

# Parameters
population_size = 1000000
p_disease = 0.01
p_pos_given_disease = 0.95
p_pos_given_no_disease = 0.05 # 1 - specificity

# Create population
# Assign actual disease status
has_disease = np.random.rand(population_size) < p_disease
num_diseased = np.sum(has_disease)
num_healthy = population_size - num_diseased

# Simulate test results
# Initialize test results array
tests_positive = np.zeros(population_size, dtype=bool)

# For those WITH the disease
tests_positive[has_disease] = np.random.rand(num_diseased) < p_pos_given_disease

# For those WITHOUT the disease
tests_positive[~has_disease] = np.random.rand(num_healthy) < p_pos_given_no_disease

# Create a DataFrame for easier analysis
data = pd.DataFrame({'Has_Disease': has_disease, 'Tests_Positive': tests_positive})
print(data.head())

# Calculate counts from the simulation
true_positives = np.sum(data['Has_Disease'] & data['Tests_Positive'])
false_positives = np.sum(~data['Has_Disease'] & data['Tests_Positive'])
total_positives = np.sum(data['Tests_Positive'])

# Calculate P(Disease | Positive) from simulation data
simulated_p_disease_given_pos = true_positives / total_positives

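# Compare with theoretical calculation
# (recomputed here so this block also runs without Exercise 5.1)
p_pos = p_pos_given_disease * p_disease + p_pos_given_no_disease * (1 - p_disease)
p_disease_given_pos = (p_pos_given_disease * p_disease) / p_pos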
print("\n--- Simulation Results ---")
print(f"Population Size: {population_size}")
print(f"Number Actually Diseased: {num_diseased}")
print(f"Number Actually Healthy: {num_healthy}")
print(f"Number Testing Positive: {total_positives}")
print(f"  - True Positives: {true_positives}")
print(f"  - False Positives: {false_positives}")
print("-" * 30)
print(f"Simulated P(Disease | Positive): {simulated_p_disease_given_pos:.4f}")
print(f"Theoretical P(Disease | Positive): {p_disease_given_pos:.4f}")

   Has_Disease  Tests_Positive
0        False           False
1        False            True
2        False           False
3        False           False
4        False           False

--- Simulation Results ---
Population Size: 1000000
Number Actually Diseased: 9844
Number Actually Healthy: 990156
Number Testing Positive: 59098
  - True Positives: 9340
  - False Positives: 49758
------------------------------
Simulated P(Disease | Positive): 0.1580
Theoretical P(Disease | Positive): 0.1610

As the population_size increases, the simulated probability should converge to the theoretical probability calculated using Bayes’ Theorem. This demonstrates how the theorem accurately reflects the underlying frequencies in a large population.
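The convergence claim can be explored by repeating the simulation at several population sizes. The following is a rough sketch: it uses Python’s built-in `random` module with a fixed seed instead of NumPy so that it runs without extra packages, and the function name `simulate_posterior` and the chosen sizes are assumptions made for illustration.

```python
import random


def simulate_posterior(n, seed=0, p_disease=0.01, sensitivity=0.95, false_pos_rate=0.05):
    """Estimate P(D|Pos) by simulating n people; returns None if nobody tests positive."""
    rng = random.Random(seed)
    true_pos = total_pos = 0
    for _ in range(n):
        diseased = rng.random() < p_disease
        # The chance of a positive test depends on the person's true status.
        positive = rng.random() < (sensitivity if diseased else false_pos_rate)
        if positive:
            total_pos += 1
            true_pos += diseased
    return true_pos / total_pos if total_pos else None


theoretical = 0.0095 / 0.0590
for n in (1_000, 10_000, 100_000):
    estimate = simulate_posterior(n)
    shown = "n/a" if estimate is None else f"{estimate:.4f}"
    print(f"n = {n:>7,}: simulated = {shown}, theoretical = {theoretical:.4f}")
```

Smaller populations contain only a handful of positive tests, so their estimates tend to scatter more widely around the theoretical value than estimates from larger populations.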

Exercise 5.3: Testing Independence from Data

Let’s simulate rolling two fair dice and check if the events “first die is even” and “sum is 7” are independent.

  • Event A: First die is even. \(P(A) = 1/2\).

  • Event B: Sum is 7. The pairs are (1,6), (2,5), (3,4), (4,3), (5,2), (6,1). \(P(B) = 6/36 = 1/6\).

  • Event \(A \cap B\): First die is even AND sum is 7. The pairs are (2,5), (4,3), (6,1). \(P(A \cap B) = 3/36 = 1/12\).

Theoretical Check: Is \(P(A \cap B) = P(A)P(B)\)? \(1/12 \stackrel{?}{=} (1/2) \times (1/6)\) \(1/12 = 1/12\). Yes, they are theoretically independent.
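Before simulating, the theoretical check can be confirmed exactly by enumerating all 36 equally likely outcomes. This is a small sketch, and the helper name `prob` is an illustrative choice.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # all 36 (die1, die2) pairs


def prob(predicate):
    """Exact probability of an event given as a True/False function of (die1, die2)."""
    favourable = sum(1 for d1, d2 in outcomes if predicate(d1, d2))
    return Fraction(favourable, len(outcomes))


p_a = prob(lambda d1, d2: d1 % 2 == 0)                    # first die is even
p_b = prob(lambda d1, d2: d1 + d2 == 7)                   # sum is 7
p_ab = prob(lambda d1, d2: d1 % 2 == 0 and d1 + d2 == 7)  # both at once

print(p_a, p_b, p_ab)     # 1/2 1/6 1/12
print(p_ab == p_a * p_b)  # True
```

Exact fractions avoid the tolerance question that arises when comparing simulated frequencies.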

Now, let’s check using simulation.

import numpy as np
import pandas as pd

num_rolls = 100000

# Simulate rolls
die1 = np.random.randint(1, 7, size=num_rolls)
die2 = np.random.randint(1, 7, size=num_rolls)
sums = die1 + die2

# Define events
event_A = (die1 % 2 == 0)  # First die is even
event_B = (sums == 7)     # Sum is 7

# Create DataFrame
rolls_data = pd.DataFrame({'Die1': die1, 'Die2': die2, 'Sum': sums, 'A': event_A, 'B': event_B})
print(rolls_data.head())

# Calculate probabilities from simulation
p_A_sim = np.mean(event_A)
p_B_sim = np.mean(event_B)
p_A_intersect_B_sim = np.mean(event_A & event_B)

# Check independence condition
print("\n--- Independence Check from Simulation ---")
print(f"Simulated P(A): {p_A_sim:.4f} (Theoretical: 0.5000)")
print(f"Simulated P(B): {p_B_sim:.4f} (Theoretical: {1/6:.4f})")
print(f"Simulated P(A intersect B): {p_A_intersect_B_sim:.4f} (Theoretical: {1/12:.4f})")
print("-" * 30)
print(f"P(A) * P(B) = {p_A_sim * p_B_sim:.4f}")
print(f"Is P(A intersect B) approx equal to P(A) * P(B)? {'Yes' if np.isclose(p_A_intersect_B_sim, p_A_sim * p_B_sim, atol=0.005) else 'No'}") # Use np.isclose for floating point comparison

# Alternative check: Is P(A|B) approx equal to P(A)?
# P(A|B) = P(A intersect B) / P(B)
if p_B_sim > 0:
    p_A_given_B_sim = p_A_intersect_B_sim / p_B_sim
    print(f"\nSimulated P(A|B): {p_A_given_B_sim:.4f}")
    print(f"Is P(A|B) approx equal to P(A)? {'Yes' if np.isclose(p_A_given_B_sim, p_A_sim, atol=0.01) else 'No'}")
else:
    print("\nCannot calculate P(A|B) as P(B) is zero in simulation.")

   Die1  Die2  Sum      A      B
0     6     2    8   True  False
1     5     3    8  False  False
2     3     4    7  False   True
3     5     6   11  False  False
4     6     5   11   True  False

--- Independence Check from Simulation ---
Simulated P(A): 0.5001 (Theoretical: 0.5000)
Simulated P(B): 0.1671 (Theoretical: 0.1667)
Simulated P(A intersect B): 0.0827 (Theoretical: 0.0833)
------------------------------
P(A) * P(B) = 0.0836
Is P(A intersect B) approx equal to P(A) * P(B)? Yes

Simulated P(A|B): 0.4947
Is P(A|B) approx equal to P(A)? Yes

The simulation results are close to the theoretical values, which is consistent with these events being independent. Small discrepancies are expected due to random sampling variation, so a simulation can support the theoretical result but cannot prove it.

Chapter Summary

  • Bayes’ Theorem \(P(A|B) = \frac{P(B|A) P(A)}{P(B)}\) provides a fundamental rule for updating probabilities (beliefs) based on new evidence.

  • It relates the posterior probability \(P(A|B)\) to the prior probability \(P(A)\) and the likelihood \(P(B|A)\).

  • The term \(P(B)\) acts as a normalizing constant and can often be calculated using the Law of Total Probability.

  • Bayes’ Theorem is crucial in fields like medical diagnosis, machine learning (spam filtering, classification), and scientific reasoning.

  • Two events A and B are independent if \(P(A \cap B) = P(A)P(B)\), or equivalently, \(P(A|B) = P(A)\) (assuming \(P(B)>0\)). The occurrence of one does not change the probability of the other.

  • Events A and B are conditionally independent given C if \(P(A \cap B | C) = P(A|C)P(B|C)\). They behave as independent events once C is known to have occurred, even if they are not independent overall.

  • Simulation is a valuable tool for building intuition about Bayes’ Theorem and independence by observing frequencies in generated data.

In the next part of the book, we will shift our focus from events to Random Variables – numerical outcomes of random phenomena – and explore their distributions. This will allow us to model and analyze probabilistic situations in a more structured way.