
In the previous chapters, we laid the groundwork for probability, exploring sample spaces, events, and counting techniques. Now, we venture into one of the most fundamental and powerful concepts in probability theory: conditional probability.

Often, we are interested in the probability of an event occurring given that another event has already happened. Our knowledge or assumptions about one event can change our assessment of the probability of another. This is the essence of conditional probability. It allows us to update our beliefs in the face of new information.

1. Definition and Intuition

Conditional Probability measures the probability of an event $A$ occurring given that another event $B$ has already occurred (or is known to have occurred). We denote this as $P(A|B)$, read as “the probability of A given B”.

Intuition: Imagine the entire sample space $S$. When we know that event $B$ has occurred, our focus effectively narrows down from the entire sample space $S$ to just the outcomes within $B$. We are now interested in the probability that $A$ occurs within this new, reduced sample space $B$. The outcomes favourable to “A given B” are those that belong to both $A$ and $B$, i.e., $A \cap B$.

Formal Definition: For any two events $A$ and $B$ from a sample space $S$, where $P(B) > 0$, the conditional probability of $A$ given $B$ is defined as:

$$P(A|B) = \frac{P(A \cap B)}{P(B)}$$

where:

  • $P(A \cap B)$ is the probability that both $A$ and $B$ occur, and

  • $P(B)$ is the probability of the conditioning event.
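
As a quick numerical check of this definition, here is a minimal Python simulation. The events are my own illustration, not from the text: $B$ = “a fair die shows an even number” and $A$ = “the die shows more than 3”, so that $P(A|B) = (2/6)/(3/6) = 2/3$. Estimating $P(A|B)$ by restricting attention to the trials where $B$ occurred recovers that ratio:

```python
import random

# Illustrative events on one fair die roll:
#   B: the roll is even           -> P(B) = 3/6
#   A: the roll is greater than 3 -> P(A ∩ B) = |{4, 6}| / 6 = 2/6
# Theory: P(A|B) = P(A ∩ B) / P(B) = (2/6) / (3/6) = 2/3

random.seed(0)
trials = 100_000
count_B = 0          # how often the conditioning event B occurs
count_A_and_B = 0    # how often A and B occur together

for _ in range(trials):
    roll = random.randint(1, 6)
    if roll % 2 == 0:            # we are "inside" event B
        count_B += 1
        if roll > 3:             # A also occurred
            count_A_and_B += 1

print(count_A_and_B / count_B)   # ≈ 0.667: conditional frequency within B
print(2 / 3)                     # theoretical P(A|B) for comparison
```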

1.1. Visual representation

The Venn diagram below shows the general structure of conditional probability. When we condition on event $B$ having occurred, we restrict our attention to the circle $B$. Within that circle, $P(A|B)$ represents the proportion of $B$ that overlaps with $A$.

Generic Venn diagram: $P(A|B)$ is the overlap relative to $B$.

2. The Multiplication Rule for Conditional Probability

Rearranging the definition of conditional probability gives us the General Multiplication Rule, which is useful for calculating the probability of the intersection of two events:

$$P(A \cap B) = P(A|B)\,P(B)$$

Similarly, if $P(A) > 0$, we can write:

$$P(A \cap B) = P(B|A)\,P(A)$$

This rule is particularly helpful when dealing with sequential events, where the outcome of the first event affects the probability of the second.

The multiplication rule can be extended to more than two events. For three events $A, B, C$:

$$P(A \cap B \cap C) = P(C | A \cap B)\,P(B | A)\,P(A)$$
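
To see the chain in action, here is a short Python sketch; the card scenario is my own illustration, not from the text above. Drawing three cards without replacement, the probability that all three are hearts is the product of conditional probabilities $\frac{13}{52}\cdot\frac{12}{51}\cdot\frac{11}{50}$, and a quick Monte Carlo run agrees:

```python
import random
from fractions import Fraction

# Exact value via the chained multiplication rule:
# P(H1) * P(H2|H1) * P(H3|H1 ∩ H2)
exact = Fraction(13, 52) * Fraction(12, 51) * Fraction(11, 50)
print(exact, float(exact))  # 11/850 ≈ 0.0129

# Simulation: suits encoded 0..3, with suit 0 playing "hearts".
random.seed(1)
deck = [suit for suit in range(4) for _ in range(13)]  # 13 cards per suit
trials = 200_000
hits = sum(
    all(card == 0 for card in random.sample(deck, 3))  # without replacement
    for _ in range(trials)
)
print(hits / trials)  # close to 11/850
```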

3. The Law of Total Probability

Sometimes, calculating the probability of an event AA directly is difficult. However, we might know the conditional probabilities of AA occurring under various mutually exclusive and exhaustive scenarios. The Law of Total Probability lets us combine those scenario-based probabilities into one overall probability.

3.1 Definition

Let $B_1, B_2, \ldots, B_n$ be a partition of the sample space $S$. This means:

  1. $B_i \cap B_j = \emptyset$ for all $i \neq j$ (the events are mutually exclusive),

  2. $B_1 \cup B_2 \cup \cdots \cup B_n = S$ (they cover the whole sample space),

  3. $P(B_i) > 0$ for all $i$ (so the conditional probabilities are well-defined).

Then, for any event $A$ in $S$, the Law of Total Probability states:

$$P(A) = \sum_{i=1}^{n} P(A \mid B_i)\,P(B_i).$$

Equivalently, written as an expanded sum:

\begin{align*}
P(A) ={}& P(A\mid B_1)P(B_1) \\
& + P(A\mid B_2)P(B_2) \\
& + \ldots \\
& + P(A\mid B_n)P(B_n).
\end{align*}

3.2 Why it works

The key idea is that the partition breaks $A$ into disjoint pieces:

$$A = (A\cap B_1)\ \cup\ (A\cap B_2)\ \cup\ \cdots\ \cup\ (A\cap B_n),$$

and these pieces do not overlap because the $B_i$ do not overlap.

So we can add their probabilities:

$$P(A) = \sum_{i=1}^n P(A\cap B_i).$$

Finally, apply the multiplication rule $P(A\cap B_i)=P(A\mid B_i)P(B_i)$ to each term:

$$P(A) = \sum_{i=1}^n P(A\mid B_i)P(B_i).$$

3.3 Intuition

Think of the $B_i$ as “which scenario we are in.” First, one scenario $B_i$ happens (with probability $P(B_i)$). Then, within that scenario, $A$ happens with probability $P(A\mid B_i)$. The overall probability $P(A)$ is a weighted average of the conditional probabilities $P(A\mid B_i)$, weighted by how likely each scenario is.
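
As a numeric sketch of this weighted average (the machine shares and defect rates below are illustrative numbers of my own, not from the text): suppose three machines $B_1, B_2, B_3$ produce 50%, 30%, and 20% of all items, with defect rates 1%, 2%, and 5%. The Law of Total Probability gives the overall defect probability directly:

```python
# Hypothetical partition: which machine produced the item.
p_scenario = [0.5, 0.3, 0.2]          # P(B_i): production shares
p_defect_given = [0.01, 0.02, 0.05]   # P(A|B_i): per-machine defect rates

# P(A) = sum_i P(A|B_i) * P(B_i): a weighted average of defect rates
p_defect = sum(pa * pb for pa, pb in zip(p_defect_given, p_scenario))
print(p_defect)  # 0.005 + 0.006 + 0.010 = 0.021
```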

3.4 Visual intuition: area model

How to read the diagram: each vertical strip is one scenario $B_i$, with width $P(B_i)$; within a strip, the shaded portion has height $P(A\mid B_i)$, so its area is $P(A\mid B_i)P(B_i)$. The total shaded area, summed across strips, is $P(A)$.

Area model: $P(A)$ is the sum of the disjoint pieces $A\cap B_i$.

3.5 Visual intuition: probability tree (same idea, different view)

A tree diagram shows the same logic: first choose which scenario $B_i$ occurs, then (within that scenario) whether $A$ occurs. (See section 4 for more detail on tree diagrams.)

On the branch $S \to B_i \to A$, the probability is the product $P(B_i)\,P(A\mid B_i)$. Summing those “$A$” leaves over all scenarios gives $P(A)$.

4. Tree Diagrams

Tree diagrams are a useful visualization tool for problems involving sequences of events, especially when conditional probabilities are involved.

4.1. Generic tree structure

Before looking at a specific example, let’s see the general pattern. Suppose we have a partition $B_1, B_2, \ldots, B_n$ of the sample space, and we’re interested in whether event $A$ occurs. The tree diagram below shows how we first “choose” which scenario $B_i$ happens, then (within that scenario) whether $A$ occurs or not.

Reading the tree: multiply the probabilities along each branch to get a path probability, then sum over all paths that end in $A$:

$$P(A) = P(A|B_1)P(B_1) + P(A|B_2)P(B_2) + \cdots + P(A|B_n)P(B_n)$$

This is exactly the Law of Total Probability from section 3, just visualized as a tree instead of an area model.
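
The “multiply along paths, sum across paths” recipe is easy to make executable. Below is a tiny helper; the function name and the example numbers are my own (the numbers reuse the machine scenario from section 3.3):

```python
def total_probability(branches):
    """Given (P(B_i), P(A|B_i)) pairs, multiply along each branch
    of the tree and sum the leaves where A occurs."""
    return sum(p_b * p_a_given_b for p_b, p_a_given_b in branches)

# One branch per scenario B_i (illustrative numbers):
branches = [(0.5, 0.01), (0.3, 0.02), (0.2, 0.05)]
print(total_probability(branches))  # 0.021, same as the area-model sum
```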

5. Tips for differentiating between $P(A \cap B)$ and $P(A | B)$

It can be challenging to differentiate between $P(A \cap B)$ and $P(A | B)$ in probability problems.

$P(A \cap B)$ represents the probability that both event A AND event B occur. Look for keywords like “and,” “both,” or phrases indicating a direct overlap between two characteristics. For example, “the probability that a student is an engineering major and is female.”

$P(A | B)$ signifies the probability of event A occurring GIVEN that event B has already occurred. This is a conditional probability, focusing on a subset of the population. Phrases such as “given that,” “of those who,” or “if a [characteristic B] is selected” are strong indicators. For instance, “Of the students who study engineering, 20% are female” is an example of $P(\text{Female} | \text{Engineering})$.

The key distinction lies in whether the problem describes the likelihood of two events happening simultaneously (intersection) or the likelihood of one event happening under the condition that another event has already happened (conditional).
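
A small worked example in Python makes the contrast concrete; the counts are hypothetical, chosen to match the 20% figure above. The intersection divides by the whole cohort, while the conditional divides by the engineering subgroup only:

```python
total_students = 200     # hypothetical cohort size
engineering = 80         # students who study engineering
female_and_eng = 16      # students who are both female and in engineering

# P(Female ∩ Engineering): relative to ALL students
p_intersection = female_and_eng / total_students   # 0.08

# P(Female | Engineering): relative to engineering students only
p_conditional = female_and_eng / engineering       # 0.20 ("of those who...")

print(p_intersection, p_conditional)
```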

Chapter Summary

Key Takeaways

The core insight: Conditional probability $P(A|B)$ represents our updated belief about event $A$ given that we know event $B$ has occurred. It restricts the sample space to only outcomes where $B$ is true.

The fundamental concepts:

  1. Conditional Probability: $P(A|B) = \frac{P(A \cap B)}{P(B)}$, provided $P(B) > 0$

    • Represents probability of $A$ given $B$ has occurred

    • Restricts sample space from $S$ to just the outcomes in $B$

    • Key distinction: $P(A|B) \neq P(A \cap B)$ (conditioning vs. intersection!)

  2. Multiplication Rule: $P(A \cap B) = P(A|B) \cdot P(B) = P(B|A) \cdot P(A)$

    • Fundamental for computing joint probabilities

    • Can chain for multiple events: $P(A \cap B \cap C) = P(A) \cdot P(B|A) \cdot P(C|A \cap B)$

  3. Law of Total Probability: If $B_1, B_2, \ldots, B_n$ partition the sample space, then:

    $P(A) = \sum_{i=1}^n P(A|B_i) \cdot P(B_i)$
    • Breaks complex probability into simpler conditional pieces

    • Essential for scenarios with multiple pathways or stages

    • Foundation for Bayes’ Theorem (next chapter)

  4. Tree Diagrams: Visual tool for organizing sequential or staged probabilities

    • Branches represent conditional probabilities

    • Path probabilities multiply along branches

    • Final outcome probabilities sum across relevant paths

Why This Matters

Conditional probability is fundamental to updating beliefs in the light of new information; in particular, it is the foundation for Bayes’ Theorem (next chapter) and for reasoning about sequential, multi-stage experiments.

Common Pitfalls to Avoid

  1. Confusing $P(A|B)$ with $P(A \cap B)$:

    • $P(A|B)$ is a proportion: “out of times B occurs, how often does A also occur?”

    • $P(A \cap B)$ is absolute: “how often do both A and B occur?”

    • Visual check: $P(A|B)$ uses $B$ as the “whole”, $P(A \cap B)$ uses the full sample space $S$

  2. Reversing the conditioning (the prosecutor’s fallacy):

    • $P(A|B) \neq P(B|A)$ in general!

    • Example: $P(\text{positive test}|\text{disease}) \neq P(\text{disease}|\text{positive test})$

    • Need Bayes’ Theorem to flip conditioning (Chapter 5)

  3. Forgetting to partition completely:

    • For Law of Total Probability, events $B_i$ must be mutually exclusive and exhaustive

    • Missing a partition element leads to incorrect totals

  4. Misreading tree diagrams:

    • Branches show conditional probabilities, not joint probabilities

    • Multiply along paths, sum across paths

Visual Mnemonics

Conditional probability: Zoom into region $B$, see what fraction is also in $A$

Multiplication Rule: Path probability = product of conditional steps along the path

Law of Total Probability: Weighted average over all possible “routes” to $A$

Next Steps

In Chapter 5, we’ll build on conditional probability to explore Bayes’ Theorem, which lets us reverse the direction of conditioning and compute $P(B|A)$ from $P(A|B)$.

Exercises

  1. Two Dice: If you roll two fair six-sided dice, what is the conditional probability that the sum is 8, given that the first die shows a 3? What is the conditional probability that the first die shows a 3, given that the sum is 8?

  2. Medical Test: A disease affects 1 in 1000 people. A test for the disease is 99% accurate (i.e., P(Positive | Disease) = 0.99) and has a 2% false positive rate (i.e., P(Positive | No Disease) = 0.02). Use the Law of Total Probability to calculate the overall probability that a randomly selected person tests positive. (We will revisit this in the Bayes’ Theorem chapter).

  3. Two Cards — Same Rank: Draw two cards from a standard 52-card deck without replacement. What is the probability that the two cards have the same rank (e.g., two 7s, two Kings)?

  4. Choosing a Coin — Total Probability: A bag contains two fair coins and one biased coin.

    • If a coin is fair, $P(H)=0.5$.

    • If a coin is biased, $P(H)=0.8$.

    You randomly pick one coin from the bag and flip it twice. What is the probability of getting exactly one Head?