Welcome to Part 3 of our journey! We’ve built a solid foundation in basic probability, counting, conditional probability, and independence. Now, we introduce a central concept that bridges probability theory and data analysis: the Random Variable. Random variables allow us to quantitatively describe the outcomes of random phenomena. In this chapter, we’ll focus specifically on Discrete Random Variables, which take on a finite or countably infinite number of values. We’ll learn how to describe their behavior using Probability Mass Functions (PMFs) and Cumulative Distribution Functions (CDFs), and how to summarize them using measures like Expected Value (Mean) and Variance.

What is a Random Variable?

In many experiments, we’re not interested in the specific outcome itself, but rather in some numerical property associated with that outcome.

Definition: A Random Variable is a variable whose value is a numerical outcome of a random phenomenon. More formally, it’s a function that maps each outcome in the sample space $\Omega$ to a real number.

We typically denote random variables with uppercase letters (e.g., $X, Y, Z$) and their specific values with lowercase letters (e.g., $x, y, z$).

Types of Random Variables:

  1. Discrete Random Variable: A random variable that can only take on a finite or countably infinite number of distinct values. Often associated with counting processes.

  2. Continuous Random Variable: A random variable that can take on any value within a given range or interval. Often associated with measurement processes. (We’ll cover these in Chapter 8).

Example: Consider rolling a fair six-sided die. Let $X$ be the number showing on the top face. Then $X$ is a discrete random variable taking the values $1, 2, 3, 4, 5, 6$.

Another Example: Consider flipping a coin twice. Let $Y$ be the number of heads observed. The sample space is $\{HH, HT, TH, TT\}$, and $Y$ maps these outcomes to $2, 1, 1, 0$ respectively, so $Y$ takes the values $0, 1, 2$.
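To make the “function on the sample space” idea concrete, here is a minimal sketch (the variable and function names are our own) that maps each outcome of two coin flips to the number of heads:

from itertools import product

# Sample space for two coin flips: each outcome is a pair like ('H', 'T')
sample_space = list(product('HT', repeat=2))

# The random variable Y maps each outcome to a real number (the number of heads)
def Y(outcome):
    return outcome.count('H')

for outcome in sample_space:
    print(outcome, '->', Y(outcome))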

Probability Mass Function (PMF)

For a discrete random variable, we want to know the probability associated with each possible value it can take. This is captured by the Probability Mass Function (PMF).

Definition: The Probability Mass Function (PMF) of a discrete random variable $X$ is a function, denoted by $p_X(x)$ or simply $p(x)$, that gives the probability that $X$ is exactly equal to some value $x$.

$$p_X(x) = P(X = x)$$

A valid PMF must satisfy two conditions:

  1. $p_X(x) \ge 0$ for all possible values $x$. (Probabilities cannot be negative.)

  2. $\sum_{x} p_X(x) = 1$, where the sum is taken over all possible values $x$ that $X$ can assume. (The total probability must be 1.)
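These two conditions are easy to check programmatically. Below is a small helper of our own (is_valid_pmf is not part of any library) that validates a candidate PMF stored as a value-to-probability dictionary:

import numpy as np

def is_valid_pmf(pmf_dict, tol=1e-12):
    """Check the two PMF conditions: non-negativity and total probability 1."""
    probs = np.array(list(pmf_dict.values()))
    return bool(np.all(probs >= 0) and np.isclose(probs.sum(), 1.0, atol=tol))

print(is_valid_pmf({1: 0.5, 2: 0.5}))    # True
print(is_valid_pmf({1: 0.7, 2: 0.7}))    # False: probabilities sum to 1.4
print(is_valid_pmf({1: -0.2, 2: 1.2}))   # False: negative probability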

Let’s represent and visualize this PMF in Python.

Python Implementation

This code creates the PMF for a fair die by defining the possible values and their probabilities. We use a dictionary to represent the PMF for easy lookup and verify that probabilities sum to 1.

import numpy as np
from pprint import pprint

# Define the possible outcomes (values) and their probabilities for the die roll
die_values = np.arange(1, 7) # Possible values x: 1, 2, 3, 4, 5, 6
die_probs = np.array([1/6] * 6) # P(X=x) for each value

# Create a dictionary for easier lookup
die_pmf_dict = {val: prob for val, prob in zip(die_values, die_probs)}
print("PMF Dictionary:")
pprint(die_pmf_dict)
print(f"\nSum of probabilities: {sum(die_pmf_dict.values()):.10f}")
PMF Dictionary:
{np.int64(1): np.float64(0.16666666666666666),
 np.int64(2): np.float64(0.16666666666666666),
 np.int64(3): np.float64(0.16666666666666666),
 np.int64(4): np.float64(0.16666666666666666),
 np.int64(5): np.float64(0.16666666666666666),
 np.int64(6): np.float64(0.16666666666666666)}

Sum of probabilities: 1.0000000000
PMF of a fair die roll showing uniform probability of 1/6 for each outcome.

Cumulative Distribution Function (CDF)

Sometimes, we are interested not just in the probability of $X$ being exactly a certain value, but in the probability that $X$ is less than or equal to a certain value. This is captured by the Cumulative Distribution Function (CDF).

Definition: The Cumulative Distribution Function (CDF) of a random variable $X$ (discrete or continuous), denoted by $F_X(x)$ or simply $F(x)$, gives the probability that $X$ takes on a value less than or equal to $x$.

$$F_X(x) = P(X \le x)$$

For a discrete random variable $X$, the CDF is calculated by summing the PMF values for all outcomes less than or equal to $x$:

$$F_X(x) = \sum_{k \le x} p_X(k)$$

Properties of a CDF:

  1. $0 \le F_X(x) \le 1$ for all $x$.

  2. $F_X(x)$ is a non-decreasing function of $x$: if $a < b$, then $F_X(a) \le F_X(b)$.

  3. $\lim_{x \to -\infty} F_X(x) = 0$ and $\lim_{x \to +\infty} F_X(x) = 1$.

  4. For a discrete random variable, the CDF is a step function, increasing at the points where the PMF is positive.

  5. $P(X > x) = 1 - F_X(x)$.

  6. $P(a < X \le b) = F_X(b) - F_X(a)$ for $a < b$.

  7. $P(X = x) = F_X(x) - \lim_{y \to x^-} F_X(y)$ (the size of the jump at $x$).

Let’s calculate and visualize the CDF.

Python Implementation

This code demonstrates how to calculate the CDF by cumulatively summing the PMF values. We then create a function die_cdf_func(x) that can evaluate the CDF at any point, handling values below 1, above 6, and in between.

import numpy as np

# Setup: Define die values and probabilities
die_values = np.arange(1, 7)  # Possible values: 1, 2, 3, 4, 5, 6
die_probs = np.array([1/6] * 6)  # Equal probability for fair die

# Calculate the CDF values
die_cdf_values = np.cumsum(die_probs)
print("CDF Values:")
for i, cdf_val in enumerate(die_cdf_values, start=1):
    print(f"  F({i}) = {cdf_val:.4f}")

# Create a function representation of the CDF
def die_cdf_func(x):
    if x < 1:
        return 0.0
    elif x >= 6:
        return 1.0
    else:
        # For values between die outcomes, CDF remains constant at the last "step"
        # So we find which die value we've passed and return that CDF value
        idx = np.searchsorted(die_values, x, side='right') - 1
        return die_cdf_values[idx]

# Test the function at non-integer values to see step function behavior
print("\nTesting CDF between steps:")
print(f"  F(0.5) = {die_cdf_func(0.5):.4f}  (before first outcome)")
print(f"  F(3.7) = {die_cdf_func(3.7):.4f}  (between 3 and 4, stays at F(3))")
print(f"  F(10)  = {die_cdf_func(10):.4f}  (after last outcome)")
CDF Values:
  F(1) = 0.1667
  F(2) = 0.3333
  F(3) = 0.5000
  F(4) = 0.6667
  F(5) = 0.8333
  F(6) = 1.0000

Testing CDF between steps:
  F(0.5) = 0.0000  (before first outcome)
  F(3.7) = 0.5000  (between 3 and 4, stays at F(3))
  F(10)  = 1.0000  (after last outcome)
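We can also use die_cdf_func to check properties 5 and 6 numerically. This short addition (our own) reuses the function defined above:

# Property 5: P(X > x) = 1 - F(x)
print(f"P(X > 3) = 1 - F(3) = {1 - die_cdf_func(3):.4f}")   # outcomes 4, 5, 6 -> 0.5

# Property 6: P(a < X <= b) = F(b) - F(a)
print(f"P(2 < X <= 5) = F(5) - F(2) = {die_cdf_func(5) - die_cdf_func(2):.4f}")  # outcomes 3, 4, 5 -> 0.5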

The plot below visualizes the CDF as a step function. Notice how the function jumps at each integer value (where the die can actually land) and remains constant between integers. The filled circles show the value of the CDF at each point, while the open circles indicate the value just before the jump.

CDF of a fair die roll showing the cumulative probability as a step function.

Expected Value (Mean)

The expected value, or mean, of a discrete random variable is a weighted average of its possible values, where the weights are the probabilities (PMF values). It represents the long-run average value we would expect if we observed the random variable many times.

Definition: The Expected Value (or Mean) of a discrete random variable $X$, denoted by $E[X]$ or $\mu_X$, is defined as:

$$E[X] = \mu_X = \sum_{x} x \cdot p_X(x)$$

where the sum is over all possible values $x$ that $X$ can take.

The expected value doesn’t have to be one of the possible values of $X$.

Python Implementation

This code calculates the expected value using the formula $E[X] = \sum_{x} x \cdot p_X(x)$. We use np.sum() to compute the weighted sum of values times probabilities.

import numpy as np

# Setup: Define die values and probabilities
die_values = np.arange(1, 7)  # Possible values: 1, 2, 3, 4, 5, 6
die_probs = np.array([1/6] * 6)  # Equal probability for fair die

# Calculate the expected value
expected_value = np.sum(die_values * die_probs)
# Alternatively using dot product:
# expected_value = np.dot(die_values, die_probs)

print(f"Theoretical Expected Value E[X]: {expected_value}")
Theoretical Expected Value E[X]: 3.5
PMF with expected value marked at 3.5, the theoretical long-run average.

Variance and Standard Deviation

While the expected value tells us the center of the distribution, the variance and standard deviation measure the spread or dispersion of the random variable’s values around the mean.

Definition: The Variance of a random variable $X$, denoted by $Var(X)$ or $\sigma_X^2$, is the expected value of the squared difference between $X$ and its mean $E[X] = \mu_X$.

$$Var(X) = \sigma_X^2 = E[(X - \mu_X)^2]$$

For a discrete random variable, this is calculated as:

$$Var(X) = \sum_{x} (x - \mu_X)^2 \cdot p_X(x)$$

A computationally simpler formula for variance is often used:

$$Var(X) = E[X^2] - (E[X])^2$$

where $E[X^2] = \sum_{x} x^2 \cdot p_X(x)$.
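This identity follows from expanding the square inside the expectation and using linearity:

$$
\begin{align*}
Var(X) &= E[(X - \mu_X)^2] \\
&= E[X^2 - 2\mu_X X + \mu_X^2] \\
&= E[X^2] - 2\mu_X E[X] + \mu_X^2 \\
&= E[X^2] - 2\mu_X^2 + \mu_X^2 \\
&= E[X^2] - (E[X])^2
\end{align*}
$$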

Definition: The Standard Deviation of a random variable $X$, denoted by $SD(X)$ or $\sigma_X$, is the positive square root of the variance.

$$SD(X) = \sigma_X = \sqrt{Var(X)}$$

The standard deviation is often preferred because it has the same units as the random variable $X$.

Python Implementation

This code calculates variance using the computational formula $Var(X) = E[X^2] - (E[X])^2$. We also show the alternative calculation using the definition $E[(X - \mu)^2]$ to verify they give the same result.

import numpy as np

# Setup: Define die values and probabilities
die_values = np.arange(1, 7)  # Possible values: 1, 2, 3, 4, 5, 6
die_probs = np.array([1/6] * 6)  # Equal probability for fair die
expected_value = np.sum(die_values * die_probs)  # E[X] = 3.5

# Calculate E[X^2]
e_x_squared = np.sum((die_values**2) * die_probs)
print(f"E[X^2]: {e_x_squared:.4f} (Exact: 91/6)")

# Calculate Variance using the computational formula
variance = e_x_squared - (expected_value**2)
print(f"Theoretical Variance Var(X): {variance:.4f} (Exact: 35/12)")

# Alternatively, calculate using the definition: E[(X - mu)^2]
variance_def = np.sum(((die_values - expected_value)**2) * die_probs)
print(f"Variance using definition: {variance_def:.4f}")

# Calculate Standard Deviation
std_dev = np.sqrt(variance)
print(f"Theoretical Standard Deviation SD(X): {std_dev:.4f} (Exact: sqrt(35/12))")
E[X^2]: 15.1667 (Exact: 91/6)
Theoretical Variance Var(X): 2.9167 (Exact: 35/12)
Variance using definition: 2.9167
Theoretical Standard Deviation SD(X): 1.7078 (Exact: sqrt(35/12))
PMF showing mean and standard deviation bands, illustrating the spread of the distribution.

Functions of a Random Variable

Often, we are interested in a quantity that is derived from a random variable. If $X$ is a random variable and $g$ is a function, then $Y = g(X)$ is also a random variable.

If $X$ is discrete with PMF $p_X(x)$, we can find the PMF of $Y = g(X)$, denoted $p_Y(y)$, by summing the probabilities of all $x$ values such that $g(x) = y$:

$$
\begin{align*}
p_Y(y) &= P(Y=y) \\
&= P(g(X)=y) \\
&= \sum_{x: g(x)=y} p_X(x)
\end{align*}
$$
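When $g$ is not one-to-one, several $x$ values map to the same $y$ and their probabilities add. Here is a small sketch (our own example, separate from the $Y = X^2$ code below) using $g(x) = (x-3)^2$ on a fair die, where for instance $x = 2$ and $x = 4$ both map to $y = 1$:

import numpy as np
from collections import defaultdict

die_values = np.arange(1, 7)
die_probs = np.array([1/6] * 6)

# g is not one-to-one on {1,...,6}: g(2) == g(4) == 1 and g(1) == g(5) == 4
def g(x):
    return (x - 3) ** 2

# Sum p_X(x) over all x with g(x) == y
pmf_y = defaultdict(float)
for x, p in zip(die_values, die_probs):
    pmf_y[g(x)] += p

for y in sorted(pmf_y):
    print(f"P(Y = {y}) = {pmf_y[y]:.4f}")   # Y takes values 0, 1, 4, 9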

Expected Value of a Function of a Random Variable (LOTUS)

A very useful result, sometimes called the Law of the Unconscious Statistician (LOTUS), allows us to calculate the expected value of $Y = g(X)$ without explicitly finding the PMF of $Y$.

Definition: For a discrete random variable $X$ with PMF $p_X(x)$ and a function $g$, the expected value of $Y = g(X)$ is:

$$E[g(X)] = \sum_{x} g(x) \cdot p_X(x)$$

Compare this to the definition of $E[X]$:

$$
\begin{align*}
E[X] &= \sum_{x} x \cdot p_X(x) \\
E[g(X)] &= \sum_{x} g(x) \cdot p_X(x)
\end{align*}
$$

Notice that we simply replace $x$ with $g(x)$ in the summation. This is how we calculated $E[X^2]$ earlier, where $g(x) = x^2$.

Python Implementation

This code demonstrates two methods for calculating $E[Y]$ where $Y = X^2$: first by finding the PMF of $Y$ and using it directly, and second by using LOTUS to calculate directly from $X$ without finding the PMF of $Y$. Both methods give the same result.

import numpy as np
from pprint import pprint

# Setup: Define die values and probabilities
die_values = np.arange(1, 7)  # Possible values: 1, 2, 3, 4, 5, 6
die_probs = np.array([1/6] * 6)  # Equal probability for fair die

# Define the function g(x) = x^2
def g(x):
    return x**2

# Possible values for X and Y
x_values = die_values
y_values = g(x_values)

# PMF for Y (since g(x) is one-to-one for x in {1..6}, probs are the same)
y_probs = die_probs

# PMF dictionary for Y
y_pmf_dict = {val_y: prob for val_y, prob in zip(y_values, y_probs)}
print("PMF Dictionary for Y=X^2:")
pprint(y_pmf_dict)

# Calculate E[Y] using the PMF of Y
expected_value_y = np.sum(y_values * y_probs)
print(f"\nE[Y] calculated using PMF of Y: {expected_value_y:.4f} (Exact: 91/6)")

# Calculate E[Y] = E[g(X)] using LOTUS
expected_value_y_lotus = np.sum(g(x_values) * die_probs)
print(f"E[Y] calculated using LOTUS E[g(X)]: {expected_value_y_lotus:.4f} (Exact: 91/6)")
PMF Dictionary for Y=X^2:
{np.int64(1): np.float64(0.16666666666666666),
 np.int64(4): np.float64(0.16666666666666666),
 np.int64(9): np.float64(0.16666666666666666),
 np.int64(16): np.float64(0.16666666666666666),
 np.int64(25): np.float64(0.16666666666666666),
 np.int64(36): np.float64(0.16666666666666666)}

E[Y] calculated using PMF of Y: 15.1667 (Exact: 91/6)
E[Y] calculated using LOTUS E[g(X)]: 15.1667 (Exact: 91/6)
PMF of $Y = X^2$ showing the transformed distribution.

Hands-on: Simulation and Comparison

The Law of Large Numbers (which we’ll study later) tells us that if we simulate a random variable many times, the average of the outcomes (the sample mean) should get close to the theoretical expected value $E[X]$. Similarly, the variance of the outcomes (the sample variance) should approach $Var(X)$. Let’s verify this for our die roll example.

We will:

  1. Simulate a large number of fair die rolls using numpy.random.randint.

  2. Calculate the sample mean and sample variance of the simulated outcomes.

  3. Compare these empirical results to the theoretical values ($E[X] = 3.5$, $Var(X) \approx 2.917$).

  4. Visualize the distribution of the simulated outcomes (empirical PMF) and compare it to the theoretical PMF.

  5. Visualize the empirical CDF and compare it to the theoretical CDF.

import numpy as np

# Setup: Calculate theoretical values
die_values = np.arange(1, 7)
die_probs = np.array([1/6] * 6)
expected_value = np.sum(die_values * die_probs)  # E[X] = 3.5
variance = np.sum((die_values**2) * die_probs) - expected_value**2  # Var(X) = 35/12
std_dev = np.sqrt(variance)  # SD(X)

# Number of simulations
num_simulations = 10000

# Simulate die rolls
simulated_rolls = np.random.randint(1, 7, size=num_simulations)

# Calculate empirical mean and variance
sample_mean = np.mean(simulated_rolls)
sample_variance = np.var(simulated_rolls, ddof=1)  # ddof=1 for unbiased estimator
sample_std_dev = np.std(simulated_rolls, ddof=1)

# Compare empirical vs theoretical
print(f"--- Comparison after {num_simulations} simulations ---")
print(f"Theoretical E[X]: {expected_value:.4f}")
print(f"Sample Mean:      {sample_mean:.4f}")
print(f"Difference (Mean): {abs(sample_mean - expected_value):.4f}\n")

print(f"Theoretical Var(X): {variance:.4f}")
print(f"Sample Variance:    {sample_variance:.4f}")
print(f"Difference (Var):   {abs(sample_variance - variance):.4f}\n")

print(f"Theoretical SD(X): {std_dev:.4f}")
print(f"Sample Std Dev:     {sample_std_dev:.4f}")
print(f"Difference (SD):    {abs(sample_std_dev - std_dev):.4f}")
--- Comparison after 10000 simulations ---
Theoretical E[X]: 3.5000
Sample Mean:      3.5026
Difference (Mean): 0.0026

Theoretical Var(X): 2.9167
Sample Variance:    2.9029
Difference (Var):   0.0138

Theoretical SD(X): 1.7078
Sample Std Dev:     1.7038
Difference (SD):    0.0040

Visualizing Empirical vs Theoretical Distributions

Now let’s plot the frequencies of our simulated results and compare them to the theoretical probabilities (PMF), and do the same for the cumulative distributions (CDF).

# Calculate empirical PMF and CDF
unique_outcomes, counts = np.unique(simulated_rolls, return_counts=True)
empirical_pmf = counts / num_simulations
empirical_cdf = np.cumsum(empirical_pmf)
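The comparison figure below was produced by a plotting cell not shown here; a minimal matplotlib sketch along these lines (our own, illustrative) would generate a similar side-by-side view:

import matplotlib.pyplot as plt

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))

# Left panel: empirical vs theoretical PMF as side-by-side bars
ax1.bar(unique_outcomes - 0.15, empirical_pmf, width=0.3, label='Empirical')
ax1.bar(die_values + 0.15, die_probs, width=0.3, label='Theoretical')
ax1.set_xlabel('Outcome'); ax1.set_ylabel('Probability'); ax1.set_title('PMF')
ax1.legend()

# Right panel: empirical vs theoretical CDF as step functions
ax2.step(unique_outcomes, empirical_cdf, where='post', label='Empirical')
ax2.step(die_values, np.cumsum(die_probs), where='post', linestyle='--', label='Theoretical')
ax2.set_xlabel('Outcome'); ax2.set_ylabel('Cumulative probability'); ax2.set_title('CDF')
ax2.legend()

plt.tight_layout()
plt.show()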
Comparison of empirical (simulated) and theoretical distributions, demonstrating convergence.

As you can see from the simulation results and the plots, the empirical values (sample mean, sample variance, empirical PMF/CDF) obtained from a large number of simulations closely approximate the theoretical values we derived. This demonstrates the connection between probability theory and real-world observations or simulations.

Summary

In this chapter, we introduced the fundamental concept of a discrete random variable: how to describe its distribution with a PMF and CDF, how to summarize it with the expected value, variance, and standard deviation, and how to work with functions of a random variable using LOTUS.

In the next chapter, we will explore several important families of discrete distributions that model common real-world scenarios.


Exercises

  1. Biased Coin: Consider a biased coin where the probability of getting Heads (H) is $P(H) = 0.7$. Let $X$ be the random variable representing the number of heads in a single flip (so $X = 1$ for Heads, $X = 0$ for Tails).
     a. What is the PMF of $X$?
     b. What is the CDF of $X$? Plot it.
     c. Calculate the expected value $E[X]$.
     d. Calculate the variance $Var(X)$ and standard deviation $SD(X)$.

  2. Two Dice Sum: Let $X$ be the random variable representing the sum of the outcomes when two fair six-sided dice are rolled.
     a. What are the possible values for $X$?
     b. Determine the PMF of $X$. (Hint: There are 36 equally likely outcomes for the pair of dice.)
     c. Calculate $E[X]$. Is there an easier way than using the PMF directly? (Hint: Linearity of Expectation.)
     d. Calculate $Var(X)$. (Hint: Variance of sums of independent variables.)
     e. Find $P(X > 7)$.
     f. Find $P(X \text{ is even})$.

  3. Game Value: You pay £2 to play a game. You roll a fair six-sided die. If you roll a 6, you win £5 (getting your £2 back plus £3 profit). If you roll a 4 or 5, you win £2 (getting your £2 back). If you roll a 1, 2, or 3, you win nothing (losing your £2). Let $W$ be the random variable representing your net winnings (profit/loss) from playing the game once.
     a. What are the possible values for $W$?
     b. Determine the PMF of $W$.
     c. Calculate the expected net winnings $E[W]$. Is this a fair game? (A fair game has $E[W] = 0$.)
     d. Calculate the variance $Var(W)$.

  4. Simulation Comparison: Simulate rolling two fair dice and calculating their sum 10,000 times.
     a. Calculate the sample mean and sample variance of the simulated sums.
     b. Compare these to the theoretical values you calculated in Exercise 2.
     c. Plot the empirical PMF of the simulated sums and compare it to the theoretical PMF from Exercise 2.


(Solutions/Hints Appendix)

Example Code for Exercise 1
import numpy as np

# Exercise 1: Biased Coin
p_heads = 0.7
p_tails = 1 - p_heads

# a) PMF
x_values_coin = np.array([0, 1]) # 0 for Tails, 1 for Heads
pmf_coin = np.array([p_tails, p_heads])
pmf_coin_dict = {val: prob for val, prob in zip(x_values_coin, pmf_coin)}
print(f"Ex 1(a) - PMF: {pmf_coin_dict}")

# b) CDF
cdf_coin_values = np.cumsum(pmf_coin)
def coin_cdf_func(x):
    if x < 0: return 0.0
    if x >= 1: return 1.0
    return cdf_coin_values[0] # P(X<=0)

# c) Expected Value
ex_coin = np.sum(x_values_coin * pmf_coin)
print(f"Ex 1(c) - E[X]: {ex_coin}")

# d) Variance
ex2_coin = np.sum((x_values_coin**2) * pmf_coin)
var_coin = ex2_coin - ex_coin**2
sd_coin = np.sqrt(var_coin)
print(f"Ex 1(d) - Var(X): {var_coin:.4f}")
print(f"Ex 1(d) - SD(X): {sd_coin:.4f}")
Ex 1(a) - PMF: {np.int64(0): np.float64(0.30000000000000004), np.int64(1): np.float64(0.7)}
Ex 1(c) - E[X]: 0.7
Ex 1(d) - Var(X): 0.2100
Ex 1(d) - SD(X): 0.4583
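
Example Code for Exercise 4 (sketch)

A possible simulation sketch for Exercise 4, following the same pattern as the die-roll simulation above; the theoretical values $E[X] = 7$ and $Var(X) = 35/6$ are those derived in Exercise 2:

import numpy as np

# Exercise 4: Simulate sums of two fair dice
num_sims = 10_000
rolls = np.random.randint(1, 7, size=(num_sims, 2))  # two dice per trial
sums = rolls.sum(axis=1)

print(f"Sample mean:     {np.mean(sums):.4f}  (theoretical E[X] = 7)")
print(f"Sample variance: {np.var(sums, ddof=1):.4f}  (theoretical Var(X) = 35/6 = {35/6:.4f})")

# Empirical PMF of the simulated sums
values, counts = np.unique(sums, return_counts=True)
for v, c in zip(values, counts):
    print(f"  P(X = {v:2d}) is approximately {c / num_sims:.4f}")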