In previous chapters, we explored single random variables and then pairs or groups of random variables (joint distributions, covariance, correlation). Now, we take the next step: what happens when we combine multiple random variables using mathematical functions?
For example, if $X$ represents the revenue from product A and $Y$ represents the revenue from product B, we might be interested in the distribution of the total revenue $Z = X + Y$. Or, if $X$ and $Y$ are coordinates, we might want to know the distribution of the distance from the origin, $D = \sqrt{X^2 + Y^2}$.
This chapter explores methods for finding the distributions of such combined variables, focusing on sums, differences, products, ratios, general transformations, and order statistics. We’ll see how theoretical results can be derived and how simulation can provide empirical insights, especially when analytical solutions are complex.
Distributions of Sums, Differences, Products, and Ratios
One of the most common operations is finding the distribution of the sum of two or more random variables.
Sums of Independent Random Variables
Let $X$ and $Y$ be two independent random variables, and let $Z = X + Y$. Finding the distribution of $Z$ involves a technique called convolution.
Discrete Case: If $X$ and $Y$ are discrete with PMFs $p_X(x)$ and $p_Y(y)$, the PMF of $Z$ is given by the convolution formula:

$$p_Z(z) = P(Z = z) = \sum_{x} P(X = x, Y = z - x)$$

Since $X$ and $Y$ are independent, $P(X = x, Y = z - x) = p_X(x) \, p_Y(z - x)$. Therefore:

$$p_Z(z) = \sum_{x} p_X(x) \, p_Y(z - x)$$

The sum is over all possible values $x$ for $X$.
Example: If $X \sim \text{Poisson}(\lambda_1)$ and $Y \sim \text{Poisson}(\lambda_2)$ are independent, then $Z = X + Y \sim \text{Poisson}(\lambda_1 + \lambda_2)$.
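We can sanity-check this result numerically by convolving two truncated Poisson PMFs, exactly as the formula above prescribes, and comparing against the claimed closed form. This is a minimal sketch; the rates `lam1` and `lam2` and the truncation point are arbitrary choices for illustration.

```python
import numpy as np
import scipy.stats as stats

lam1, lam2 = 2.0, 3.0  # illustrative rates (arbitrary choices)

# Truncate the infinite Poisson support where the tail mass is negligible
k = np.arange(0, 40)
p_x = stats.poisson.pmf(k, lam1)
p_y = stats.poisson.pmf(k, lam2)

# Discrete convolution: p_Z(z) = sum_x p_X(x) * p_Y(z - x)
p_z = np.convolve(p_x, p_y)[:len(k)]

# Compare with the claimed closed form: Poisson(lam1 + lam2)
p_z_theory = stats.poisson.pmf(k, lam1 + lam2)
print(np.max(np.abs(p_z - p_z_theory)))  # agrees up to floating-point error
```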
Continuous Case: If $X$ and $Y$ are continuous with PDFs $f_X(x)$ and $f_Y(y)$, the PDF of $Z$ is given by the convolution integral:

$$f_Z(z) = \int_{-\infty}^{\infty} f_X(x) \, f_Y(z - x) \, dx$$

Alternatively, you can swap the roles of $X$ and $Y$: $f_Z(z) = \int_{-\infty}^{\infty} f_X(z - y) \, f_Y(y) \, dy$.
Example: If $X \sim N(\mu_1, \sigma_1^2)$ and $Y \sim N(\mu_2, \sigma_2^2)$ are independent, then $Z = X + Y \sim N(\mu_1 + \mu_2, \sigma_1^2 + \sigma_2^2)$.

Example: If $X \sim \text{Uniform}(0, 1)$ and $Y \sim \text{Uniform}(0, 1)$ are independent, then $Z = X + Y$ has a triangular distribution on $[0, 2]$. We will simulate this later.
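The convolution integral itself can also be approximated numerically. The sketch below (the parameters and the grid are arbitrary choices) evaluates both PDFs on a grid, approximates the integral with a Riemann sum via `np.convolve`, and compares the result to the closed-form normal PDF:

```python
import numpy as np
import scipy.stats as stats

mu1, sig1 = 0.0, 1.0  # illustrative parameters (arbitrary choices)
mu2, sig2 = 1.0, 2.0

# Common grid; wide enough that both PDFs are ~0 at the edges
x = np.linspace(-15, 15, 2001)
dx = x[1] - x[0]
f_x = stats.norm.pdf(x, mu1, sig1)
f_y = stats.norm.pdf(x, mu2, sig2)

# f_Z(z) ~ sum_x f_X(x) * f_Y(z - x) * dx, aligned to the same grid
f_z = np.convolve(f_x, f_y, mode='same') * dx

# Closed form: N(mu1 + mu2, sig1^2 + sig2^2)
f_z_theory = stats.norm.pdf(x, mu1 + mu2, np.sqrt(sig1**2 + sig2**2))
print(np.max(np.abs(f_z - f_z_theory)))  # small, up to grid/truncation error
```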
Differences, Products, and Ratios
Finding the distributions for differences ($Z = X - Y$), products ($Z = XY$), or ratios ($Z = X/Y$) can also be done using transformations or convolution-like methods, but the formulas can become more complex.

Difference: $X - Y = X + (-Y)$. If you know the distribution of $-Y$, you can use convolution.

Product/Ratio: These often require the method of transformations (discussed next) or working with cumulative distribution functions (finding $F_Z(z) = P(Z \le z)$ directly and then differentiating to find the PDF $f_Z(z) = F_Z'(z)$).
For many complex functions or when analytical derivation is intractable, simulation becomes a powerful tool to approximate the resulting distribution.
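For instance, for the product of two independent $\text{Uniform}(0, 1)$ variables, the CDF method gives $F_Z(z) = z - z\ln z$ on $(0, 1)$, so $f_Z(z) = -\ln z$. The following minimal sketch (sample size is an arbitrary choice) compares a simulation against this density:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n = 100_000

# Product of two independent Uniform(0, 1) variables
Z = rng.random(n) * rng.random(n)

# Density from the CDF method: f_Z(z) = -ln(z) on (0, 1)
z = np.linspace(0.01, 1, 400)

plt.hist(Z, bins=100, density=True, alpha=0.7, label='Simulated products')
plt.plot(z, -np.log(z), 'r-', lw=2, label=r'$f_Z(z) = -\ln z$')
plt.legend()
plt.show()
```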
Introduction to Multivariate Transformations
Suppose we have a pair of random variables $(X, Y)$ with a known joint PDF $f_{X,Y}(x, y)$. We define two new random variables $U = g_1(X, Y)$ and $V = g_2(X, Y)$. How do we find the joint PDF of $(U, V)$, denoted $f_{U,V}(u, v)$?
This requires a technique analogous to the change of variables in multivariable calculus, using the Jacobian of the transformation.
Solve for Original Variables: Express $X$ and $Y$ in terms of $U$ and $V$: $X = h_1(U, V)$ and $Y = h_2(U, V)$.

Calculate the Jacobian Determinant: The Jacobian determinant is:

$$J = \det \begin{pmatrix} \dfrac{\partial x}{\partial u} & \dfrac{\partial x}{\partial v} \\[4pt] \dfrac{\partial y}{\partial u} & \dfrac{\partial y}{\partial v} \end{pmatrix} = \dfrac{\partial x}{\partial u}\dfrac{\partial y}{\partial v} - \dfrac{\partial x}{\partial v}\dfrac{\partial y}{\partial u}$$

Apply the Transformation Formula: The joint PDF of $(U, V)$ is:

$$f_{U,V}(u, v) = f_{X,Y}\big(h_1(u, v), h_2(u, v)\big) \, |J|$$

where $|J|$ is the absolute value of the Jacobian determinant. This formula is valid provided the transformation is one-to-one over the region of interest.
Example: Cartesian to Polar Coordinates. Let $(X, Y)$ have a joint PDF $f_{X,Y}(x, y)$. Consider the transformation to polar coordinates: $R = \sqrt{X^2 + Y^2}$ and $\Theta = \arctan(Y/X)$. We want to find the joint PDF $f_{R,\Theta}(r, \theta)$. The inverse transformation is $X = R\cos\Theta$ and $Y = R\sin\Theta$. The Jacobian determinant is:

$$J = \det \begin{pmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{pmatrix} = r\cos^2\theta + r\sin^2\theta = r$$

Assuming $r \ge 0$, $|J| = r$. Thus:

$$f_{R,\Theta}(r, \theta) = f_{X,Y}(r\cos\theta, r\sin\theta) \, r$$

If $X, Y \sim N(0, 1)$ independently, then $f_{X,Y}(x, y) = \frac{1}{2\pi} e^{-(x^2 + y^2)/2}$. Substituting $x = r\cos\theta$ and $y = r\sin\theta$, we get $x^2 + y^2 = r^2$. So, $f_{R,\Theta}(r, \theta) = \frac{r}{2\pi} e^{-r^2/2}$. This separates into a function of $r$ alone times a constant in $\theta$, indicating $R$ and $\Theta$ are independent. Integrating over $\theta$ from $0$ to $2\pi$ gives the marginal PDF of $R$: $f_R(r) = r e^{-r^2/2}$ for $r \ge 0$ (a Rayleigh distribution), and integrating over $r$ gives the marginal PDF of $\Theta$: $f_\Theta(\theta) = \frac{1}{2\pi}$ for $0 \le \theta < 2\pi$ (a Uniform distribution).
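We can check both marginals by simulation. The sketch below (sample size is an arbitrary choice) transforms standard normal pairs to polar coordinates and compares $R$ to the Rayleigh PDF and $\Theta$ to a uniform density. Note that `np.arctan2` returns angles in $(-\pi, \pi]$ rather than $[0, 2\pi)$, a shifted but still uniform interval of length $2\pi$.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n = 100_000

# Independent standard normals
X = rng.standard_normal(n)
Y = rng.standard_normal(n)

# Polar transformation
R = np.sqrt(X**2 + Y**2)
Theta = np.arctan2(Y, X)  # in (-pi, pi]; shifted but still uniform

fig, axes = plt.subplots(1, 2, figsize=(12, 4))

# R should follow the Rayleigh PDF f_R(r) = r * exp(-r^2 / 2)
r = np.linspace(0, R.max(), 400)
axes[0].hist(R, bins=60, density=True, alpha=0.7)
axes[0].plot(r, r * np.exp(-r**2 / 2), 'r-', lw=2)
axes[0].set_title('R vs. Rayleigh PDF')

# Theta should be flat at height 1 / (2*pi)
axes[1].hist(Theta, bins=60, density=True, alpha=0.7)
axes[1].axhline(1 / (2 * np.pi), color='r', lw=2)
axes[1].set_title(r'$\Theta$ vs. uniform density $1/(2\pi)$')

plt.show()
```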
Order Statistics
Suppose we have a sample of $n$ independent and identically distributed (i.i.d.) random variables $X_1, X_2, \ldots, X_n$. If we arrange these variables in ascending order, we get the order statistics $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$, where $X_{(1)} = \min(X_1, \ldots, X_n)$ and $X_{(n)} = \max(X_1, \ldots, X_n)$.

We are often interested in the distribution of these order statistics, particularly the minimum ($X_{(1)}$) and the maximum ($X_{(n)}$).

Let the common CDF and PDF of the $X_i$ be $F(x)$ and $f(x)$, respectively.
Distribution of the Maximum, $X_{(n)}$: The event $\{X_{(n)} \le x\}$ means that all of the $X_i$ must be less than or equal to $x$. Since they are i.i.d.:

$$F_{X_{(n)}}(x) = P(X_1 \le x, \ldots, X_n \le x) = [F(x)]^n$$

The PDF is found by differentiating the CDF:

$$f_{X_{(n)}}(x) = n \, [F(x)]^{n-1} f(x)$$
Distribution of the Minimum, $X_{(1)}$: The event $\{X_{(1)} > x\}$ means that all of the $X_i$ must be greater than $x$, so $P(X_{(1)} > x) = [1 - F(x)]^n$.

Therefore, the CDF is:

$$F_{X_{(1)}}(x) = 1 - [1 - F(x)]^n$$

The PDF is found by differentiating:

$$f_{X_{(1)}}(x) = n \, [1 - F(x)]^{n-1} f(x)$$
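As a quick check of the maximum formula (the minimum is verified by simulation in the hands-on section below), here is a minimal sketch using standard normals; the sample size and number of simulations are arbitrary choices:

```python
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats

rng = np.random.default_rng(0)
num_sims, n = 100_000, 5  # arbitrary choices for this check

# Maximum of n i.i.d. standard normals, per row
X_max = rng.standard_normal((num_sims, n)).max(axis=1)

# Theoretical PDF: n * F(x)^(n-1) * f(x), with F, f the standard normal CDF/PDF
x = np.linspace(X_max.min(), X_max.max(), 400)
pdf_max = n * stats.norm.cdf(x)**(n - 1) * stats.norm.pdf(x)

plt.hist(X_max, bins=60, density=True, alpha=0.7, label='Simulated maxima')
plt.plot(x, pdf_max, 'r-', lw=2, label=r'$n\,[F(x)]^{n-1} f(x)$')
plt.legend()
plt.show()
```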
Example: Let $X_1, \ldots, X_n$ be i.i.d. $\text{Exponential}(\lambda)$. Then $F(x) = 1 - e^{-\lambda x}$ for $x \ge 0$. The CDF of the minimum is $F_{X_{(1)}}(x) = 1 - [e^{-\lambda x}]^n = 1 - e^{-n\lambda x}$. This is the CDF of an $\text{Exponential}(n\lambda)$ distribution. So, the minimum of $n$ i.i.d. exponential random variables is also exponential, with rate $n$ times the original rate.
Hands-on: Simulations and Comparisons
Simulating the Sum of Two Independent Uniform Random Variables
We expect the sum of two independent $\text{Uniform}(0, 1)$ variables to follow a triangular distribution on $[0, 2]$, with PDF:

$$f_Z(z) = \begin{cases} z & 0 \le z \le 1 \\ 2 - z & 1 < z \le 2 \\ 0 & \text{otherwise} \end{cases}$$
Let’s simulate this and compare the histogram of the simulated sums to the theoretical PDF.
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
# --- Simulation Parameters ---
num_simulations = 100000
# --- Simulate Uniform Random Variables ---
# Generate pairs of independent Uniform(0, 1) variables
X = np.random.rand(num_simulations)
Y = np.random.rand(num_simulations)
# --- Calculate the Sum ---
Z = X + Y
# --- Define the Theoretical PDF ---
def triangular_pdf(z):
    if 0 <= z <= 1:
        return z
    elif 1 < z <= 2:
        return 2 - z
    else:
        return 0
# Vectorize the function for plotting
v_triangular_pdf = np.vectorize(triangular_pdf)
# --- Plotting ---
plt.figure(figsize=(10, 6))
# Plot histogram of simulated sums
plt.hist(Z, bins=50, density=True, alpha=0.7, label=f'Simulated Sums (n={num_simulations})')
# Plot theoretical PDF
z_values = np.linspace(0, 2, 400)
pdf_values = v_triangular_pdf(z_values)
plt.plot(z_values, pdf_values, 'r-', lw=2, label='Theoretical Triangular PDF')
plt.title('Sum of Two Independent Uniform(0, 1) Variables')
plt.xlabel('Z = X + Y')
plt.ylabel('Density')
plt.legend()
plt.grid(True, linestyle='--', alpha=0.6)
plt.show()
Simulating Order Statistics: Minimum of Exponential Variables
Let’s simulate the minimum of $n = 5$ independent $\text{Exponential}(\lambda = 1)$ random variables. We derived theoretically that $X_{(1)} \sim \text{Exponential}(n\lambda)$. Let’s verify this visually.
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
# --- Simulation Parameters ---
num_simulations = 100000
n_variables = 5 # Number of exponential variables
lambda_rate = 1.0 # Rate parameter for individual variables
# --- Simulate Exponential Random Variables ---
# Generate n_variables sets of exponential random variables
# Each row is a simulation, each column is one X_i
exp_samples = np.random.exponential(scale=1.0/lambda_rate, size=(num_simulations, n_variables))
# --- Calculate the Minimum for each simulation ---
X_min = np.min(exp_samples, axis=1)
# --- Theoretical Distribution ---
# The minimum follows Exponential(n * lambda)
theoretical_rate = n_variables * lambda_rate
theoretical_dist = stats.expon(scale=1.0/theoretical_rate)
# --- Plotting ---
plt.figure(figsize=(10, 6))
# Plot histogram of simulated minimums
plt.hist(X_min, bins=50, density=True, alpha=0.7, label=rf'Simulated Minima (n={n_variables}, $\lambda$={lambda_rate})')
# Plot theoretical PDF
x_values = np.linspace(X_min.min(), X_min.max(), 400)
pdf_values = theoretical_dist.pdf(x_values)
plt.plot(x_values, pdf_values, 'r-', lw=2, label=f'Theoretical Exponential PDF (rate={theoretical_rate:.1f})')
plt.title(f'Distribution of the Minimum of {n_variables} i.i.d. Exponential({lambda_rate}) Variables')
plt.xlabel('Value of Minimum ($X_{(1)}$)')
plt.ylabel('Density')
plt.legend()
plt.grid(True, linestyle='--', alpha=0.6)
plt.show()
Summary
This chapter introduced methods for finding the distribution of functions of multiple random variables. We specifically looked at:
Sums of independent variables: Using convolution (discrete and continuous cases). We saw important results like the sum of independent Poissons being Poisson and the sum of independent Normals being Normal.
Multivariate Transformations: Using the Jacobian determinant to find the joint PDF of transformed variables, illustrated with the Cartesian-to-Polar transformation.
Order Statistics: Deriving the distributions (CDFs and PDFs) for the minimum ($X_{(1)}$) and maximum ($X_{(n)}$) of an i.i.d. sample.
We used simulations to empirically verify theoretical results, such as the triangular distribution arising from the sum of two uniforms and the exponential distribution arising from the minimum of exponentials. Simulation is a crucial tool when analytical derivations become too complex or intractable.
Exercises
Sum of Two Poissons: Let $X \sim \text{Poisson}(\lambda_1)$ and $Y \sim \text{Poisson}(\lambda_2)$ be independent. a. What is the distribution of $Z = X + Y$? b. Using the distribution from part (a), calculate $P(Z = z)$ for a value of $z$ of your choice. c. Simulate $X$ and $Y$ many times (with specific rates of your choice), calculate their sum $Z$, and create a histogram of the simulated $Z$ values. Compare the histogram to the theoretical PMF from part (a).
Maximum of Uniforms: Let $X_1, \ldots, X_n$ be i.i.d. $\text{Uniform}(0, 1)$. a. Find the theoretical CDF and PDF of $X_{(n)} = \max(X_1, \ldots, X_n)$. b. Simulate samples of size $n$ many times (for an $n$ of your choice), find the maximum in each sample, and create a histogram. Compare it to the theoretical PDF from part (a).
Ratio of Normals (Cauchy Distribution): Let $X \sim N(0, 1)$ and $Y \sim N(0, 1)$ be independent. Simulate $X$ and $Y$ many times and compute the ratio $Z = X/Y$. Plot a histogram of the $Z$ values (you may need to clip extreme values to get a readable plot). What distribution does this resemble? (Note: The theoretical distribution is the Cauchy distribution, which has unusual properties like an undefined mean.)