Appendix E: Summary of Formulas#
This appendix provides a summary of the key formulas introduced in Chapters 2-18.
Chapter 2: The Language of Probability: Sets, Sample Spaces, and Events#
Axioms of Probability#
Let S be a sample space, and P(A) denote the probability of an event A.
Non-negativity: For any event A, the probability of A is greater than or equal to zero. P(A)≥0
Normalization: The probability of the entire sample space S is equal to 1. P(S)=1
Additivity for Disjoint Events: If A1,A2,A3,… is a sequence of mutually exclusive (disjoint) events (i.e., Ai∩Aj=∅ for all i≠j), then the probability of their union is the sum of their individual probabilities. P(A1∪A2∪A3∪…)=P(A1)+P(A2)+P(A3)+…
For a finite number of disjoint events, say A and B: If A∩B=∅, then P(A∪B)=P(A)+P(B)
Probability of Impossible Event: The probability of an impossible event (the empty set, ∅) is 0. P(∅)=0
Basic Probability Rules#
Probability Range: For any event A: 0≤P(A)≤1
Complement Rule: The probability that event A does not occur is 1 minus the probability that it does occur. P(A′)=1−P(A)
Addition Rule (General): For any two events A and B (not necessarily disjoint), the probability that A or B (or both) occurs is: P(A∪B)=P(A)+P(B)−P(A∩B)
Empirical Probability#
The empirical probability of an event A is estimated from simulations:
\(P_{\text{empirical}}(A) = \frac{\text{Number of times event } A \text{ occurred}}{\text{Total number of trials}}\)
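This formula can be checked directly by simulation. A minimal sketch in Python, where the die-rolling event and the number of trials are illustrative assumptions rather than an example from the text:

```python
import random

# Estimate an empirical probability by simulation:
# P(rolling a 5 or 6 on a fair six-sided die) -- an illustrative event.
def empirical_probability(event, trials=100_000):
    occurrences = sum(event(random.randint(1, 6)) for _ in range(trials))
    return occurrences / trials  # occurrences of A / total number of trials

print(empirical_probability(lambda roll: roll >= 5))  # close to 2/6 ≈ 0.333
```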
Chapter 3: Counting Techniques: Permutations and Combinations#
The Multiplication Principle#
If a procedure can be broken down into a sequence of k steps, with n1 ways for the first step, n2 for the second, …, nk for the k-th step, then the total number of ways to perform the entire procedure is:
Total ways=n1×n2×⋯×nk
Permutations (Order Matters)#
Permutations without Repetition: The number of permutations of n distinct objects taken k at a time. \(P(n,k) = \frac{n!}{(n-k)!}\)
Special Case: Arranging all n distinct objects: P(n,n)=n!
Permutations with Repetition (Multinomial Coefficients): The number of distinct permutations of n objects where there are n1 identical objects of type 1, n2 of type 2, …, nk of type k (such that n1+n2+⋯+nk=n). \(\frac{n!}{n_1!\,n_2!\cdots n_k!}\)
Combinations (Order Doesn’t Matter)#
Combinations without Repetition: The number of combinations of n distinct objects taken k at a time (also “n choose k”). \(C(n,k) = \binom{n}{k} = \frac{n!}{k!\,(n-k)!}\)
Relationship to permutations: \(C(n,k) = \frac{P(n,k)}{k!}\)
Combinations with Repetition: The number of combinations with repetition of n types of objects taken k at a time. \(\binom{n+k-1}{k} = \frac{(n+k-1)!}{k!\,(n-1)!}\)
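The counting formulas above can be verified with Python's standard library. A minimal sketch, where the values n = 5 and k = 3 are illustrative assumptions:

```python
from math import comb, factorial, perm

# Check the counting formulas numerically for n = 5, k = 3.
n, k = 5, 3
print(perm(n, k))                  # P(n, k) = n! / (n - k)!        -> 60
print(comb(n, k))                  # C(n, k) = n! / (k! (n - k)!)   -> 10
print(perm(n, k) // factorial(k))  # C(n, k) = P(n, k) / k!         -> 10
print(comb(n + k - 1, k))          # combinations with repetition   -> 35
```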
Probability with Equally Likely Outcomes#
The probability of an event E when all outcomes in the sample space S are equally likely:
\(P(E) = \frac{\text{Number of outcomes favorable to } E}{\text{Total number of possible outcomes in } S} = \frac{|E|}{|S|}\)
Chapter 4: Conditional Probability#
Definition of Conditional Probability#
For any two events A and B from a sample space S, where P(B)>0, the conditional probability of A given B is defined as:
\(P(A \mid B) = \frac{P(A \cap B)}{P(B)}\)
The Multiplication Rule for Conditional Probability#
Rearranging the definition of conditional probability gives:
P(A∩B)=P(A∣B)P(B)
Similarly, if P(A)>0:
P(A∩B)=P(B∣A)P(A)
For three events A,B,C:
P(A∩B∩C)=P(C∣A∩B)P(B∣A)P(A)
The Law of Total Probability#
Let B1,B2,…,Bn be a partition of the sample space S. Then, for any event A in S:
\(P(A) = \sum_{i=1}^{n} P(A \mid B_i)\, P(B_i)\)
Expanded form:
\(P(A) = P(A \mid B_1)P(B_1) + P(A \mid B_2)P(B_2) + \dots + P(A \mid B_n)P(B_n)\)
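A minimal numeric sketch of the Law of Total Probability; the partition probabilities and conditional probabilities below are illustrative assumptions:

```python
# Law of Total Probability over a three-event partition B1, B2, B3 of S.
p_B = [0.5, 0.3, 0.2]            # P(B1), P(B2), P(B3)
p_A_given_B = [0.9, 0.5, 0.1]    # P(A|B1), P(A|B2), P(A|B3)

p_A = sum(pa * pb for pa, pb in zip(p_A_given_B, p_B))
print(p_A)  # 0.9*0.5 + 0.5*0.3 + 0.1*0.2 ≈ 0.62
```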
Chapter 5: Bayes’ Theorem and Independence#
Bayes’ Theorem#
Provides a way to “reverse” conditional probabilities. If P(B)>0:
\(P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}\)
Where P(B) can often be calculated using the Law of Total Probability:
P(B)=P(B∣A)P(A)+P(B∣Ac)P(Ac)
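A minimal sketch of Bayes' Theorem with the Law of Total Probability in the denominator; the prior and the two conditional probabilities are illustrative assumptions (a test-accuracy style example):

```python
# Bayes' Theorem: P(A|B) = P(B|A) P(A) / P(B), with P(B) from total probability.
p_A          = 0.01   # prior P(A)
p_B_given_A  = 0.95   # P(B | A)
p_B_given_Ac = 0.05   # P(B | A^c)

p_B = p_B_given_A * p_A + p_B_given_Ac * (1 - p_A)   # Law of Total Probability
p_A_given_B = p_B_given_A * p_A / p_B                # Bayes' Theorem
print(round(p_A_given_B, 4))  # ≈ 0.161
```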
Independence of Events#
Formal Definition: Events A and B are independent if and only if: P(A∩B)=P(A)P(B)
Alternative Definition (using conditional probability):
If P(B)>0, A and B are independent if and only if: P(A∣B)=P(A)
Similarly, if P(A)>0, independence means: P(B∣A)=P(B)
Conditional Independence#
Events A and B are conditionally independent given event C (where P(C)>0) if:
P(A∩B∣C)=P(A∣C)P(B∣C)
Alternative Definition: If P(B∣C)>0, conditional independence means:
P(A∣B∩C)=P(A∣C)
Chapter 6: Discrete Random Variables#
Probability Mass Function (PMF)#
For a discrete random variable X, the PMF pX(x) is:
pX(x)=P(X=x)
Properties of a PMF:
pX(x)≥0 for all possible values x.
∑xpX(x)=1 (sum over all possible values x).
Cumulative Distribution Function (CDF)#
For a random variable X, the CDF FX(x) is:
FX(x)=P(X≤x)
For a discrete random variable X:
FX(x)=∑k≤xpX(k)
Properties of a CDF:
0≤FX(x)≤1 for all x.
If a<b, then FX(a)≤FX(b) (non-decreasing).
limx→−∞FX(x)=0
limx→+∞FX(x)=1
P(X>x)=1−FX(x)
P(a<X≤b)=FX(b)−FX(a) for a<b.
P(X=x)=FX(x)−limy→x−FX(y) (for discrete RV, this is the jump at x).
Expected Value (Mean)#
For a discrete random variable X:
E[X]=μX=∑xx⋅pX(x)
Variance#
For a random variable X with mean μX:
Var(X)=σX2=E[(X−μX)2]
For a discrete random variable X:
Var(X)=∑x(x−μX)2⋅pX(x)
Computational formula for variance:
Var(X)=E[X2]−(E[X])2
Where E[X2] for a discrete random variable is:
E[X2]=∑xx2⋅pX(x)
Standard Deviation#
The positive square root of the variance:
\(\mathrm{SD}(X) = \sigma_X = \sqrt{\mathrm{Var}(X)}\)
Functions of a Random Variable#
If Y=g(X):
PMF of Y (for discrete X): pY(y)=P(Y=y)=P(g(X)=y)=∑x:g(x)=ypX(x)
Expected Value of Y=g(X) (LOTUS - Law of the Unconscious Statistician): For a discrete random variable X: E[Y]=E[g(X)]=∑xg(x)⋅pX(x)
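The PMF-based formulas above translate directly into code. A minimal sketch, where the PMF on {0, 1, 2} and the function g(x) = 2x + 1 are illustrative assumptions:

```python
# Mean, variance, and LOTUS for a discrete random variable given its PMF.
pmf = {0: 0.2, 1: 0.5, 2: 0.3}

mean     = sum(x * p for x, p in pmf.items())           # E[X]
e_x2     = sum(x**2 * p for x, p in pmf.items())        # E[X^2]
variance = e_x2 - mean**2                               # Var(X) = E[X^2] - (E[X])^2
e_gx     = sum((2*x + 1) * p for x, p in pmf.items())   # LOTUS with g(x) = 2x + 1

print(mean, variance, e_gx)  # 1.1, 0.49, 3.2
```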
Chapter 7: Common Discrete Distributions#
Bernoulli Distribution#
Models a single trial with two outcomes (success=1, failure=0).
Parameter: p (probability of success).
PMF: \(P(X=k) = p^k (1-p)^{1-k}\) for \(k \in \{0, 1\}\)
Alternatively: \(P(X=k) = \begin{cases} p & \text{if } k = 1 \\ 1-p & \text{if } k = 0 \\ 0 & \text{otherwise} \end{cases}\)
Mean: E[X]=p
Variance: Var(X)=p(1−p)
Binomial Distribution#
Models the number of successes in n independent Bernoulli trials.
Parameters: n (number of trials), p (probability of success on each trial).
PMF: \(P(X=k) = \binom{n}{k} p^k (1-p)^{n-k}\) for \(k = 0, 1, \dots, n\)
Mean: E[X]=np
Variance: Var(X)=np(1−p)
Geometric Distribution#
Models the number of trials (k) needed to get the first success.
Parameter: p (probability of success on each trial).
PMF (for X = trial number of first success): \(P(X=k) = (1-p)^{k-1} p\) for \(k = 1, 2, 3, \dots\)
Mean (trial number of first success): \(E[X] = \frac{1}{p}\)
Variance (trial number of first success): \(\mathrm{Var}(X) = \frac{1-p}{p^2}\)
Negative Binomial Distribution#
Models the number of trials (k) needed to achieve r successes.
Parameters: r (target number of successes), p (probability of success on each trial).
PMF (for X = trial number of r-th success): \(P(X=k) = \binom{k-1}{r-1} p^r (1-p)^{k-r}\) for \(k = r, r+1, r+2, \dots\)
Mean (trial number of r-th success): \(E[X] = \frac{r}{p}\)
Variance (trial number of r-th success): \(\mathrm{Var}(X) = \frac{r(1-p)}{p^2}\)
Poisson Distribution#
Models the number of events occurring in a fixed interval of time or space.
Parameter: λ (average number of events in the interval).
PMF: \(P(X=k) = \frac{e^{-\lambda} \lambda^k}{k!}\) for \(k = 0, 1, 2, \dots\)
Mean: E[X]=λ
Variance: Var(X)=λ
Hypergeometric Distribution#
Models the number of successes in a sample of size n drawn without replacement from a finite population of size N containing K successes.
Parameters: N (population size), K (total successes in population), n (sample size).
PMF: \(P(X=k) = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}\) for k such that \(\max(0, n-(N-K)) \le k \le \min(n, K)\).
Mean: \(E[X] = n\frac{K}{N}\)
Variance: \(\mathrm{Var}(X) = n\frac{K}{N}\left(1-\frac{K}{N}\right)\frac{N-n}{N-1}\)
Finite Population Correction Factor: \(\frac{N-n}{N-1}\)
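The closed-form means and variances above can be cross-checked against scipy.stats. A minimal sketch; the parameter values are illustrative assumptions, and note that scipy's geometric distribution counts the trial of the first success, matching the convention used here:

```python
from scipy import stats

n, p, lam = 10, 0.3, 4.0

binom = stats.binom(n, p)
print(binom.mean(), n * p)             # both 3.0
print(binom.var(),  n * p * (1 - p))   # both 2.1

geom = stats.geom(p)
print(geom.mean(), 1 / p)              # both ≈ 3.333
print(geom.var(),  (1 - p) / p**2)     # both ≈ 7.778

pois = stats.poisson(lam)
print(pois.mean(), pois.var())         # both 4.0 (= lambda)
```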
Chapter 8: Continuous Random Variables#
Probability Density Function (PDF)#
For a continuous random variable X, the PDF fX(x) describes the relative likelihood of X.
Properties of a PDF:
fX(x)≥0 for all x.
∫−∞∞fX(x)dx=1 (total area under curve is 1).
Probability as Area: P(a≤X≤b)=∫abfX(x)dx.
For any specific value c: P(X=c)=∫ccfX(x)dx=0.
Cumulative Distribution Function (CDF)#
For a continuous random variable X, the CDF FX(x) is:
FX(x)=P(X≤x)=∫−∞xfX(t)dt
Properties of a CDF:
FX(x) is non-decreasing.
limx→−∞FX(x)=0.
limx→∞FX(x)=1.
P(a<X≤b)=FX(b)−FX(a).
\(f_X(x) = \frac{d}{dx} F_X(x)\) (where the derivative exists).
Expected Value (Mean)#
For a continuous random variable X:
E[X]=μ=∫−∞∞xfX(x)dx
Variance#
For a continuous random variable X with mean μ:
Var(X)=σ2=E[(X−μ)2]=∫−∞∞(x−μ)2fX(x)dx
Computational formula:
Var(X)=E[X2]−(E[X])2
Where E[X2]=∫−∞∞x2fX(x)dx.
Standard Deviation#
The positive square root of the variance:
\(\sigma = \sqrt{\mathrm{Var}(X)}\)
Percentiles and Quantiles#
The p-th percentile xp is the value such that FX(xp)=P(X≤xp)=p.
The quantile function Q(p) is the inverse of the CDF: Q(p)=FX−1(p)=xp.
Functions of a Continuous Random Variable#
If Y=g(X):
CDF of Y: FY(y)=P(Y≤y)=P(g(X)≤y).
PDF of Y (Change of Variables Formula): If g(x) is monotonic with inverse \(x = g^{-1}(y)\), then: \(f_Y(y) = f_X(g^{-1}(y)) \left|\frac{dx}{dy}\right|\)
Expected Value of Y=g(X) (LOTUS): E[Y]=E[g(X)]=∫−∞∞g(x)fX(x)dx
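The integrals defining E[X] and Var(X) can be evaluated numerically. A minimal sketch using an Exponential(2) density as an illustrative assumption:

```python
import numpy as np
from scipy.integrate import quad

# E[X] and Var(X) by numerical integration of the PDF f_X(x) = 2 e^{-2x}, x >= 0.
lam = 2.0
f = lambda x: lam * np.exp(-lam * x)

mean, _ = quad(lambda x: x * f(x), 0, np.inf)       # E[X]   = ∫ x f(x) dx
e_x2, _ = quad(lambda x: x**2 * f(x), 0, np.inf)    # E[X^2] = ∫ x^2 f(x) dx
var = e_x2 - mean**2                                 # Var(X) = E[X^2] - (E[X])^2

print(mean, var)  # ≈ 0.5, 0.25 (matching 1/λ and 1/λ² for λ = 2)
```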
Chapter 9: Common Continuous Distributions#
1. Uniform Distribution#
X∼U(a,b)
PDF (Probability Density Function): \(f(x; a, b) = \begin{cases} \frac{1}{b-a} & \text{for } a \le x \le b \\ 0 & \text{otherwise} \end{cases}\)
CDF (Cumulative Distribution Function): \(F(x; a, b) = P(X \le x) = \begin{cases} 0 & \text{for } x < a \\ \frac{x-a}{b-a} & \text{for } a \le x \le b \\ 1 & \text{for } x > b \end{cases}\)
Expected Value: \(E[X] = \frac{a+b}{2}\)
Variance: \(\mathrm{Var}(X) = \frac{(b-a)^2}{12}\)
2. Exponential Distribution#
T∼Exp(λ)
PDF (Probability Density Function): \(f(t; \lambda) = \begin{cases} \lambda e^{-\lambda t} & \text{for } t \ge 0 \\ 0 & \text{for } t < 0 \end{cases}\)
CDF (Cumulative Distribution Function): \(F(t; \lambda) = P(T \le t) = \begin{cases} 1 - e^{-\lambda t} & \text{for } t \ge 0 \\ 0 & \text{for } t < 0 \end{cases}\)
Survival Function: \(P(T > t) = 1 - F(t) = e^{-\lambda t}\)
Expected Value: \(E[T] = \frac{1}{\lambda}\)
Variance: \(\mathrm{Var}(T) = \frac{1}{\lambda^2}\)
Memoryless Property: P(T>s+t∣T>s)=P(T>t) for any s,t≥0.
3. Normal (Gaussian) Distribution#
X∼N(μ,σ2)
PDF (Probability Density Function): \(f(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}\) for \(-\infty < x < \infty\).
Expected Value: E[X]=μ
Variance: Var(X)=σ2
Standardization (Z-score): \(Z = \frac{X - \mu}{\sigma}\) where \(Z \sim N(0, 1)\).
4. Gamma Distribution#
X∼Gamma(k,λ) (using shape k and rate λ) or X∼Gamma(k,θ) (using shape k and scale θ=1/λ)
The Gamma function is \(\Gamma(k) = \int_0^{\infty} x^{k-1} e^{-x}\,dx\). For positive integers k, \(\Gamma(k) = (k-1)!\).
PDF (Probability Density Function): Using shape k and rate λ: \(f(x; k, \lambda) = \frac{\lambda^k x^{k-1} e^{-\lambda x}}{\Gamma(k)}\) for \(x \ge 0\). Using shape k and scale \(\theta = 1/\lambda\): \(f(x; k, \theta) = \frac{1}{\Gamma(k)\theta^k} x^{k-1} e^{-x/\theta}\) for \(x \ge 0\).
Expected Value: \(E[X] = \frac{k}{\lambda} = k\theta\)
Variance: \(\mathrm{Var}(X) = \frac{k}{\lambda^2} = k\theta^2\)
5. Beta Distribution#
X∼Beta(α,β)
The Beta function is \(B(\alpha, \beta) = \int_0^1 t^{\alpha-1}(1-t)^{\beta-1}\,dt = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}\).
PDF (Probability Density Function): \(f(x; \alpha, \beta) = \frac{1}{B(\alpha, \beta)} x^{\alpha-1} (1-x)^{\beta-1} = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} x^{\alpha-1} (1-x)^{\beta-1}\) for \(0 \le x \le 1\).
Expected Value: \(E[X] = \frac{\alpha}{\alpha+\beta}\)
Variance: \(\mathrm{Var}(X) = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}\)
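As with the discrete families, the continuous-distribution formulas can be cross-checked against scipy.stats. A minimal sketch; the parameter values are illustrative assumptions, and note that scipy parameterizes the Exponential and Gamma families by the scale θ = 1/λ rather than the rate λ:

```python
from scipy import stats

lam, k, alpha, beta = 0.5, 3.0, 2.0, 5.0

expon = stats.expon(scale=1/lam)
print(expon.mean(), 1/lam)       # both 2.0
print(expon.var(),  1/lam**2)    # both 4.0

gamma = stats.gamma(a=k, scale=1/lam)
print(gamma.mean(), k/lam)       # both 6.0
print(gamma.var(),  k/lam**2)    # both 12.0

b = stats.beta(alpha, beta)
print(b.mean(), alpha / (alpha + beta))                               # both ≈ 0.2857
print(b.var(),  alpha * beta / ((alpha + beta)**2 * (alpha + beta + 1)))  # both ≈ 0.0255
```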
Chapter 10: Joint Distributions#
Joint Probability Mass Functions (PMFs)#
For two discrete random variables \(X\) and \(Y\):
Joint PMF Definition: \(p_{X,Y}(x, y) = P(X=x, Y=y)\)
Conditions:
\(p_{X,Y}(x, y) \ge 0\) for all \((x, y)\)
\(\sum_{x} \sum_{y} p_{X,Y}(x, y) = 1\)
Joint Probability Density Functions (PDFs)#
For two continuous random variables \(X\) and \(Y\):
Probability over a Region A: \(P((X, Y) \in A) = \iint_A f_{X,Y}(x, y) \,dx \,dy\)
Conditions:
\(f_{X,Y}(x, y) \ge 0\) for all \((x, y)\)
\(\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{X,Y}(x, y) \,dx \,dy = 1\)
Marginal Distributions#
Marginal PMF of X (Discrete): \(p_X(x) = P(X=x) = \sum_{y} P(X=x, Y=y) = \sum_{y} p_{X,Y}(x, y)\)
Marginal PMF of Y (Discrete): \(p_Y(y) = P(Y=y) = \sum_{x} P(X=x, Y=y) = \sum_{x} p_{X,Y}(x, y)\)
Marginal PDF of X (Continuous): \(f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y) \,dy\)
Marginal PDF of Y (Continuous): \(f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x, y) \,dx\)
Conditional Distributions#
Conditional PMF of Y given X=x (Discrete): \(p_{Y|X}(y|x) = P(Y=y \mid X=x) = \frac{P(X=x, Y=y)}{P(X=x)} = \frac{p_{X,Y}(x, y)}{p_X(x)}\) (provided \(p_X(x) > 0\))
Conditional PDF of Y given X=x (Continuous): \(f_{Y|X}(y|x) = \frac{f_{X,Y}(x, y)}{f_X(x)}\) (provided \(f_X(x) > 0\))
Joint Cumulative Distribution Functions (CDFs)#
Joint CDF Definition: \(F_{X,Y}(x, y) = P(X \le x, Y \le y)\)
Discrete Case: \(F_{X,Y}(x, y) = \sum_{x_i \le x} \sum_{y_j \le y} p_{X,Y}(x_i, y_j)\)
Continuous Case: \(F_{X,Y}(x, y) = \int_{-\infty}^{x} \int_{-\infty}^{y} f_{X,Y}(u, v) \,dv \,du\)
Properties:
\(0 \le F_{X,Y}(x, y) \le 1\)
\(F_{X,Y}(x, y)\) is non-decreasing in both \(x\) and \(y\).
\(\lim_{x \to \infty, y \to \infty} F_{X,Y}(x, y) = 1\)
\(\lim_{x \to -\infty} F_{X,Y}(x, y) = 0\) and \(\lim_{y \to -\infty} F_{X,Y}(x, y) = 0\)
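The marginal and conditional formulas above are easy to verify on a small joint PMF table. A minimal sketch; the joint probabilities are illustrative assumptions:

```python
import numpy as np

# Joint PMF table: rows index values of X, columns index values of Y.
joint = np.array([[0.10, 0.20],
                  [0.30, 0.40]])

p_X = joint.sum(axis=1)            # marginal PMF of X: sum over y
p_Y = joint.sum(axis=0)            # marginal PMF of Y: sum over x
p_Y_given_X0 = joint[0] / p_X[0]   # conditional PMF of Y given X = first value

print(p_X, p_Y, p_Y_given_X0)      # [0.3 0.7] [0.4 0.6] [0.3333 0.6667]
```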
Chapter 11: Independence, Covariance, and Correlation#
Independence of Random Variables#
Two random variables \(X\) and \(Y\) are independent if for any sets \(A\) and \(B\): \(P(X \in A, Y \in B) = P(X \in A) P(Y \in B)\)
This is equivalent to:
Discrete: \(P(X=x, Y=y) = P(X=x) P(Y=y)\) (Joint PMF = Product of Marginal PMFs)
Continuous: \(f_{X,Y}(x,y) = f_X(x) f_Y(y)\) (Joint PDF = Product of Marginal PDFs)
Covariance#
The covariance between two random variables \(X\) and \(Y\):
Definition: \(\mathrm{Cov}(X, Y) = E[(X - E[X])(Y - E[Y])]\)
Computational Formula: \(\mathrm{Cov}(X, Y) = E[XY] - E[X]E[Y]\)
Properties:
\(\mathrm{Cov}(X, X) = \mathrm{Var}(X)\)
\(\mathrm{Cov}(X, Y) = \mathrm{Cov}(Y, X)\)
\(\mathrm{Cov}(aX + b, cY + d) = ac \mathrm{Cov}(X, Y)\)
\(\mathrm{Cov}(X+Y, Z) = \mathrm{Cov}(X, Z) + \mathrm{Cov}(Y, Z)\)
If \(X\) and \(Y\) are independent, then \(\mathrm{Cov}(X, Y) = 0\).
Correlation Coefficient#
The Pearson correlation coefficient between two random variables \(X\) and \(Y\):
Definition: \(\rho(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X) \mathrm{Var}(Y)}}\)
Properties:
\(-1 \le \rho(X, Y) \le 1\)
\(\rho(aX + b, cY + d) = \mathrm{sign}(ac)\, \rho(X, Y)\) (assuming \(a \ne 0, c \ne 0\))
Variance of Sums of Random Variables#
For any two random variables \(X\) and \(Y\), and constants \(a\) and \(b\):
General Formula: \(\mathrm{Var}(aX + bY) = a^2 \mathrm{Var}(X) + b^2 \mathrm{Var}(Y) + 2ab\, \mathrm{Cov}(X, Y)\)
Sum of Variables (\(a=1, b=1\)): \(\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\, \mathrm{Cov}(X, Y)\)
Difference of Variables (\(a=1, b=-1\)): \(\mathrm{Var}(X - Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) - 2\, \mathrm{Cov}(X, Y)\)
If \(X\) and \(Y\) are independent (\(\mathrm{Cov}(X, Y) = 0\)): \(\mathrm{Var}(aX + bY) = a^2 \mathrm{Var}(X) + b^2 \mathrm{Var}(Y)\), \(\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)\), and \(\mathrm{Var}(X - Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)\)
Extension to Multiple Variables (\(X_1, X_2, ..., X_n\)): \(\mathrm{Var}\left(\sum_{i=1}^n a_i X_i\right) = \sum_{i=1}^n a_i^2 \mathrm{Var}(X_i) + \sum_{i \ne j} a_i a_j \mathrm{Cov}(X_i, X_j)\) or equivalently \(\mathrm{Var}\left(\sum_{i=1}^n a_i X_i\right) = \sum_{i=1}^n a_i^2 \mathrm{Var}(X_i) + 2 \sum_{i < j} a_i a_j \mathrm{Cov}(X_i, X_j)\)
If all \(X_i\) are independent: \(\mathrm{Var}\left(\sum_{i=1}^n a_i X_i\right) = \sum_{i=1}^n a_i^2 \mathrm{Var}(X_i)\)
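The variance-of-a-sum identity can be checked on sample data. A minimal simulation sketch; the choice of correlated samples is an illustrative assumption:

```python
import numpy as np

# Check Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y) on simulated data.
rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = 0.5 * x + rng.normal(size=100_000)   # Y is correlated with X

lhs = np.var(x + y)
rhs = np.var(x) + np.var(y) + 2 * np.cov(x, y, bias=True)[0, 1]
print(lhs, rhs)  # the two values agree up to floating-point rounding
```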
Chapter 12: Functions of Multiple Random Variables#
Sums of Independent Random Variables (Convolution)#
Let \(X\) and \(Y\) be two random variables, and \(Z = X+Y\).
Discrete Case (PMF of Z): \(P(Z=z) = \sum_{k} P(X=k, Y=z-k)\). If \(X\) and \(Y\) are independent: \(P(Z=z) = \sum_{k} P(X=k)P(Y=z-k)\). This is the discrete convolution of the PMFs.
Continuous Case (PDF of Z): \(f_Z(z) = \int_{-\infty}^{\infty} f_{X,Y}(x, z-x)\,dx\). If \(X\) and \(Y\) are independent: \(f_Z(z) = \int_{-\infty}^{\infty} f_X(x)f_Y(z-x)\,dx = (f_X * f_Y)(z)\). This is the convolution of the PDFs.
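A minimal sketch of the discrete convolution formula, using the sum of two independent fair dice as an illustrative assumption:

```python
import numpy as np

# PMF of the sum of two independent fair dice via discrete convolution.
die = np.full(6, 1/6)              # PMF of one die on the values 1..6
pmf_sum = np.convolve(die, die)    # PMF of X + Y on the values 2..12

for total, prob in enumerate(pmf_sum, start=2):
    print(total, round(prob, 4))   # e.g. P(sum = 7) = 6/36 ≈ 0.1667
```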
General Transformations (Jacobian Method for PDFs)#
If \(Y_1 = g_1(X_1, X_2)\) and \(Y_2 = g_2(X_1, X_2)\) are transformations of random variables \(X_1, X_2\), and these transformations are invertible such that \(X_1 = h_1(Y_1, Y_2)\) and \(X_2 = h_2(Y_1, Y_2)\).
Joint PDF of \(Y_1, Y_2\): \(f_{Y_1,Y_2}(y_1, y_2) = f_{X_1,X_2}(h_1(y_1,y_2), h_2(y_1,y_2))\, |J|\), where \(|J|\) is the absolute value of the determinant of the Jacobian matrix.
Jacobian Determinant (J): \(J = \det \begin{pmatrix} \frac{\partial x_1}{\partial y_1} & \frac{\partial x_1}{\partial y_2} \\ \frac{\partial x_2}{\partial y_1} & \frac{\partial x_2}{\partial y_2} \end{pmatrix}\)
Order Statistics#
Let \(X_1, X_2, \dots, X_n\) be \(n\) independent and identically distributed (i.i.d.) random variables with CDF \(F_X(x)\) and PDF \(f_X(x)\). Let \(X_{(1)}, X_{(2)}, \dots, X_{(n)}\) be the order statistics (sorted values).
CDF of the Maximum (\(Y_n = X_{(n)}\)): \(F_{Y_n}(y) = P(X_{(n)} \le y) = [F_X(y)]^n\)
PDF of the Maximum (\(Y_n = X_{(n)}\)): \(f_{Y_n}(y) = n[F_X(y)]^{n-1}f_X(y)\)
CDF of the Minimum (\(Y_1 = X_{(1)}\)): \(F_{Y_1}(y) = P(X_{(1)} \le y) = 1 - [1-F_X(y)]^n\)
PDF of the Minimum (\(Y_1 = X_{(1)}\)): \(f_{Y_1}(y) = n[1-F_X(y)]^{n-1}f_X(y)\)
PDF of the \(k\)-th Order Statistic (\(Y_k = X_{(k)}\)): \(f_{Y_k}(y) = \frac{n!}{(k-1)!(n-k)!} [F_X(y)]^{k-1} [1-F_X(y)]^{n-k} f_X(y)\)
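The CDF of the maximum is easy to verify by simulation. A minimal sketch for Uniform(0, 1) samples, where \([F_X(y)]^n = y^n\); the values of n and y are illustrative assumptions:

```python
import numpy as np

# Check P(max of n i.i.d. Uniform(0,1) draws <= y) = y^n by simulation.
rng = np.random.default_rng(1)
n, y = 5, 0.8

samples = rng.uniform(size=(200_000, n))
empirical = np.mean(samples.max(axis=1) <= y)
print(empirical, y**n)   # both ≈ 0.328
```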
Chapter 13: The Law of Large Numbers (LLN)#
Chebyshev’s Inequality#
For a random variable \(X\) with mean \(\mu\) and finite variance \(\sigma^2\):
Form 1: \(P(|X - \mu| \ge k\sigma) \le \frac{1}{k^2}\) (where \(k\) is the number of standard deviations)
Form 2: \(P(|X - \mu| \ge \epsilon) \le \frac{\sigma^2}{\epsilon^2}\) (where \(\epsilon > 0\) is any positive number)
Weak Law of Large Numbers (WLLN)#
For a sequence of i.i.d. random variables \(X_1, X_2, \dots, X_n\) with common mean \(E[X_i] = \mu\) and common finite variance \(Var(X_i) = \sigma^2\). Let \(\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i\) be the sample mean.
Statement: For any \(\epsilon > 0\), \(\lim_{n \to \infty} P(|\bar{X}_n - \mu| \ge \epsilon) = 0\), or equivalently, \(\lim_{n \to \infty} P(|\bar{X}_n - \mu| < \epsilon) = 1\)
Formulas used in WLLN proof via Chebyshev’s Inequality:
Expected Value of Sample Mean: \(E[\bar{X}_n] = \mu\)
Variance of Sample Mean (for i.i.d. variables): \(Var(\bar{X}_n) = \frac{\sigma^2}{n}\)
Application of Chebyshev’s Inequality to \(\bar{X}_n\): \(P(|\bar{X}_n - \mu| \ge \epsilon) \le \frac{Var(\bar{X}_n)}{\epsilon^2} = \frac{\sigma^2}{n\epsilon^2}\)
Strong Law of Large Numbers (SLLN)#
For a sequence of i.i.d. random variables \(X_1, X_2, \dots, X_n\) with common mean \(E[X_i] = \mu\).
Statement: \(P\left(\lim_{n \to \infty} \bar{X}_n = \mu\right) = 1\) (the sample mean converges almost surely to the population mean).
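A minimal simulation sketch of the Law of Large Numbers: the running sample mean of i.i.d. draws drifts toward the true mean. The die-roll example and sample sizes are illustrative assumptions:

```python
import numpy as np

# Running sample mean of fair die rolls (true mean 3.5) for growing n.
rng = np.random.default_rng(2)
rolls = rng.integers(1, 7, size=100_000)

running_mean = np.cumsum(rolls) / np.arange(1, len(rolls) + 1)
for n in (10, 1_000, 100_000):
    print(n, running_mean[n - 1])   # approaches 3.5 as n grows
```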
Chapter 14: The Central Limit Theorem (CLT)#
Statement of CLT (Lindeberg-Lévy CLT): Let \(X_1, X_2, \dots, X_n\) be i.i.d. random variables with mean \(\mu\) and variance \(\sigma^2\). Let \(\bar{X}_n\) be the sample mean. Then \(Z_n = \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \xrightarrow{d} N(0, 1)\) (where \(\xrightarrow{d}\) denotes convergence in distribution)
Convergence in Distribution (for \(Z_n\)): \(\lim_{n \to \infty} P(Z_n \le z) = \Phi(z)\), where \(\Phi(z)\) is the CDF of the standard Normal distribution \(N(0, 1)\).
Approximation for Sample Mean \(\bar{X}_n\): \(P(\bar{X}_n \le x) \approx \Phi\left(\frac{x - \mu}{\sigma/\sqrt{n}}\right)\)
CLT for Sums (\(S_n = \sum_{i=1}^{n} X_i\)): \(E[S_n] = n\mu\), \(Var(S_n) = n\sigma^2\), and \(\frac{S_n - n\mu}{\sqrt{n}\sigma} \xrightarrow{d} N(0, 1)\)
Normal Approximation to Binomial Distribution: For \(X \sim \text{Binomial}(n, p)\):
Mean: \(E[X] = np\)
Variance: \(Var(X) = np(1-p)\)
Approximation: \(X \approx N(np, np(1-p))\) (if \(np \ge 5\) and \(n(1-p) \ge 5\) is a common rule of thumb).
Continuity Correction (for approximating discrete with continuous):
To approximate \(P(X \le k)\), use \(P(Y \le k + 0.5)\)
To approximate \(P(X \ge k)\), use \(P(Y \ge k - 0.5)\)
To approximate \(P(X = k)\), use \(P(k - 0.5 \le Y \le k + 0.5)\)
To approximate \(P(a \le X \le b)\), use \(P(a - 0.5 \le Y \le b + 0.5)\)
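A minimal sketch of the Normal approximation to the Binomial with a continuity correction; the values of n, p, and k are illustrative assumptions:

```python
import math
from scipy import stats

# Compare the exact Binomial CDF with the continuity-corrected Normal approximation.
n, p, k = 50, 0.4, 25
mu, sigma = n * p, math.sqrt(n * p * (1 - p))

exact  = stats.binom.cdf(k, n, p)                        # P(X <= k) exactly
approx = stats.norm.cdf(k + 0.5, loc=mu, scale=sigma)    # P(Y <= k + 0.5)
print(exact, approx)  # both ≈ 0.94
```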
Chapter 15: Introduction to Bayesian Inference#
Bayes’ Theorem for Distributions: \(p(\theta \mid D) = \frac{p(D \mid \theta)\, p(\theta)}{p(D)}\), where:
\(p(\theta | D)\) is the posterior probability of parameter \(\theta\) given data \(D\).
\(p(D | \theta)\) is the likelihood of data \(D\) given parameter \(\theta\).
\(p(\theta)\) is the prior probability of parameter \(\theta\).
\(p(D)\) is the evidence (or marginal likelihood of data).
Proportionality Form: \(\text{Posterior} \propto \text{Likelihood} \times \text{Prior}\)
Evidence Calculation:
For continuous \(\theta\): \(p(D) = \int p(D|\theta) p(\theta) d\theta\)
For discrete \(\theta\): \(p(D) = \sum_{\theta} p(D|\theta) p(\theta)\)
Beta-Binomial Conjugate Prior Update: If prior is \(\text{Beta}(\alpha_{prior}, \beta_{prior})\) and data is \(k\) successes in \(n\) trials (Binomial likelihood):
Posterior is \(\text{Beta}(\alpha_{posterior}, \beta_{posterior}) = \text{Beta}(\alpha_{prior} + k, \beta_{prior} + n - k)\)
Point Estimates from Posterior:
Maximum a Posteriori (MAP) Estimate: \(\hat{\theta}_{MAP} = \arg \max_{\theta} p(\theta \mid D)\). For a Beta\((\alpha, \beta)\) posterior (if \(\alpha > 1, \beta > 1\)): \(\hat{\theta}_{MAP} = \frac{\alpha - 1}{\alpha + \beta - 2}\)
Posterior Mean: \(\hat{\theta}_{Mean} = E[\theta \mid D] = \int \theta\, p(\theta \mid D)\, d\theta\). For a Beta\((\alpha, \beta)\) posterior: \(\hat{\theta}_{Mean} = \frac{\alpha}{\alpha + \beta}\)
Credible Interval: An interval \([L, U]\) such that: \(P(L \le \theta \le U \mid D) = \int_L^U p(\theta \mid D)\, d\theta = 1 - \gamma\) (where \(1-\gamma\) is the credibility level, e.g., 95%)
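A minimal sketch of the Beta-Binomial conjugate update and the posterior summaries above; the prior parameters and observed data are illustrative assumptions:

```python
from scipy import stats

# Beta(2, 2) prior updated with k = 7 successes in n = 10 trials.
alpha_prior, beta_prior = 2, 2
k, n = 7, 10

alpha_post = alpha_prior + k          # Beta(alpha_prior + k, beta_prior + n - k)
beta_post  = beta_prior + n - k

posterior = stats.beta(alpha_post, beta_post)
post_mean = alpha_post / (alpha_post + beta_post)              # ≈ 0.643
post_map  = (alpha_post - 1) / (alpha_post + beta_post - 2)    # ≈ 0.667
ci_95     = posterior.interval(0.95)                           # equal-tailed 95% credible interval

print(post_mean, post_map, ci_95)
```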
Chapter 16: Introduction to Markov Chains#
Transition Probability (from state \(i\) to state \(j\)): \(P_{ij} = P(X_{t+1} = s_j \mid X_t = s_i)\)
n-Step Transition Probability: The \((i, j)\)-th entry of the matrix \(P^n\) (the transition matrix \(P\) raised to the power of \(n\)): \(P^{(n)}_{ij} = P(X_{t+n} = s_j \mid X_t = s_i) = (P^n)_{ij}\)
Stationary Distribution (\(\pi\)): A row vector \(\pi = [\pi_1, \pi_2, ..., \pi_k]\) such that \(\pi P = \pi\) and \(\sum_{j=1}^{k} \pi_j = 1\)
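A stationary distribution can be found as a left eigenvector of the transition matrix with eigenvalue 1. A minimal sketch; the 2-state transition matrix is an illustrative assumption:

```python
import numpy as np

# Solve pi P = pi, sum(pi) = 1 for a small transition matrix.
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])

eigvals, eigvecs = np.linalg.eig(P.T)              # left eigenvectors of P
pi = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1))])
pi = pi / pi.sum()                                 # normalize to sum to 1

print(pi)        # ≈ [0.8333 0.1667]
print(pi @ P)    # equals pi again (stationarity check)
```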
Chapter 17: Monte Carlo Methods#
Estimating Probability \(P(A)\): \(P(A) \approx \frac{N_A}{N}\) (where \(N_A\) is the number of times event A occurred in \(N\) simulations)
Estimating Expected Value \(E[g(X)]\): \(E[g(X)] \approx \frac{1}{N} \sum_{i=1}^{N} g(X_i)\) (where \(X_i\) are samples from the distribution of \(X\))
Monte Carlo Integration (Hit-or-Miss for Area): \(\text{Area}(A) \approx \text{Area}(B) \times \frac{N_{hit}}{N}\)
Monte Carlo Integration (Using Expected Values for \(I = \int_a^b g(x) dx\)): If \(X \sim \text{Uniform}(a, b)\): \(I \approx (b-a) \times \frac{1}{N} \sum_{i=1}^{N} g(X_i)\)
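A minimal sketch of Monte Carlo integration via expected values; the integrand g(x) = x² on [0, 3] (exact value 9) and the sample size are illustrative assumptions:

```python
import numpy as np

# Estimate I = ∫_0^3 x^2 dx ≈ (b - a) * mean(g(X_i)) with X_i ~ Uniform(a, b).
rng = np.random.default_rng(3)
a, b, N = 0.0, 3.0, 200_000

x = rng.uniform(a, b, size=N)
estimate = (b - a) * np.mean(x**2)
print(estimate)   # ≈ 9.0
```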
Inverse Transform Method for Generating Random Variables: If \(U \sim \text{Uniform}(0, 1)\), then \(X = F^{-1}(U)\) has CDF \(F(x)\).
For Exponential(\(\lambda\)): \(F^{-1}(u) = -\frac{1}{\lambda} \ln(1 - u)\) or \(F^{-1}(u) = -\frac{1}{\lambda} \ln(u)\).
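A minimal sketch of the inverse transform method for the Exponential case; λ = 2 and the sample size are illustrative assumptions:

```python
import numpy as np

# Generate Exponential(λ) samples from Uniform(0, 1) via X = -ln(1 - U) / λ.
rng = np.random.default_rng(4)
lam = 2.0

u = rng.uniform(size=100_000)
x = -np.log(1 - u) / lam
print(x.mean(), 1 / lam)     # sample mean ≈ 1/λ = 0.5
print(x.var(),  1 / lam**2)  # sample variance ≈ 1/λ² = 0.25
```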
Acceptance-Rejection Method: To sample from target PDF \(f(x)\) using proposal PDF \(g(x)\) where \(f(x) \le c \cdot g(x)\):
Sample \(y\) from \(g(x)\).
Sample \(u\) from \(\text{Uniform}(0, 1)\).
Accept \(y\) if \(u \le \frac{f(y)}{c \cdot g(y)}\).
Buffon’s Needle Problem (\(L \le D\)): Probability of needle crossing a line: \(P(\text{cross}) = \frac{2L}{\pi D}\). For \(L=1, D=2\): \(\pi \approx \frac{1}{P(\text{cross})}\)
Chapter 18: (Optional) Further Explorations#
Entropy \(H(X)\) (for discrete random variable \(X\) with PMF \(p(x)\)): \(H(X) = - \sum_{x} p(x) \log_b p(x)\) (base \(b\) is often 2 for bits, or \(e\) for nats)
Kullback-Leibler (KL) Divergence (for discrete distributions \(P\) and \(Q\)): \(D_{KL}(P \| Q) = \sum_{x} P(x) \log_b \frac{P(x)}{Q(x)}\)
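A minimal sketch of entropy and KL divergence for discrete distributions; the two PMFs are illustrative assumptions, and base-2 logarithms give results in bits:

```python
import numpy as np

# Entropy of P and KL divergence from Q to P for small discrete PMFs.
p = np.array([0.5, 0.25, 0.25])
q = np.array([1/3, 1/3, 1/3])

entropy = -np.sum(p * np.log2(p))       # H(P) = 1.5 bits
kl_pq   = np.sum(p * np.log2(p / q))    # D_KL(P || Q)
print(entropy, kl_pq)
```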
Geometric Brownian Motion (GBM) \(S(t)\):
Stochastic Differential Equation: \(dS(t) = \mu S(t) dt + \sigma S(t) dW(t)\)
Solution: \(S(t) = S(0) \exp\left( \left(\mu - \frac{\sigma^2}{2}\right)t + \sigma W(t) \right)\)
Probability Generating Function (PGF) \(G_X(z)\) (for non-negative integer-valued RV \(X\)): \(G_X(z) = E[z^X] = \sum_{k=0}^{\infty} P(X=k) z^k\)
\(E[X] = G'_X(1)\)
\(Var(X) = G''_X(1) + G'_X(1) - [G'_X(1)]^2\)
Moment Generating Function (MGF) \(M_X(t)\): \(M_X(t) = E[e^{tX}]\)
\(E[X^n] = M_X^{(n)}(0)\) (n-th derivative evaluated at \(t=0\))
For independent \(X, Y\): \(M_{X+Y}(t) = M_X(t) M_Y(t)\)