Appendix E: Summary of Formulas#
This appendix provides a summary of the key formulas introduced in Chapters 2-18.
Chapter 2: The Language of Probability: Sets, Sample Spaces, and Events#
Axioms of Probability#
Let S be a sample space, and P(A) denote the probability of an event A.
Non-negativity: For any event A, the probability of A is greater than or equal to zero. P(A)≥0
Normalization: The probability of the entire sample space S is equal to 1. P(S)=1
Additivity for Disjoint Events: If A1,A2,A3,… is a sequence of mutually exclusive (disjoint) events (i.e., Ai∩Aj=∅ for all i≠j), then the probability of their union is the sum of their individual probabilities. P(A1∪A2∪A3∪…)=P(A1)+P(A2)+P(A3)+…
For a finite number of disjoint events, say A and B: If A∩B=∅, then P(A∪B)=P(A)+P(B)
Probability of Impossible Event: The probability of an impossible event (the empty set, ∅) is 0. P(∅)=0
Basic Probability Rules#
Probability Range: For any event A: 0≤P(A)≤1
Complement Rule: The probability that event A does not occur is 1 minus the probability that it does occur. P(A′)=1−P(A)
Addition Rule (General): For any two events A and B (not necessarily disjoint), the probability that A or B (or both) occurs is: P(A∪B)=P(A)+P(B)−P(A∩B)
Empirical Probability#
The empirical probability of an event A is estimated from simulations:
\(P_{\text{empirical}}(A) = \frac{\text{Number of times event } A \text{ occurred}}{\text{Total number of trials}}\)
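This formula can be checked directly by simulation. A minimal sketch in Python, where the die-rolling event and the number of trials are illustrative assumptions rather than an example from the text:

```python
import random

# Estimate an empirical probability by simulation:
# P(rolling a 5 or 6 on a fair six-sided die) -- an illustrative event.
def empirical_probability(event, trials=100_000):
    occurrences = sum(event(random.randint(1, 6)) for _ in range(trials))
    return occurrences / trials  # occurrences of A / total number of trials

print(empirical_probability(lambda roll: roll >= 5))  # close to 2/6 ≈ 0.333
```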
Chapter 3: Counting Techniques: Permutations and Combinations#
The Multiplication Principle#
If a procedure can be broken down into a sequence of k steps, with n1 ways for the first step, n2 for the second, …, nk for the k-th step, then the total number of ways to perform the entire procedure is:
Total ways=n1×n2×⋯×nk
Permutations (Order Matters)#
Permutations without Repetition: The number of permutations of n distinct objects taken k at a time. \(P(n,k) = \frac{n!}{(n-k)!}\)
Special Case: Arranging all n distinct objects: P(n,n)=n!
Permutations with Repetition (Multinomial Coefficients): The number of distinct permutations of n objects where there are n1 identical objects of type 1, n2 of type 2, …, nk of type k (such that n1+n2+⋯+nk=n). \(\frac{n!}{n_1!\,n_2!\cdots n_k!}\)
Combinations (Order Doesn’t Matter)#
Combinations without Repetition: The number of combinations of n distinct objects taken k at a time (also “n choose k”). \(C(n,k) = \binom{n}{k} = \frac{n!}{k!\,(n-k)!}\)
Relationship to permutations: \(C(n,k) = \frac{P(n,k)}{k!}\)
Combinations with Repetition: The number of combinations with repetition of n types of objects taken k at a time. \(\binom{n+k-1}{k} = \frac{(n+k-1)!}{k!\,(n-1)!}\)
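The counting formulas above can be verified with Python's standard library. A minimal sketch, where the values n = 5 and k = 3 are illustrative assumptions:

```python
from math import comb, factorial, perm

# Check the counting formulas numerically for n = 5, k = 3.
n, k = 5, 3
print(perm(n, k))                  # P(n, k) = n! / (n - k)!        -> 60
print(comb(n, k))                  # C(n, k) = n! / (k! (n - k)!)   -> 10
print(perm(n, k) // factorial(k))  # C(n, k) = P(n, k) / k!         -> 10
print(comb(n + k - 1, k))          # combinations with repetition   -> 35
```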
Probability with Equally Likely Outcomes#
The probability of an event E when all outcomes in the sample space S are equally likely:
\(P(E) = \frac{\text{Number of outcomes favorable to } E}{\text{Total number of possible outcomes in } S} = \frac{|E|}{|S|}\)
Chapter 4: Conditional Probability#
Definition of Conditional Probability#
For any two events A and B from a sample space S, where P(B)>0, the conditional probability of A given B is defined as:
\(P(A \mid B) = \frac{P(A \cap B)}{P(B)}\)
The Multiplication Rule for Conditional Probability#
Rearranging the definition of conditional probability gives:
P(A∩B)=P(A∣B)P(B)
Similarly, if P(A)>0:
P(A∩B)=P(B∣A)P(A)
For three events A,B,C:
P(A∩B∩C)=P(C∣A∩B)P(B∣A)P(A)
The Law of Total Probability#
Let B1,B2,…,Bn be a partition of the sample space S. Then, for any event A in S:
\(P(A) = \sum_{i=1}^{n} P(A \mid B_i)\, P(B_i)\)
Expanded form:
\(P(A) = P(A \mid B_1)P(B_1) + P(A \mid B_2)P(B_2) + \dots + P(A \mid B_n)P(B_n)\)
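A minimal numeric sketch of the Law of Total Probability; the partition probabilities and conditional probabilities below are illustrative assumptions:

```python
# Law of Total Probability over a three-event partition B1, B2, B3 of S.
p_B = [0.5, 0.3, 0.2]            # P(B1), P(B2), P(B3)
p_A_given_B = [0.9, 0.5, 0.1]    # P(A|B1), P(A|B2), P(A|B3)

p_A = sum(pa * pb for pa, pb in zip(p_A_given_B, p_B))
print(p_A)  # 0.9*0.5 + 0.5*0.3 + 0.1*0.2 ≈ 0.62
```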
Chapter 5: Bayes’ Theorem and Independence#
Bayes’ Theorem#
Provides a way to “reverse” conditional probabilities. If P(B)>0:
\(P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}\)
Where P(B) can often be calculated using the Law of Total Probability:
P(B)=P(B∣A)P(A)+P(B∣Ac)P(Ac)
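A minimal sketch of Bayes' Theorem with the Law of Total Probability in the denominator; the prior and the two conditional probabilities are illustrative assumptions (a test-accuracy style example):

```python
# Bayes' Theorem: P(A|B) = P(B|A) P(A) / P(B), with P(B) from total probability.
p_A          = 0.01   # prior P(A)
p_B_given_A  = 0.95   # P(B | A)
p_B_given_Ac = 0.05   # P(B | A^c)

p_B = p_B_given_A * p_A + p_B_given_Ac * (1 - p_A)   # Law of Total Probability
p_A_given_B = p_B_given_A * p_A / p_B                # Bayes' Theorem
print(round(p_A_given_B, 4))  # ≈ 0.161
```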
Independence of Events#
Formal Definition: Events A and B are independent if and only if: P(A∩B)=P(A)P(B)
Alternative Definition (using conditional probability):
If P(B)>0, A and B are independent if and only if: P(A∣B)=P(A)
Similarly, if P(A)>0, independence means: P(B∣A)=P(B)
Conditional Independence#
Events A and B are conditionally independent given event C (where P(C)>0) if:
P(A∩B∣C)=P(A∣C)P(B∣C)
Alternative Definition: If P(B∣C)>0, conditional independence means:
P(A∣B∩C)=P(A∣C)
Chapter 6: Discrete Random Variables#
Probability Mass Function (PMF)#
For a discrete random variable X, the PMF pX(x) is:
pX(x)=P(X=x)
Properties of a PMF:
pX(x)≥0 for all possible values x.
∑xpX(x)=1 (sum over all possible values x).
Cumulative Distribution Function (CDF)#
For a random variable X, the CDF FX(x) is:
FX(x)=P(X≤x)
For a discrete random variable X:
FX(x)=∑k≤xpX(k)
Properties of a CDF:
0≤FX(x)≤1 for all x.
If a<b, then FX(a)≤FX(b) (non-decreasing).
limx→−∞FX(x)=0
limx→+∞FX(x)=1
P(X>x)=1−FX(x)
P(a<X≤b)=FX(b)−FX(a) for a<b.
P(X=x)=FX(x)−limy→x−FX(y) (for discrete RV, this is the jump at x).
Expected Value (Mean)#
For a discrete random variable X:
E[X]=μX=∑xx⋅pX(x)
Variance#
For a random variable X with mean μX:
Var(X)=σX2=E[(X−μX)2]
For a discrete random variable X:
Var(X)=∑x(x−μX)2⋅pX(x)
Computational formula for variance:
Var(X)=E[X2]−(E[X])2
Where E[X2] for a discrete random variable is:
E[X2]=∑xx2⋅pX(x)
Standard Deviation#
The positive square root of the variance:
\(\mathrm{SD}(X) = \sigma_X = \sqrt{\mathrm{Var}(X)}\)
Functions of a Random Variable#
If Y=g(X):
PMF of Y (for discrete X): pY(y)=P(Y=y)=P(g(X)=y)=∑x:g(x)=ypX(x)
Expected Value of Y=g(X) (LOTUS - Law of the Unconscious Statistician): For a discrete random variable X: E[Y]=E[g(X)]=∑xg(x)⋅pX(x)
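The PMF-based formulas above translate directly into code. A minimal sketch, where the PMF on {0, 1, 2} and the function g(x) = 2x + 1 are illustrative assumptions:

```python
# Mean, variance, and LOTUS for a discrete random variable given its PMF.
pmf = {0: 0.2, 1: 0.5, 2: 0.3}

mean     = sum(x * p for x, p in pmf.items())           # E[X]
e_x2     = sum(x**2 * p for x, p in pmf.items())        # E[X^2]
variance = e_x2 - mean**2                               # Var(X) = E[X^2] - (E[X])^2
e_gx     = sum((2*x + 1) * p for x, p in pmf.items())   # LOTUS with g(x) = 2x + 1

print(mean, variance, e_gx)  # 1.1, 0.49, 3.2
```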
Chapter 7: Common Discrete Distributions#
Bernoulli Distribution#
Models a single trial with two outcomes (success=1, failure=0).
Parameter: p (probability of success).
PMF: \(P(X=k) = p^k (1-p)^{1-k}\) for \(k \in \{0, 1\}\)
Alternatively: \(P(X=k) = \begin{cases} p & \text{if } k = 1 \\ 1-p & \text{if } k = 0 \\ 0 & \text{otherwise} \end{cases}\)
Mean: E[X]=p
Variance: Var(X)=p(1−p)
Binomial Distribution#
Models the number of successes in n independent Bernoulli trials.
Parameters: n (number of trials), p (probability of success on each trial).
PMF: \(P(X=k) = \binom{n}{k} p^k (1-p)^{n-k}\) for \(k = 0, 1, \dots, n\)
Mean: E[X]=np
Variance: Var(X)=np(1−p)
Geometric Distribution#
Models the number of trials (k) needed to get the first success.
Parameter: p (probability of success on each trial).
PMF (for X = trial number of first success): \(P(X=k) = (1-p)^{k-1} p\) for \(k = 1, 2, 3, \dots\)
Mean (trial number of first success): \(E[X] = \frac{1}{p}\)
Variance (trial number of first success): \(\mathrm{Var}(X) = \frac{1-p}{p^2}\)
Negative Binomial Distribution#
Models the number of trials (k) needed to achieve r successes.
Parameters: r (target number of successes), p (probability of success on each trial).
PMF (for X = trial number of r-th success): \(P(X=k) = \binom{k-1}{r-1} p^r (1-p)^{k-r}\) for \(k = r, r+1, r+2, \dots\)
Mean (trial number of r-th success): \(E[X] = \frac{r}{p}\)
Variance (trial number of r-th success): \(\mathrm{Var}(X) = \frac{r(1-p)}{p^2}\)
Poisson Distribution#
Models the number of events occurring in a fixed interval of time or space.
Parameter: λ (average number of events in the interval).
PMF: \(P(X=k) = \frac{e^{-\lambda} \lambda^k}{k!}\) for \(k = 0, 1, 2, \dots\)
Mean: E[X]=λ
Variance: Var(X)=λ
Hypergeometric Distribution#
Models the number of successes in a sample of size n drawn without replacement from a finite population of size N containing K successes.
Parameters: N (population size), K (total successes in population), n (sample size).
PMF: \(P(X=k) = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}\) for k such that \(\max(0, n-(N-K)) \le k \le \min(n, K)\).
Mean: \(E[X] = n\frac{K}{N}\)
Variance: \(\mathrm{Var}(X) = n\frac{K}{N}\left(1-\frac{K}{N}\right)\frac{N-n}{N-1}\)
Finite Population Correction Factor: \(\frac{N-n}{N-1}\)
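The closed-form means and variances above can be cross-checked against scipy.stats. A minimal sketch; the parameter values are illustrative assumptions, and note that scipy's geometric distribution counts the trial of the first success, matching the convention used here:

```python
from scipy import stats

n, p, lam = 10, 0.3, 4.0

binom = stats.binom(n, p)
print(binom.mean(), n * p)             # both 3.0
print(binom.var(),  n * p * (1 - p))   # both 2.1

geom = stats.geom(p)
print(geom.mean(), 1 / p)              # both ≈ 3.333
print(geom.var(),  (1 - p) / p**2)     # both ≈ 7.778

pois = stats.poisson(lam)
print(pois.mean(), pois.var())         # both 4.0 (= lambda)
```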
Chapter 8: Continuous Random Variables#
Probability Density Function (PDF)#
For a continuous random variable X, the PDF fX(x) describes the relative likelihood of X.
Properties of a PDF:
fX(x)≥0 for all x.
∫−∞∞fX(x)dx=1 (total area under curve is 1).
Probability as Area: P(a≤X≤b)=∫abfX(x)dx.
For any specific value c: P(X=c)=∫ccfX(x)dx=0.
Cumulative Distribution Function (CDF)#
For a continuous random variable X, the CDF FX(x) is:
FX(x)=P(X≤x)=∫−∞xfX(t)dt
Properties of a CDF:
FX(x) is non-decreasing.
limx→−∞FX(x)=0.
limx→∞FX(x)=1.
P(a<X≤b)=FX(b)−FX(a).
\(f_X(x) = \frac{d}{dx} F_X(x)\) (where the derivative exists).
Expected Value (Mean)#
For a continuous random variable X:
E[X]=μ=∫−∞∞xfX(x)dx
Variance#
For a continuous random variable X with mean μ:
Var(X)=σ2=E[(X−μ)2]=∫−∞∞(x−μ)2fX(x)dx
Computational formula:
Var(X)=E[X2]−(E[X])2
Where E[X2]=∫−∞∞x2fX(x)dx.
Standard Deviation#
The positive square root of the variance:
\(\sigma = \sqrt{\mathrm{Var}(X)}\)
Percentiles and Quantiles#
The p-th percentile xp is the value such that FX(xp)=P(X≤xp)=p.
The quantile function Q(p) is the inverse of the CDF: Q(p)=FX−1(p)=xp.
Functions of a Continuous Random Variable#
If Y=g(X):
CDF of Y: FY(y)=P(Y≤y)=P(g(X)≤y).
PDF of Y (Change of Variables Formula): If g(x) is monotonic with inverse \(x = g^{-1}(y)\), then: \(f_Y(y) = f_X(g^{-1}(y)) \left|\frac{dx}{dy}\right|\)
Expected Value of Y=g(X) (LOTUS): E[Y]=E[g(X)]=∫−∞∞g(x)fX(x)dx
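The integrals defining E[X] and Var(X) can be evaluated numerically. A minimal sketch using an Exponential(2) density as an illustrative assumption:

```python
import numpy as np
from scipy.integrate import quad

# E[X] and Var(X) by numerical integration of the PDF f_X(x) = 2 e^{-2x}, x >= 0.
lam = 2.0
f = lambda x: lam * np.exp(-lam * x)

mean, _ = quad(lambda x: x * f(x), 0, np.inf)       # E[X]   = ∫ x f(x) dx
e_x2, _ = quad(lambda x: x**2 * f(x), 0, np.inf)    # E[X^2] = ∫ x^2 f(x) dx
var = e_x2 - mean**2                                 # Var(X) = E[X^2] - (E[X])^2

print(mean, var)  # ≈ 0.5, 0.25 (matching 1/λ and 1/λ² for λ = 2)
```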
Chapter 9: Common Continuous Distributions#
1. Uniform Distribution#
X∼U(a,b)
PDF (Probability Density Function): \(f(x; a, b) = \begin{cases} \frac{1}{b-a} & \text{for } a \le x \le b \\ 0 & \text{otherwise} \end{cases}\)
CDF (Cumulative Distribution Function): \(F(x; a, b) = P(X \le x) = \begin{cases} 0 & \text{for } x < a \\ \frac{x-a}{b-a} & \text{for } a \le x \le b \\ 1 & \text{for } x > b \end{cases}\)
Expected Value: \(E[X] = \frac{a+b}{2}\)
Variance: \(\mathrm{Var}(X) = \frac{(b-a)^2}{12}\)
2. Exponential Distribution#
T∼Exp(λ)
PDF (Probability Density Function): \(f(t; \lambda) = \begin{cases} \lambda e^{-\lambda t} & \text{for } t \ge 0 \\ 0 & \text{for } t < 0 \end{cases}\)
CDF (Cumulative Distribution Function): \(F(t; \lambda) = P(T \le t) = \begin{cases} 1 - e^{-\lambda t} & \text{for } t \ge 0 \\ 0 & \text{for } t < 0 \end{cases}\)
Survival Function: \(P(T > t) = 1 - F(t) = e^{-\lambda t}\)
Expected Value: \(E[T] = \frac{1}{\lambda}\)
Variance: \(\mathrm{Var}(T) = \frac{1}{\lambda^2}\)
Memoryless Property: P(T>s+t∣T>s)=P(T>t) for any s,t≥0.
3. Normal (Gaussian) Distribution#
X∼N(μ,σ2)
PDF (Probability Density Function): \(f(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}\) for \(-\infty < x < \infty\).
Expected Value: E[X]=μ
Variance: Var(X)=σ2
Standardization (Z-score): \(Z = \frac{X - \mu}{\sigma}\) where \(Z \sim N(0, 1)\).
4. Gamma Distribution#
X∼Gamma(k,λ) (using shape k and rate λ) or X∼Gamma(k,θ) (using shape k and scale θ=1/λ)
The Gamma function is \(\Gamma(k) = \int_0^{\infty} x^{k-1} e^{-x}\,dx\). For positive integers k, \(\Gamma(k) = (k-1)!\).
PDF (Probability Density Function): Using shape k and rate λ: \(f(x; k, \lambda) = \frac{\lambda^k x^{k-1} e^{-\lambda x}}{\Gamma(k)}\) for \(x \ge 0\). Using shape k and scale \(\theta = 1/\lambda\): \(f(x; k, \theta) = \frac{1}{\Gamma(k)\theta^k} x^{k-1} e^{-x/\theta}\) for \(x \ge 0\).
Expected Value: \(E[X] = \frac{k}{\lambda} = k\theta\)
Variance: \(\mathrm{Var}(X) = \frac{k}{\lambda^2} = k\theta^2\)
5. Beta Distribution#
X∼Beta(α,β)
The Beta function is \(B(\alpha, \beta) = \int_0^1 t^{\alpha-1}(1-t)^{\beta-1}\,dt = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}\).
PDF (Probability Density Function): \(f(x; \alpha, \beta) = \frac{1}{B(\alpha, \beta)} x^{\alpha-1} (1-x)^{\beta-1} = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} x^{\alpha-1} (1-x)^{\beta-1}\) for \(0 \le x \le 1\).
Expected Value: \(E[X] = \frac{\alpha}{\alpha+\beta}\)
Variance: \(\mathrm{Var}(X) = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}\)
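As with the discrete families, the continuous-distribution formulas can be cross-checked against scipy.stats. A minimal sketch; the parameter values are illustrative assumptions, and note that scipy parameterizes the Exponential and Gamma families by the scale θ = 1/λ rather than the rate λ:

```python
from scipy import stats

lam, k, alpha, beta = 0.5, 3.0, 2.0, 5.0

expon = stats.expon(scale=1/lam)
print(expon.mean(), 1/lam)       # both 2.0
print(expon.var(),  1/lam**2)    # both 4.0

gamma = stats.gamma(a=k, scale=1/lam)
print(gamma.mean(), k/lam)       # both 6.0
print(gamma.var(),  k/lam**2)    # both 12.0

b = stats.beta(alpha, beta)
print(b.mean(), alpha / (alpha + beta))                               # both ≈ 0.2857
print(b.var(),  alpha * beta / ((alpha + beta)**2 * (alpha + beta + 1)))  # both ≈ 0.0255
```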
Chapter 10: Joint Distributions#
Joint Probability Mass Functions (PMFs)#
For two discrete random variables \(X\) and \(Y\):
Joint PMF Definition: \(p_{X,Y}(x, y) = P(X=x, Y=y)\)
Conditions:
\(p_{X,Y}(x, y) \ge 0\) for all \((x, y)\)
\(\sum_{x} \sum_{y} p_{X,Y}(x, y) = 1\)
Joint Probability Density Functions (PDFs)#
For two continuous random variables \(X\) and \(Y\):
Probability over a Region A: \(P((X, Y) \in A) = \iint_A f_{X,Y}(x, y) \,dx \,dy\)
Conditions:
\(f_{X,Y}(x, y) \ge 0\) for all \((x, y)\)
\(\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{X,Y}(x, y) \,dx \,dy = 1\)
Marginal Distributions#
Marginal PMF of X (Discrete): \(p_X(x) = P(X=x) = \sum_{y} P(X=x, Y=y) = \sum_{y} p_{X,Y}(x, y)\)
Marginal PMF of Y (Discrete): \(p_Y(y) = P(Y=y) = \sum_{x} P(X=x, Y=y) = \sum_{x} p_{X,Y}(x, y)\)
Marginal PDF of X (Continuous): \(f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y) \,dy\)
Marginal PDF of Y (Continuous): \(f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x, y) \,dx\)
Conditional Distributions#
Conditional PMF of Y given X=x (Discrete): \(p_{Y|X}(y|x) = P(Y=y \mid X=x) = \frac{P(X=x, Y=y)}{P(X=x)} = \frac{p_{X,Y}(x, y)}{p_X(x)}\) (provided \(p_X(x) > 0\))
Conditional PDF of Y given X=x (Continuous): \(f_{Y|X}(y|x) = \frac{f_{X,Y}(x, y)}{f_X(x)}\) (provided \(f_X(x) > 0\))
Joint Cumulative Distribution Functions (CDFs)#
Joint CDF Definition: \(F_{X,Y}(x, y) = P(X \le x, Y \le y)\)
Discrete Case: \(F_{X,Y}(x, y) = \sum_{x_i \le x} \sum_{y_j \le y} p_{X,Y}(x_i, y_j)\)
Continuous Case: \(F_{X,Y}(x, y) = \int_{-\infty}^{x} \int_{-\infty}^{y} f_{X,Y}(u, v) \,dv \,du\)
Properties:
\(0 \le F_{X,Y}(x, y) \le 1\)
\(F_{X,Y}(x, y)\) is non-decreasing in both \(x\) and \(y\).
\(\lim_{x \to \infty, y \to \infty} F_{X,Y}(x, y) = 1\)
\(\lim_{x \to -\infty} F_{X,Y}(x, y) = 0\) and \(\lim_{y \to -\infty} F_{X,Y}(x, y) = 0\)
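The marginal and conditional formulas above are easy to verify on a small joint PMF table. A minimal sketch; the joint probabilities are illustrative assumptions:

```python
import numpy as np

# Joint PMF table: rows index values of X, columns index values of Y.
joint = np.array([[0.10, 0.20],
                  [0.30, 0.40]])

p_X = joint.sum(axis=1)            # marginal PMF of X: sum over y
p_Y = joint.sum(axis=0)            # marginal PMF of Y: sum over x
p_Y_given_X0 = joint[0] / p_X[0]   # conditional PMF of Y given X = first value

print(p_X, p_Y, p_Y_given_X0)      # [0.3 0.7] [0.4 0.6] [0.3333 0.6667]
```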
Chapter 11: Independence, Covariance, and Correlation#
Independence of Random Variables#
Two random variables \(X\) and \(Y\) are independent if for any sets \(A\) and \(B\): \(P(X \in A, Y \in B) = P(X \in A) P(Y \in B)\)
This is equivalent to:
Discrete: \(P(X=x, Y=y) = P(X=x) P(Y=y)\) (Joint PMF = Product of Marginal PMFs)
Continuous: \(f_{X,Y}(x,y) = f_X(x) f_Y(y)\) (Joint PDF = Product of Marginal PDFs)
Covariance#
The covariance between two random variables \(X\) and \(Y\):
Definition: \(\mathrm{Cov}(X, Y) = E[(X - E[X])(Y - E[Y])]\)
Computational Formula: \(\mathrm{Cov}(X, Y) = E[XY] - E[X]E[Y]\)
Properties:
\(\mathrm{Cov}(X, X) = \mathrm{Var}(X)\)
\(\mathrm{Cov}(X, Y) = \mathrm{Cov}(Y, X)\)
\(\mathrm{Cov}(aX + b, cY + d) = ac \mathrm{Cov}(X, Y)\)
\(\mathrm{Cov}(X+Y, Z) = \mathrm{Cov}(X, Z) + \mathrm{Cov}(Y, Z)\)
If \(X\) and \(Y\) are independent, then \(\mathrm{Cov}(X, Y) = 0\).
Correlation Coefficient#
The Pearson correlation coefficient between two random variables \(X\) and \(Y\):
Definition: \(\rho(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X) \mathrm{Var}(Y)}}\)
Properties:
\(-1 \le \rho(X, Y) \le 1\)
\(\rho(aX + b, cY + d) = \mathrm{sign}(ac)\, \rho(X, Y)\) (assuming \(a \ne 0, c \ne 0\))
Variance of Sums of Random Variables#
For any two random variables \(X\) and \(Y\), and constants \(a\) and \(b\):
General Formula: \(\mathrm{Var}(aX + bY) = a^2 \mathrm{Var}(X) + b^2 \mathrm{Var}(Y) + 2ab\, \mathrm{Cov}(X, Y)\)
Sum of Variables (\(a=1, b=1\)): \(\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\, \mathrm{Cov}(X, Y)\)
Difference of Variables (\(a=1, b=-1\)): \(\mathrm{Var}(X - Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) - 2\, \mathrm{Cov}(X, Y)\)
If \(X\) and \(Y\) are independent (\(\mathrm{Cov}(X, Y) = 0\)): \(\mathrm{Var}(aX + bY) = a^2 \mathrm{Var}(X) + b^2 \mathrm{Var}(Y)\), \(\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)\), and \(\mathrm{Var}(X - Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)\)
Extension to Multiple Variables (\(X_1, X_2, ..., X_n\)): \(\mathrm{Var}\left(\sum_{i=1}^n a_i X_i\right) = \sum_{i=1}^n a_i^2 \mathrm{Var}(X_i) + \sum_{i \ne j} a_i a_j \mathrm{Cov}(X_i, X_j)\) or equivalently \(\mathrm{Var}\left(\sum_{i=1}^n a_i X_i\right) = \sum_{i=1}^n a_i^2 \mathrm{Var}(X_i) + 2 \sum_{i < j} a_i a_j \mathrm{Cov}(X_i, X_j)\)
If all \(X_i\) are independent: \(\mathrm{Var}\left(\sum_{i=1}^n a_i X_i\right) = \sum_{i=1}^n a_i^2 \mathrm{Var}(X_i)\)
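The variance-of-a-sum identity can be checked on sample data. A minimal simulation sketch; the choice of correlated samples is an illustrative assumption:

```python
import numpy as np

# Check Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y) on simulated data.
rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = 0.5 * x + rng.normal(size=100_000)   # Y is correlated with X

lhs = np.var(x + y)
rhs = np.var(x) + np.var(y) + 2 * np.cov(x, y, bias=True)[0, 1]
print(lhs, rhs)  # the two values agree up to floating-point rounding
```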
Chapter 12: Functions of Multiple Random Variables#
Sums of Independent Random Variables (Convolution)#
Let \(X\) and \(Y\) be two random variables, and \(Z = X+Y\).
Discrete Case (PMF of Z): \(P(Z=z) = \sum_{k} P(X=k, Y=z-k)\). If \(X\) and \(Y\) are independent: \(P(Z=z) = \sum_{k} P(X=k)P(Y=z-k)\). This is the discrete convolution of the PMFs.
Continuous Case (PDF of Z): \(f_Z(z) = \int_{-\infty}^{\infty} f_{X,Y}(x, z-x)\,dx\). If \(X\) and \(Y\) are independent: \(f_Z(z) = \int_{-\infty}^{\infty} f_X(x)f_Y(z-x)\,dx = (f_X * f_Y)(z)\). This is the convolution of the PDFs.
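A minimal sketch of the discrete convolution formula, using the sum of two independent fair dice as an illustrative assumption:

```python
import numpy as np

# PMF of the sum of two independent fair dice via discrete convolution.
die = np.full(6, 1/6)              # PMF of one die on the values 1..6
pmf_sum = np.convolve(die, die)    # PMF of X + Y on the values 2..12

for total, prob in enumerate(pmf_sum, start=2):
    print(total, round(prob, 4))   # e.g. P(sum = 7) = 6/36 ≈ 0.1667
```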
General Transformations (Jacobian Method for PDFs)#
If \(Y_1 = g_1(X_1, X_2)\) and \(Y_2 = g_2(X_1, X_2)\) are transformations of random variables \(X_1, X_2\), and these transformations are invertible such that \(X_1 = h_1(Y_1, Y_2)\) and \(X_2 = h_2(Y_1, Y_2)\).
Joint PDF of \(Y_1, Y_2\): \(f_{Y_1,Y_2}(y_1, y_2) = f_{X_1,X_2}(h_1(y_1,y_2), h_2(y_1,y_2))\, |J|\), where \(|J|\) is the absolute value of the determinant of the Jacobian matrix.
Jacobian Determinant (J): \(J = \det \begin{pmatrix} \frac{\partial x_1}{\partial y_1} & \frac{\partial x_1}{\partial y_2} \\ \frac{\partial x_2}{\partial y_1} & \frac{\partial x_2}{\partial y_2} \end{pmatrix}\)
Order Statistics#
Let \(X_1, X_2, \dots, X_n\) be \(n\) independent and identically distributed (i.i.d.) random variables with CDF \(F_X(x)\) and PDF \(f_X(x)\). Let \(X_{(1)}, X_{(2)}, \dots, X_{(n)}\) be the order statistics (sorted values).
CDF of the Maximum (\(Y_n = X_{(n)}\)): \(F_{Y_n}(y) = P(X_{(n)} \le y) = [F_X(y)]^n\)
PDF of the Maximum (\(Y_n = X_{(n)}\)): \(f_{Y_n}(y) = n[F_X(y)]^{n-1}f_X(y)\)
CDF of the Minimum (\(Y_1 = X_{(1)}\)): \(F_{Y_1}(y) = P(X_{(1)} \le y) = 1 - [1-F_X(y)]^n\)
PDF of the Minimum (\(Y_1 = X_{(1)}\)): \(f_{Y_1}(y) = n[1-F_X(y)]^{n-1}f_X(y)\)
PDF of the \(k\)-th Order Statistic (\(Y_k = X_{(k)}\)): \(f_{Y_k}(y) = \frac{n!}{(k-1)!(n-k)!} [F_X(y)]^{k-1} [1-F_X(y)]^{n-k} f_X(y)\)
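The CDF of the maximum is easy to verify by simulation. A minimal sketch for Uniform(0, 1) samples, where \([F_X(y)]^n = y^n\); the values of n and y are illustrative assumptions:

```python
import numpy as np

# Check P(max of n i.i.d. Uniform(0,1) draws <= y) = y^n by simulation.
rng = np.random.default_rng(1)
n, y = 5, 0.8

samples = rng.uniform(size=(200_000, n))
empirical = np.mean(samples.max(axis=1) <= y)
print(empirical, y**n)   # both ≈ 0.328
```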
Chapter 13: The Law of Large Numbers (LLN)#
Chebyshev’s Inequality#
For a random variable \(X\) with mean \(\mu\) and finite variance \(\sigma^2\):
Form 1: \(P(|X - \mu| \ge k\sigma) \le \frac{1}{k^2}\) (where \(k\) is the number of standard deviations)
Form 2: \(P(|X - \mu| \ge \epsilon) \le \frac{\sigma^2}{\epsilon^2}\) (where \(\epsilon > 0\) is any positive number)
Weak Law of Large Numbers (WLLN)#
For a sequence of i.i.d. random variables \(X_1, X_2, \dots, X_n\) with common mean \(E[X_i] = \mu\) and common finite variance \(Var(X_i) = \sigma^2\). Let \(\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i\) be the sample mean.
Statement: For any \(\epsilon > 0\), \(\lim_{n \to \infty} P(|\bar{X}_n - \mu| \ge \epsilon) = 0\), or equivalently, \(\lim_{n \to \infty} P(|\bar{X}_n - \mu| < \epsilon) = 1\)
Formulas used in WLLN proof via Chebyshev’s Inequality:
Expected Value of Sample Mean: \(E[\bar{X}_n] = \mu\)
Variance of Sample Mean (for i.i.d. variables): \(Var(\bar{X}_n) = \frac{\sigma^2}{n}\)
Application of Chebyshev’s Inequality to \(\bar{X}_n\): \(P(|\bar{X}_n - \mu| \ge \epsilon) \le \frac{Var(\bar{X}_n)}{\epsilon^2} = \frac{\sigma^2}{n\epsilon^2}\)
Strong Law of Large Numbers (SLLN)#
For a sequence of i.i.d. random variables \(X_1, X_2, \dots, X_n\) with common mean \(E[X_i] = \mu\).
Statement: \(P\left(\lim_{n \to \infty} \bar{X}_n = \mu\right) = 1\) (the sample mean converges almost surely to the population mean).
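A minimal simulation sketch of the Law of Large Numbers: the running sample mean of i.i.d. draws drifts toward the true mean. The die-roll example and sample sizes are illustrative assumptions:

```python
import numpy as np

# Running sample mean of fair die rolls (true mean 3.5) for growing n.
rng = np.random.default_rng(2)
rolls = rng.integers(1, 7, size=100_000)

running_mean = np.cumsum(rolls) / np.arange(1, len(rolls) + 1)
for n in (10, 1_000, 100_000):
    print(n, running_mean[n - 1])   # approaches 3.5 as n grows
```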
Chapter 14: The Central Limit Theorem (CLT)#
Statement of CLT (Lindeberg-Lévy CLT): Let \(X_1, X_2, \dots, X_n\) be i.i.d. random variables with mean \(\mu\) and variance \(\sigma^2\). Let \(\bar{X}_n\) be the sample mean. Then \(Z_n = \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \xrightarrow{d} N(0, 1)\) (where \(\xrightarrow{d}\) denotes convergence in distribution)
Convergence in Distribution (for \(Z_n\)): \(\lim_{n \to \infty} P(Z_n \le z) = \Phi(z)\), where \(\Phi(z)\) is the CDF of the standard Normal distribution \(N(0, 1)\).
Approximation for Sample Mean \(\bar{X}_n\): \(P(\bar{X}_n \le x) \approx \Phi\left(\frac{x - \mu}{\sigma/\sqrt{n}}\right)\)
CLT for Sums (\(S_n = \sum_{i=1}^{n} X_i\)): \(E[S_n] = n\mu\), \(Var(S_n) = n\sigma^2\), and \(\frac{S_n - n\mu}{\sqrt{n}\sigma} \xrightarrow{d} N(0, 1)\)
Normal Approximation to Binomial Distribution: For \(X \sim \text{Binomial}(n, p)\):
Mean: \(E[X] = np\)
Variance: \(Var(X) = np(1-p)\)
Approximation: \(X \approx N(np, np(1-p))\) (if \(np \ge 5\) and \(n(1-p) \ge 5\) is a common rule of thumb).
Continuity Correction (for approximating discrete with continuous):
To approximate \(P(X \le k)\), use \(P(Y \le k + 0.5)\)
To approximate \(P(X \ge k)\), use \(P(Y \ge k - 0.5)\)
To approximate \(P(X = k)\), use \(P(k - 0.5 \le Y \le k + 0.5)\)
To approximate \(P(a \le X \le b)\), use \(P(a - 0.5 \le Y \le b + 0.5)\)
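A minimal sketch of the Normal approximation to the Binomial with a continuity correction; the values of n, p, and k are illustrative assumptions:

```python
import math
from scipy import stats

# Compare the exact Binomial CDF with the continuity-corrected Normal approximation.
n, p, k = 50, 0.4, 25
mu, sigma = n * p, math.sqrt(n * p * (1 - p))

exact  = stats.binom.cdf(k, n, p)                        # P(X <= k) exactly
approx = stats.norm.cdf(k + 0.5, loc=mu, scale=sigma)    # P(Y <= k + 0.5)
print(exact, approx)  # both ≈ 0.94
```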
Chapter 15: Introduction to Bayesian Inference#
Bayes’ Theorem for Distributions: \(p(\theta \mid D) = \frac{p(D \mid \theta)\, p(\theta)}{p(D)}\), where:
\(p(\theta | D)\) is the posterior probability of parameter \(\theta\) given data \(D\).
\(p(D | \theta)\) is the likelihood of data \(D\) given parameter \(\theta\).
\(p(\theta)\) is the prior probability of parameter \(\theta\).
\(p(D)\) is the evidence (or marginal likelihood of data).
Proportionality Form: \(\text{Posterior} \propto \text{Likelihood} \times \text{Prior}\)
Evidence Calculation:
For continuous \(\theta\): \(p(D) = \int p(D|\theta) p(\theta) d\theta\)
For discrete \(\theta\): \(p(D) = \sum_{\theta} p(D|\theta) p(\theta)\)
Beta-Binomial Conjugate Prior Update: If prior is \(\text{Beta}(\alpha_{prior}, \beta_{prior})\) and data is \(k\) successes in \(n\) trials (Binomial likelihood):
Posterior is \(\text{Beta}(\alpha_{posterior}, \beta_{posterior}) = \text{Beta}(\alpha_{prior} + k, \beta_{prior} + n - k)\)
Point Estimates from Posterior:
Maximum a Posteriori (MAP) Estimate: \(\hat{\theta}_{MAP} = \arg \max_{\theta} p(\theta \mid D)\). For a Beta\((\alpha, \beta)\) posterior (if \(\alpha > 1, \beta > 1\)): \(\hat{\theta}_{MAP} = \frac{\alpha - 1}{\alpha + \beta - 2}\)
Posterior Mean: \(\hat{\theta}_{Mean} = E[\theta \mid D] = \int \theta\, p(\theta \mid D)\, d\theta\). For a Beta\((\alpha, \beta)\) posterior: \(\hat{\theta}_{Mean} = \frac{\alpha}{\alpha + \beta}\)
Credible Interval: An interval \([L, U]\) such that: \(P(L \le \theta \le U \mid D) = \int_L^U p(\theta \mid D)\, d\theta = 1 - \gamma\) (where \(1-\gamma\) is the credibility level, e.g., 95%)
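A minimal sketch of the Beta-Binomial conjugate update and the posterior summaries above; the prior parameters and observed data are illustrative assumptions:

```python
from scipy import stats

# Beta(2, 2) prior updated with k = 7 successes in n = 10 trials.
alpha_prior, beta_prior = 2, 2
k, n = 7, 10

alpha_post = alpha_prior + k          # Beta(alpha_prior + k, beta_prior + n - k)
beta_post  = beta_prior + n - k

posterior = stats.beta(alpha_post, beta_post)
post_mean = alpha_post / (alpha_post + beta_post)              # ≈ 0.643
post_map  = (alpha_post - 1) / (alpha_post + beta_post - 2)    # ≈ 0.667
ci_95     = posterior.interval(0.95)                           # equal-tailed 95% credible interval

print(post_mean, post_map, ci_95)
```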
Chapter 16: Introduction to Markov Chains#
Transition Probability (from state \(i\) to state \(j\)): \(P_{ij} = P(X_{t+1} = s_j \mid X_t = s_i)\)
n-Step Transition Probability: The \((i, j)\)-th entry of the matrix \(P^n\) (the transition matrix \(P\) raised to the power of \(n\)): \(P^{(n)}_{ij} = P(X_{t+n} = s_j \mid X_t = s_i) = (P^n)_{ij}\)
Stationary Distribution (\(\pi\)): A row vector \(\pi = [\pi_1, \pi_2, ..., \pi_k]\) such that \(\pi P = \pi\) and \(\sum_{j=1}^{k} \pi_j = 1\)
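A stationary distribution can be found as a left eigenvector of the transition matrix with eigenvalue 1. A minimal sketch; the 2-state transition matrix is an illustrative assumption:

```python
import numpy as np

# Solve pi P = pi, sum(pi) = 1 for a small transition matrix.
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])

eigvals, eigvecs = np.linalg.eig(P.T)              # left eigenvectors of P
pi = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1))])
pi = pi / pi.sum()                                 # normalize to sum to 1

print(pi)        # ≈ [0.8333 0.1667]
print(pi @ P)    # equals pi again (stationarity check)
```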
Chapter 17: Monte Carlo Methods#
Estimating Probability \(P(A)\): \(P(A) \approx \frac{N_A}{N}\) (where \(N_A\) is the number of times event A occurred in \(N\) simulations)
Estimating Expected Value \(E[g(X)]\): \(E[g(X)] \approx \frac{1}{N} \sum_{i=1}^{N} g(X_i)\) (where \(X_i\) are samples from the distribution of \(X\))
Monte Carlo Integration (Hit-or-Miss for Area): \(\text{Area}(A) \approx \text{Area}(B) \times \frac{N_{hit}}{N}\)
Monte Carlo Integration (Using Expected Values for \(I = \int_a^b g(x) dx\)): If \(X \sim \text{Uniform}(a, b)\): \(I \approx (b-a) \times \frac{1}{N} \sum_{i=1}^{N} g(X_i)\)
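A minimal sketch of Monte Carlo integration via expected values; the integrand g(x) = x² on [0, 3] (exact value 9) and the sample size are illustrative assumptions:

```python
import numpy as np

# Estimate I = ∫_0^3 x^2 dx ≈ (b - a) * mean(g(X_i)) with X_i ~ Uniform(a, b).
rng = np.random.default_rng(3)
a, b, N = 0.0, 3.0, 200_000

x = rng.uniform(a, b, size=N)
estimate = (b - a) * np.mean(x**2)
print(estimate)   # ≈ 9.0
```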
Inverse Transform Method for Generating Random Variables: If \(U \sim \text{Uniform}(0, 1)\), then \(X = F^{-1}(U)\) has CDF \(F(x)\).
For Exponential(\(\lambda\)): \(F^{-1}(u) = -\frac{1}{\lambda} \ln(1 - u)\) or \(F^{-1}(u) = -\frac{1}{\lambda} \ln(u)\).
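A minimal sketch of the inverse transform method for the Exponential case; λ = 2 and the sample size are illustrative assumptions:

```python
import numpy as np

# Generate Exponential(λ) samples from Uniform(0, 1) via X = -ln(1 - U) / λ.
rng = np.random.default_rng(4)
lam = 2.0

u = rng.uniform(size=100_000)
x = -np.log(1 - u) / lam
print(x.mean(), 1 / lam)     # sample mean ≈ 1/λ = 0.5
print(x.var(),  1 / lam**2)  # sample variance ≈ 1/λ² = 0.25
```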
Acceptance-Rejection Method: To sample from target PDF \(f(x)\) using proposal PDF \(g(x)\) where \(f(x) \le c \cdot g(x)\):
Sample \(y\) from \(g(x)\).
Sample \(u\) from \(\text{Uniform}(0, 1)\).
Accept \(y\) if \(u \le \frac{f(y)}{c \cdot g(y)}\).
Buffon’s Needle Problem (\(L \le D\)): Probability of needle crossing a line: \(P(\text{cross}) = \frac{2L}{\pi D}\). For \(L=1, D=2\): \(\pi \approx \frac{1}{P(\text{cross})}\)
Chapter 18: (Optional) Further Explorations#
Entropy \(H(X)\) (for discrete random variable \(X\) with PMF \(p(x)\)): \(H(X) = - \sum_{x} p(x) \log_b p(x)\) (base \(b\) is often 2 for bits, or \(e\) for nats)
Kullback-Leibler (KL) Divergence (for discrete distributions \(P\) and \(Q\)): \(D_{KL}(P \| Q) = \sum_{x} P(x) \log_b \frac{P(x)}{Q(x)}\)
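A minimal sketch of entropy and KL divergence for discrete distributions; the two PMFs are illustrative assumptions, and base-2 logarithms give results in bits:

```python
import numpy as np

# Entropy of P and KL divergence from Q to P for small discrete PMFs.
p = np.array([0.5, 0.25, 0.25])
q = np.array([1/3, 1/3, 1/3])

entropy = -np.sum(p * np.log2(p))       # H(P) = 1.5 bits
kl_pq   = np.sum(p * np.log2(p / q))    # D_KL(P || Q)
print(entropy, kl_pq)
```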
Geometric Brownian Motion (GBM) \(S(t)\):
Stochastic Differential Equation: \(dS(t) = \mu S(t) dt + \sigma S(t) dW(t)\)
Solution: \(S(t) = S(0) \exp\left( \left(\mu - \frac{\sigma^2}{2}\right)t + \sigma W(t) \right)\)
Probability Generating Function (PGF) \(G_X(z)\) (for non-negative integer-valued RV \(X\)): \(G_X(z) = E[z^X] = \sum_{k=0}^{\infty} P(X=k) z^k\)
\(E[X] = G'_X(1)\)
\(Var(X) = G''_X(1) + G'_X(1) - [G'_X(1)]^2\)
Moment Generating Function (MGF) \(M_X(t)\): \(M_X(t) = E[e^{tX}]\)
\(E[X^n] = M_X^{(n)}(0)\) (n-th derivative evaluated at \(t=0\))
For independent \(X, Y\): \(M_{X+Y}(t) = M_X(t) M_Y(t)\)