Appendix E: Summary of Formulas
This appendix provides a summary of the key formulas introduced in Chapters 2–15.
Chapter 2: The Language of Probability: Sets, Sample Spaces, and Events

Axioms of Probability

Let $S$ be a sample space, and let $P(A)$ denote the probability of an event $A$.

Non-negativity: for any event $A$, $P(A) \ge 0$.

Normalization: the probability of the entire sample space is $P(S) = 1$.

Additivity for disjoint events: if $A_1, A_2, A_3, \dots$ is a sequence of mutually exclusive (disjoint) events (i.e., $A_i \cap A_j = \emptyset$ for all $i \ne j$), then the probability of their union is the sum of their individual probabilities:

$$P(A_1 \cup A_2 \cup A_3 \cup \cdots) = P(A_1) + P(A_2) + P(A_3) + \cdots$$

Basic Probability Rules

Probability range: for any event $A$, $0 \le P(A) \le 1$.

Complement rule: the probability that event $A$ does not occur is $P(A^c) = 1 - P(A)$.

Addition rule (general): for any two events $A$ and $B$ (not necessarily disjoint), the probability that $A$ or $B$ (or both) occurs is:

$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$

Empirical Probability

The empirical probability of an event $A$ is estimated from simulations:

$$P_{\text{empirical}}(A) = \frac{\text{number of times event } A \text{ occurred}}{\text{total number of trials}}$$

Chapter 3: Counting Techniques: Permutations and Combinations

The Multiplication Principle

If a procedure can be broken down into a sequence of $k$ steps, with $n_1$ ways for the first step, $n_2$ for the second, $\dots$, $n_k$ for the $k$-th step, then the total number of ways to perform the entire procedure is:
$$\text{Total ways} = n_1 \times n_2 \times \cdots \times n_k$$

Permutations (Order Matters)

Permutations without repetition: the number of permutations of $n$ distinct objects taken $k$ at a time is

$$P(n, k) = \frac{n!}{(n-k)!}$$

Permutations with repetition (multinomial coefficients): the number of distinct permutations of $n$ objects with $n_1$ identical objects of type 1, $n_2$ of type 2, $\dots$, $n_k$ of type $k$ (where $n_1 + n_2 + \cdots + n_k = n$) is

$$\frac{n!}{n_1! \, n_2! \cdots n_k!}$$

Combinations (Order Doesn't Matter)

Combinations without repetition: the number of combinations of $n$ distinct objects taken $k$ at a time (also "$n$ choose $k$") is

$$C(n, k) = \binom{n}{k} = \frac{n!}{k!(n-k)!}$$

Combinations with repetition: the number of combinations with repetition of $n$ types of objects taken $k$ at a time is

$$\binom{n+k-1}{k} = \frac{(n+k-1)!}{k!(n-1)!}$$

Probability with Equally Likely Outcomes

The probability of an event $E$ when all outcomes in the sample space $S$ are equally likely:

$$P(E) = \frac{\text{number of outcomes favorable to } E}{\text{total number of possible outcomes in } S} = \frac{|E|}{|S|}$$

Chapter 4: Conditional Probability

Definition of Conditional Probability

For any two events $A$ and $B$ from a sample space $S$, where $P(B) > 0$, the conditional probability of $A$ given $B$ is defined as:
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$

The Multiplication Rule for Conditional Probability

Rearranging the definition of conditional probability gives:

$$P(A \cap B) = P(A \mid B) \, P(B)$$

Similarly, if $P(A) > 0$:

$$P(A \cap B) = P(B \mid A) \, P(A)$$

For three events $A, B, C$:

$$P(A \cap B \cap C) = P(C \mid A \cap B) \, P(B \mid A) \, P(A)$$

The Law of Total Probability

Let $B_1, B_2, \dots, B_n$ be a partition of the sample space $S$. Then, for any event $A$ in $S$:

$$P(A) = \sum_{i=1}^{n} P(A \mid B_i) \, P(B_i)$$

Expanded form:

$$P(A) = P(A \mid B_1)P(B_1) + P(A \mid B_2)P(B_2) + \cdots + P(A \mid B_n)P(B_n)$$

Chapter 5: Bayes' Theorem and Independence

Bayes' Theorem

Provides a way to "reverse" conditional probabilities. If $P(B) > 0$:
$$P(A \mid B) = \frac{P(B \mid A) \, P(A)}{P(B)}$$

where $P(B)$ can often be calculated using the Law of Total Probability (e.g., with the partition $\{A, A^c\}$):

$$P(B) = P(B \mid A)P(A) + P(B \mid A^c)P(A^c)$$

Independence of Events

Formal definition: events $A$ and $B$ are independent if and only if

$$P(A \cap B) = P(A) \, P(B)$$

Alternative definition (using conditional probability): if $P(B) > 0$, then $A$ and $B$ are independent if and only if $P(A \mid B) = P(A)$. Similarly, if $P(A) > 0$, independence means $P(B \mid A) = P(B)$.
Conditional Independence

Notation: $A \perp B \mid C$ means "$A$ and $B$ are conditionally independent given $C$."

Definition (with $P(C) > 0$):

$$A \perp B \mid C \iff P(A \cap B \mid C) = P(A \mid C) \, P(B \mid C)$$

Equivalent "no extra information" form: if $P(B \cap C) > 0$, then

$$A \perp B \mid C \iff P(A \mid B \cap C) = P(A \mid C)$$

Likewise, if $P(A \cap C) > 0$, then

$$P(B \mid A \cap C) = P(B \mid C)$$

Chapter 6: Discrete Random Variables

Probability Mass Function (PMF)

For a discrete random variable $X$, the PMF $p_X(x)$ is:
$$p_X(x) = P(X = x)$$

Properties of a PMF:

$p_X(x) \ge 0$ for all possible values $x$.

$\sum_x p_X(x) = 1$ (sum over all possible values $x$).
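These two properties are easy to verify numerically. A minimal sketch in Python (the fair-die PMF is an assumed example, not from the text):

```python
# Hypothetical example: the PMF of a fair six-sided die, p_X(x) = 1/6 for
# x in {1, ..., 6}, used to check the two PMF properties above.
import math

pmf = {x: 1 / 6 for x in range(1, 7)}

# Property 1: p_X(x) >= 0 for every value x.
nonnegative = all(p >= 0 for p in pmf.values())

# Property 2: the probabilities sum to 1 (up to floating-point rounding).
total = sum(pmf.values())

# The probability of an event is the sum of the PMF over its outcomes,
# e.g. P(X is even) = p_X(2) + p_X(4) + p_X(6) = 1/2.
p_even = sum(p for x, p in pmf.items() if x % 2 == 0)
```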
Cumulative Distribution Function (CDF)

For a random variable $X$, the CDF $F_X(x)$ is:

$$F_X(x) = P(X \le x)$$

For a discrete random variable $X$:

$$F_X(x) = \sum_{k \le x} p_X(k)$$

Properties of a CDF:

$0 \le F_X(x) \le 1$ for all $x$.

If $a < b$, then $F_X(a) \le F_X(b)$ (non-decreasing).

$\lim_{x \to -\infty} F_X(x) = 0$ and $\lim_{x \to +\infty} F_X(x) = 1$.

$P(X > x) = 1 - F_X(x)$.

$P(a < X \le b) = F_X(b) - F_X(a)$ for $a < b$.

$P(X = x) = F_X(x) - \lim_{y \to x^-} F_X(y)$ (for a discrete RV, this is the jump at $x$).
Expected Value (Mean)

For a discrete random variable $X$:

$$E[X] = \mu_X = \sum_x x \cdot p_X(x)$$

Variance

For a random variable $X$ with mean $\mu_X$:

$$\operatorname{Var}(X) = \sigma_X^2 = E[(X - \mu_X)^2]$$

For a discrete random variable $X$:

$$\operatorname{Var}(X) = \sum_x (x - \mu_X)^2 \cdot p_X(x)$$

Computational formula for variance:

$$\operatorname{Var}(X) = E[X^2] - (E[X])^2$$

where $E[X^2]$ for a discrete random variable is:

$$E[X^2] = \sum_x x^2 \cdot p_X(x)$$

Standard Deviation

The positive square root of the variance:

$$SD(X) = \sigma_X = \sqrt{\operatorname{Var}(X)}$$

Functions of a Random Variable

If $Y = g(X)$:

PMF of $Y$ (for discrete $X$):

$$p_Y(y) = P(Y = y) = P(g(X) = y) = \sum_{x:\, g(x) = y} p_X(x)$$

Expected value of $Y = g(X)$ (LOTUS, the Law of the Unconscious Statistician), for a discrete random variable $X$:

$$E[Y] = E[g(X)] = \sum_x g(x) \cdot p_X(x)$$

Chapter 7: Common Discrete Distributions

Bernoulli Distribution

Models a single trial with two outcomes (success = 1, failure = 0).
Parameter: $p$ (probability of success).

PMF: $P(X = 1) = p$, $P(X = 0) = 1 - p$.

Mean: $E[X] = p$. Variance: $\operatorname{Var}(X) = p(1-p)$.

Binomial Distribution

Models the number of successes in $n$ independent Bernoulli trials.

Parameters: $n$ (number of trials), $p$ (probability of success on each trial).

PMF: $P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$ for $k = 0, 1, \dots, n$.

Mean: $E[X] = np$. Variance: $\operatorname{Var}(X) = np(1-p)$.
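A hedged sketch of the standard binomial PMF, built directly from the counting formula $\binom{n}{k} p^k (1-p)^{n-k}$ (the coin-flip numbers are an assumed example):

```python
# Assumed example: the standard binomial PMF P(X = k) = C(n, k) p^k (1-p)^(n-k),
# computed from math.comb.
import math

def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent Bernoulli(p) trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# Probability of exactly 2 heads in 4 fair coin flips: C(4,2)/2^4 = 6/16 = 0.375.
p_two_heads = binom_pmf(2, 4, 0.5)

# The PMF sums to 1 over k = 0, ..., n, and its mean is np.
total = sum(binom_pmf(k, 4, 0.5) for k in range(5))
mean = sum(k * binom_pmf(k, 4, 0.5) for k in range(5))
```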
Geometric Distribution

Models the number of trials $k$ needed to get the first success.

Parameter: $p$ (probability of success on each trial).

PMF (for $X =$ trial number of first success):

$$P(X = k) = (1-p)^{k-1} p \quad \text{for } k = 1, 2, 3, \dots$$

Mean (trial number of first success):

$$E[X] = \frac{1}{p}$$

Variance (trial number of first success):

$$\operatorname{Var}(X) = \frac{1-p}{p^2}$$

Negative Binomial Distribution

Models the number of trials $k$ needed to achieve $r$ successes.
Parameters: $r$ (target number of successes), $p$ (probability of success on each trial).

PMF (for $X =$ trial number of the $r$-th success):

$$P(X = k) = \binom{k-1}{r-1} p^r (1-p)^{k-r} \quad \text{for } k = r, r+1, r+2, \dots$$

Mean (trial number of the $r$-th success):

$$E[X] = \frac{r}{p}$$

Variance (trial number of the $r$-th success):

$$\operatorname{Var}(X) = \frac{r(1-p)}{p^2}$$

Poisson Distribution

Models the number of events occurring in a fixed interval of time or space.

Parameter: $\lambda$ (average number of events in the interval).

PMF: $P(X = k) = \dfrac{\lambda^k e^{-\lambda}}{k!}$ for $k = 0, 1, 2, \dots$

Mean: $E[X] = \lambda$. Variance: $\operatorname{Var}(X) = \lambda$.
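A hedged numerical check of the standard Poisson PMF $P(X = k) = \lambda^k e^{-\lambda}/k!$ (the rate $\lambda = 3$ is an assumed example):

```python
# Assumed example: verifying that the Poisson PMF sums to 1 and has mean λ.
import math

def poisson_pmf(k: int, lam: float) -> float:
    return lam**k * math.exp(-lam) / math.factorial(k)

lam = 3.0
# Truncating the infinite sum at k = 100 is safe here: for λ = 3 the tail
# mass beyond k = 100 is negligible.
total = sum(poisson_pmf(k, lam) for k in range(101))
mean = sum(k * poisson_pmf(k, lam) for k in range(101))
```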
Hypergeometric Distribution

Models the number of successes in a sample of size $n$ drawn without replacement from a finite population of size $N$ containing $K$ successes.

Parameters: $N$ (population size), $K$ (total successes in population), $n$ (sample size).

PMF:

$$P(X = k) = \frac{\binom{K}{k} \binom{N-K}{n-k}}{\binom{N}{n}}$$

for $k$ such that $\max(0, n - (N - K)) \le k \le \min(n, K)$.

Mean:

$$E[X] = n \frac{K}{N}$$

Variance:

$$\operatorname{Var}(X) = n \frac{K}{N} \left(1 - \frac{K}{N}\right) \left(\frac{N-n}{N-1}\right)$$

Chapter 8: Continuous Random Variables

Probability Density Function (PDF)

For a continuous random variable $X$, the PDF $f_X(x)$ describes the relative likelihood of $X$.
Properties of a PDF:

$f_X(x) \ge 0$ for all $x$.

$\int_{-\infty}^{\infty} f_X(x) \, dx = 1$ (total area under the curve is 1).

$P(a \le X \le b) = \int_a^b f_X(x) \, dx$.

For any specific value $c$: $P(X = c) = \int_c^c f_X(x) \, dx = 0$.
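These integral properties can be checked numerically. A minimal sketch, using the density $f(x) = 2e^{-2x}$ for $x \ge 0$ as an assumed example:

```python
# Assumed example: checking the PDF properties numerically for f(x) = 2 e^{-2x}
# (x >= 0) with a midpoint-rule integral.
import math

def f(x: float) -> float:
    return 2.0 * math.exp(-2.0 * x)

def integrate(func, a: float, b: float, n: int = 100_000) -> float:
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / n
    return sum(func(a + (i + 0.5) * h) for i in range(n)) * h

total_area = integrate(f, 0.0, 20.0)   # ≈ 1; the tail beyond x = 20 is tiny
p_interval = integrate(f, 0.5, 1.5)    # P(0.5 <= X <= 1.5) = e^{-1} - e^{-3}
```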
Cumulative Distribution Function (CDF)

For a continuous random variable $X$, the CDF $F_X(x)$ is:

$$F_X(x) = P(X \le x) = \int_{-\infty}^{x} f_X(t) \, dt$$

Properties of a CDF:

$F_X(x)$ is non-decreasing.

$\lim_{x \to -\infty} F_X(x) = 0$ and $\lim_{x \to \infty} F_X(x) = 1$.

$P(a < X \le b) = F_X(b) - F_X(a)$.

$f_X(x) = \frac{d}{dx} F_X(x)$ (where the derivative exists).
Expected Value (Mean)

For a continuous random variable $X$:

$$E[X] = \mu = \int_{-\infty}^{\infty} x f_X(x) \, dx$$

Variance

For a continuous random variable $X$ with mean $\mu$:

$$\operatorname{Var}(X) = \sigma^2 = E[(X - \mu)^2] = \int_{-\infty}^{\infty} (x - \mu)^2 f_X(x) \, dx$$

Computational formula:

$$\operatorname{Var}(X) = E[X^2] - (E[X])^2$$

where

$$E[X^2] = \int_{-\infty}^{\infty} x^2 f_X(x) \, dx$$

Standard Deviation

The positive square root of the variance:

$$\sigma = \sqrt{\operatorname{Var}(X)}$$

Percentiles and Quantiles

The $p$-th percentile $x_p$ is the value such that $F_X(x_p) = P(X \le x_p) = p$.

The quantile function $Q(p)$ is the inverse of the CDF:

$$Q(p) = F_X^{-1}(p) = x_p$$

Functions of a Continuous Random Variable

If $Y = g(X)$:

CDF of $Y$:

$$F_Y(y) = P(Y \le y) = P(g(X) \le y)$$

PDF of $Y$ (change of variables formula): if $g(x)$ is monotonic with inverse $x = g^{-1}(y)$, then:

$$f_Y(y) = f_X(g^{-1}(y)) \left| \frac{dx}{dy} \right|$$

Expected value of $Y = g(X)$ (LOTUS):

$$E[Y] = E[g(X)] = \int_{-\infty}^{\infty} g(x) f_X(x) \, dx$$

Chapter 9: Common Continuous Distributions

1. Uniform (Continuous) Distribution

$X \sim U(a, b)$
PDF (Probability Density Function):

$$f(x; a, b) = \begin{cases} \frac{1}{b-a} & \text{for } a \le x \le b \\ 0 & \text{otherwise} \end{cases}$$

CDF (Cumulative Distribution Function):

$$F(x; a, b) = P(X \le x) = \begin{cases} 0 & \text{for } x < a \\ \frac{x-a}{b-a} & \text{for } a \le x \le b \\ 1 & \text{for } x > b \end{cases}$$

Expected Value: $E[X] = \dfrac{a+b}{2}$

Variance: $\operatorname{Var}(X) = \dfrac{(b-a)^2}{12}$
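These uniform formulas can be sanity-checked by simulation. A sketch using the standard-library sampler `random.uniform` (the interval $[2, 5]$ is an assumed example; results are approximate):

```python
# Assumed example: simulating Uniform(2, 5) and comparing sample statistics
# to the closed-form mean (a+b)/2, variance (b-a)^2/12, and CDF (x-a)/(b-a).
import random

random.seed(0)
a, b = 2.0, 5.0
n = 100_000
samples = [random.uniform(a, b) for _ in range(n)]

sample_mean = sum(samples) / n  # ≈ (a + b)/2 = 3.5
sample_var = sum((x - sample_mean) ** 2 for x in samples) / n  # ≈ 0.75
# Empirical CDF at x = 3 vs the formula (3 - a)/(b - a) = 1/3.
ecdf_at_3 = sum(x <= 3.0 for x in samples) / n
```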
2. Exponential Distribution

$T \sim \text{Exp}(\lambda)$

PDF (Probability Density Function):

$$f(t; \lambda) = \begin{cases} \lambda e^{-\lambda t} & \text{for } t \ge 0 \\ 0 & \text{for } t < 0 \end{cases}$$

CDF (Cumulative Distribution Function):

$$F(t; \lambda) = P(T \le t) = \begin{cases} 1 - e^{-\lambda t} & \text{for } t \ge 0 \\ 0 & \text{for } t < 0 \end{cases}$$

Survival Function: $P(T > t) = 1 - F(t) = e^{-\lambda t}$

Expected Value: $E[T] = \dfrac{1}{\lambda}$

Variance: $\operatorname{Var}(T) = \dfrac{1}{\lambda^2}$

Memoryless Property: $P(T > s + t \mid T > s) = P(T > t)$ for any $s, t \ge 0$.
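With the survival function $P(T > t) = e^{-\lambda t}$, the memoryless property is an algebraic identity: $P(T > s+t)/P(T > s) = e^{-\lambda(s+t)}/e^{-\lambda s} = e^{-\lambda t}$. A small check (the values of $\lambda$, $s$, $t$ are assumed examples):

```python
# Assumed example: the memoryless property of the exponential distribution,
# checked via the survival function e^{-λt}.
import math

lam = 0.5

def survival(t: float) -> float:
    return math.exp(-lam * t)

s, t = 2.0, 3.0
conditional = survival(s + t) / survival(s)   # P(T > s+t | T > s)
unconditional = survival(t)                   # P(T > t)

mean = 1 / lam         # E[T] = 1/λ
variance = 1 / lam**2  # Var(T) = 1/λ²
```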
3. Normal (Gaussian) Distribution

$X \sim N(\mu, \sigma^2)$

PDF (Probability Density Function):

$$f(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

Expected Value: $E[X] = \mu$

Variance: $\operatorname{Var}(X) = \sigma^2$
4. Gamma Distribution

$X \sim \text{Gamma}(k, \lambda)$ (using shape $k$ and rate $\lambda$) or $X \sim \text{Gamma}(k, \theta)$ (using shape $k$ and scale $\theta = 1/\lambda$).

The Gamma function is $\Gamma(k) = \int_0^\infty x^{k-1} e^{-x} \, dx$. For positive integers $k$, $\Gamma(k) = (k-1)!$.

PDF (Probability Density Function), using shape $k$ and rate $\lambda$:

$$f(x; k, \lambda) = \frac{\lambda^k x^{k-1} e^{-\lambda x}}{\Gamma(k)} \quad \text{for } x \ge 0$$

Using shape $k$ and scale $\theta = 1/\lambda$:

$$f(x; k, \theta) = \frac{1}{\Gamma(k)\theta^k} x^{k-1} e^{-x/\theta} \quad \text{for } x \ge 0$$

Expected Value: $E[X] = \dfrac{k}{\lambda} = k\theta$

Variance: $\operatorname{Var}(X) = \dfrac{k}{\lambda^2} = k\theta^2$
5. Beta Distribution

$X \sim \text{Beta}(\alpha, \beta)$

The Beta function is $B(\alpha, \beta) = \int_0^1 t^{\alpha-1}(1-t)^{\beta-1} \, dt = \dfrac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}$.

PDF (Probability Density Function):

$$f(x; \alpha, \beta) = \frac{1}{B(\alpha, \beta)} x^{\alpha-1} (1-x)^{\beta-1} = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} x^{\alpha-1} (1-x)^{\beta-1}$$

for $0 \le x \le 1$.

Expected Value: $E[X] = \dfrac{\alpha}{\alpha+\beta}$

Variance: $\operatorname{Var}(X) = \dfrac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$
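A hedged numerical check of the Beta mean and variance formulas, using the Gamma-function form of $B(\alpha, \beta)$ and a midpoint-rule integral (the parameters $\alpha = 2$, $\beta = 3$ are an assumed example):

```python
# Assumed example: Beta(2, 3) mean and variance via numerical integration.
import math

alpha, beta_ = 2.0, 3.0
B = math.gamma(alpha) * math.gamma(beta_) / math.gamma(alpha + beta_)

def pdf(x: float) -> float:
    return x ** (alpha - 1) * (1 - x) ** (beta_ - 1) / B

n = 100_000
h = 1.0 / n
xs = [(i + 0.5) * h for i in range(n)]          # midpoint grid on [0, 1]
mean = sum(x * pdf(x) for x in xs) * h           # α/(α+β) = 0.4
second_moment = sum(x * x * pdf(x) for x in xs) * h
variance = second_moment - mean**2               # αβ/((α+β)²(α+β+1)) = 0.04
```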
Chapter 10: Joint Distributions

Joint Probability Mass Functions (PMFs)

For two discrete random variables $X$ and $Y$:

Joint PMF definition:

$$p_{X,Y}(x, y) = P(X = x, Y = y)$$

Conditions:

$p_{X,Y}(x, y) \ge 0$ for all $(x, y)$.

$\sum_x \sum_y p_{X,Y}(x, y) = 1$.
Joint Probability Density Functions (PDFs)

For two continuous random variables $X$ and $Y$, the joint PDF $f_{X,Y}(x, y)$ satisfies $f_{X,Y}(x, y) \ge 0$ and integrates to 1 over the plane, and it gives probabilities as double integrals:

$$P((X, Y) \in A) = \iint_A f_{X,Y}(x, y) \, dx \, dy$$
Marginal Distributions

Marginal PMF of $X$ (discrete):

$$p_X(x) = P(X = x) = \sum_y P(X = x, Y = y) = \sum_y p_{X,Y}(x, y)$$

Marginal PMF of $Y$ (discrete):

$$p_Y(y) = P(Y = y) = \sum_x P(X = x, Y = y) = \sum_x p_{X,Y}(x, y)$$

Marginal PDF of $X$ (continuous):

$$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y) \, dy$$

Marginal PDF of $Y$ (continuous):

$$f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x, y) \, dx$$

Conditional Distributions

Conditional PMF of $Y$ given $X = x$ (discrete), provided $p_X(x) > 0$:

$$p_{Y|X}(y|x) = P(Y = y \mid X = x) = \frac{P(X = x, Y = y)}{P(X = x)} = \frac{p_{X,Y}(x, y)}{p_X(x)}$$

Conditional PDF of $Y$ given $X = x$ (continuous), provided $f_X(x) > 0$:

$$f_{Y|X}(y|x) = \frac{f_{X,Y}(x, y)}{f_X(x)}$$
Joint Cumulative Distribution Functions (CDFs)

Joint CDF definition:

$$F_{X,Y}(x, y) = P(X \le x, Y \le y)$$

Discrete case:

$$F_{X,Y}(x, y) = \sum_{x_i \le x} \sum_{y_j \le y} p_{X,Y}(x_i, y_j)$$

Continuous case:

$$F_{X,Y}(x, y) = \int_{-\infty}^{x} \int_{-\infty}^{y} f_{X,Y}(u, v) \, dv \, du$$

Properties:

$0 \le F_{X,Y}(x, y) \le 1$.

$F_{X,Y}(x, y)$ is non-decreasing in both $x$ and $y$.

$\lim_{x \to \infty,\, y \to \infty} F_{X,Y}(x, y) = 1$.

$\lim_{x \to -\infty} F_{X,Y}(x, y) = 0$ and $\lim_{y \to -\infty} F_{X,Y}(x, y) = 0$.
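The joint, marginal, and conditional definitions above can be illustrated with a small table. A sketch using a toy joint PMF (the numbers are an assumed example, not from the text):

```python
# Assumed example: a joint PMF for two dependent binary variables, showing
# marginalization (sum over the other variable) and conditioning.
joint = {
    (0, 0): 0.3, (0, 1): 0.2,
    (1, 0): 0.1, (1, 1): 0.4,
}

# Marginal PMF of X: p_X(x) = Σ_y p_{X,Y}(x, y).
p_x = {}
for (x, y), p in joint.items():
    p_x[x] = p_x.get(x, 0.0) + p

# Conditional PMF of Y given X = 1: p_{Y|X}(y|1) = p_{X,Y}(1, y) / p_X(1).
p_y_given_x1 = {y: joint[(1, y)] / p_x[1] for y in (0, 1)}
```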
Chapter 11: Independence, Covariance, and Correlation

Independence of Random Variables

Two random variables $X$ and $Y$ are independent if for any sets $A$ and $B$:

$$P(X \in A, Y \in B) = P(X \in A) \, P(Y \in B)$$

This is equivalent to the joint distribution factoring into the product of the marginals: $p_{X,Y}(x, y) = p_X(x) \, p_Y(y)$ for all $(x, y)$ in the discrete case, and $f_{X,Y}(x, y) = f_X(x) \, f_Y(y)$ in the continuous case.
Covariance

The covariance between two random variables $X$ and $Y$:
Definition:

$$\mathrm{Cov}(X, Y) = E[(X - E[X])(Y - E[Y])]$$

Computational formula:

$$\mathrm{Cov}(X, Y) = E[XY] - E[X]E[Y]$$

Properties:

$\mathrm{Cov}(X, X) = \mathrm{Var}(X)$

$\mathrm{Cov}(X, Y) = \mathrm{Cov}(Y, X)$

$\mathrm{Cov}(aX + b, cY + d) = ac \, \mathrm{Cov}(X, Y)$

$\mathrm{Cov}(X + Y, Z) = \mathrm{Cov}(X, Z) + \mathrm{Cov}(Y, Z)$

If $X$ and $Y$ are independent, then $\mathrm{Cov}(X, Y) = 0$.
Correlation Coefficient

The Pearson correlation coefficient between two random variables $X$ and $Y$:

Definition:

$$\rho(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}}$$

Properties:

$-1 \le \rho(X, Y) \le 1$

$\rho(aX + b, cY + d) = \mathrm{sign}(ac) \, \rho(X, Y)$ (assuming $a \ne 0$, $c \ne 0$)
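Covariance and correlation can be computed directly from a joint PMF via the computational formula $\mathrm{Cov}(X, Y) = E[XY] - E[X]E[Y]$. A sketch (the joint PMF is an assumed example):

```python
# Assumed example: covariance and correlation for a toy joint PMF of two
# positively associated binary variables.
import math

joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

def expect(g):
    """E[g(X, Y)] under the joint PMF."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

ex, ey = expect(lambda x, y: x), expect(lambda x, y: y)
cov = expect(lambda x, y: x * y) - ex * ey       # E[XY] - E[X]E[Y] = 0.15
var_x = expect(lambda x, y: x * x) - ex**2       # 0.25
var_y = expect(lambda x, y: y * y) - ey**2       # 0.25
rho = cov / math.sqrt(var_x * var_y)             # 0.15 / 0.25 = 0.6
```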
Variance of Sums of Random Variables

For any two random variables $X$ and $Y$, and constants $a$ and $b$:

General formula:

$$\mathrm{Var}(aX + bY) = a^2 \mathrm{Var}(X) + b^2 \mathrm{Var}(Y) + 2ab \, \mathrm{Cov}(X, Y)$$

Sum of variables ($a = 1$, $b = 1$):

$$\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2 \, \mathrm{Cov}(X, Y)$$

Difference of variables ($a = 1$, $b = -1$):

$$\mathrm{Var}(X - Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) - 2 \, \mathrm{Cov}(X, Y)$$

If $X$ and $Y$ are independent ($\mathrm{Cov}(X, Y) = 0$):

$$\mathrm{Var}(aX + bY) = a^2 \mathrm{Var}(X) + b^2 \mathrm{Var}(Y)$$

$$\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$$

$$\mathrm{Var}(X - Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$$

Extension to multiple variables ($X_1, X_2, \dots, X_n$):

$$\mathrm{Var}\left(\sum_{i=1}^n a_i X_i\right) = \sum_{i=1}^n a_i^2 \mathrm{Var}(X_i) + \sum_{i \ne j} a_i a_j \mathrm{Cov}(X_i, X_j)$$

or, equivalently,

$$\mathrm{Var}\left(\sum_{i=1}^n a_i X_i\right) = \sum_{i=1}^n a_i^2 \mathrm{Var}(X_i) + 2 \sum_{i < j} a_i a_j \mathrm{Cov}(X_i, X_j)$$

If all $X_i$ are independent:

$$\mathrm{Var}\left(\sum_{i=1}^n a_i X_i\right) = \sum_{i=1}^n a_i^2 \mathrm{Var}(X_i)$$

Chapter 12: Functions of Multiple Random Variables

Sums of Independent Random Variables (Convolution)

Let $X$ and $Y$ be two random variables, and let $Z = X + Y$.
Discrete case (PMF of $Z$):

$$P(Z = z) = \sum_k P(X = k, Y = z - k)$$

If $X$ and $Y$ are independent:

$$P(Z = z) = \sum_k P(X = k) \, P(Y = z - k)$$

This is the discrete convolution of the PMFs.

Continuous case (PDF of $Z$):

$$f_Z(z) = \int_{-\infty}^{\infty} f_{X,Y}(x, z - x) \, dx$$

If $X$ and $Y$ are independent:

$$f_Z(z) = \int_{-\infty}^{\infty} f_X(x) f_Y(z - x) \, dx = (f_X * f_Y)(z)$$

This is the convolution of the PDFs.
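The discrete convolution formula can be applied directly. A minimal sketch for the sum of two independent fair dice (an assumed example):

```python
# Assumed example: discrete convolution of two fair-die PMFs gives the PMF of
# the sum Z = X + Y via P(Z = z) = Σ_k P(X = k) P(Y = z - k).
die = {k: 1 / 6 for k in range(1, 7)}

pmf_sum: dict[int, float] = {}
for x, px in die.items():
    for y, py in die.items():
        pmf_sum[x + y] = pmf_sum.get(x + y, 0.0) + px * py

p_seven = pmf_sum[7]           # 6/36 = 1/6, the most likely total
total = sum(pmf_sum.values())  # the result is itself a PMF, so this is ≈ 1
```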
Transformations of Two Random Variables

Suppose $Y_1 = g_1(X_1, X_2)$ and $Y_2 = g_2(X_1, X_2)$ are transformations of random variables $X_1, X_2$, and these transformations are invertible, so that $X_1 = h_1(Y_1, Y_2)$ and $X_2 = h_2(Y_1, Y_2)$.

Joint PDF of $Y_1, Y_2$:

$$f_{Y_1,Y_2}(y_1, y_2) = f_{X_1,X_2}(h_1(y_1, y_2), h_2(y_1, y_2)) \, |J|$$

where $|J|$ is the absolute value of the determinant of the Jacobian matrix.

Jacobian determinant $J$:

$$J = \det \begin{pmatrix} \frac{\partial x_1}{\partial y_1} & \frac{\partial x_1}{\partial y_2} \\ \frac{\partial x_2}{\partial y_1} & \frac{\partial x_2}{\partial y_2} \end{pmatrix}$$

Order Statistics

Let $X_1, X_2, \dots, X_n$ be $n$ independent and identically distributed (i.i.d.) random variables with CDF $F_X(x)$ and PDF $f_X(x)$. Let $X_{(1)}, X_{(2)}, \dots, X_{(n)}$ be the order statistics (sorted values).
CDF of the maximum ($Y_n = X_{(n)}$):

$$F_{Y_n}(y) = P(X_{(n)} \le y) = [F_X(y)]^n$$

PDF of the maximum ($Y_n = X_{(n)}$):

$$f_{Y_n}(y) = n [F_X(y)]^{n-1} f_X(y)$$

CDF of the minimum ($Y_1 = X_{(1)}$):

$$F_{Y_1}(y) = P(X_{(1)} \le y) = 1 - [1 - F_X(y)]^n$$

PDF of the minimum ($Y_1 = X_{(1)}$):

$$f_{Y_1}(y) = n [1 - F_X(y)]^{n-1} f_X(y)$$

PDF of the $k$-th order statistic ($Y_k = X_{(k)}$):

$$f_{Y_k}(y) = \frac{n!}{(k-1)!(n-k)!} [F_X(y)]^{k-1} [1 - F_X(y)]^{n-k} f_X(y)$$

Chapter 13: The Law of Large Numbers (LLN)

Chebyshev's Inequality

For a random variable $X$ with mean $\mu$ and finite variance $\sigma^2$, and any $\epsilon > 0$:

$$P(|X - \mu| \ge \epsilon) \le \frac{\sigma^2}{\epsilon^2}$$
Weak Law of Large Numbers (WLLN)

Let $X_1, X_2, \dots, X_n$ be a sequence of i.i.d. random variables with common mean $E[X_i] = \mu$ and common finite variance $\mathrm{Var}(X_i) = \sigma^2$, and let $\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i$ be the sample mean. Then, for any $\epsilon > 0$:

$$\lim_{n \to \infty} P(|\bar{X}_n - \mu| \ge \epsilon) = 0$$

Strong Law of Large Numbers (SLLN)

For a sequence of i.i.d. random variables $X_1, X_2, \dots, X_n$ with common mean $E[X_i] = \mu$, the sample mean converges to $\mu$ almost surely:

$$P\left(\lim_{n \to \infty} \bar{X}_n = \mu\right) = 1$$
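The LLN is easy to see by simulation: the sample mean of fair-coin flips settles near $\mu = 0.5$ as $n$ grows. A sketch (seeded for repeatability; an assumed example):

```python
# Assumed example: sample means of Bernoulli(0.5) trials for increasing n,
# illustrating convergence toward μ = 0.5.
import random

random.seed(42)

def sample_mean(n: int) -> float:
    # Each trial is a fair-coin flip: random.random() < 0.5 is True/1 w.p. 1/2.
    return sum(random.random() < 0.5 for _ in range(n)) / n

means = {n: sample_mean(n) for n in (100, 10_000, 100_000)}
```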
Chapter 14: The Central Limit Theorem (CLT) ¶ Statement of CLT (Lindeberg-Lévy CLT):
Let X 1 , X 2 , … , X n X_1, X_2, \dots, X_n X 1 , X 2 , … , X n be i.i.d. random variables with mean μ \mu μ and variance σ 2 \sigma^2 σ 2 . Let X ˉ n \bar{X}_n X ˉ n be the sample mean.
Z n = X ˉ n − μ σ / n → d N ( 0 , 1 ) Z_n = \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \xrightarrow{d} N(0, 1) Z n = σ / n X ˉ n − μ d N ( 0 , 1 ) (where → d \xrightarrow{d} d denotes convergence in distribution)
Convergence in Distribution (for Z n Z_n Z n ):
lim n → ∞ P ( Z n ≤ z ) = Φ ( z ) \lim_{n \to \infty} P(Z_n \le z) = \Phi(z) n → ∞ lim P ( Z n ≤ z ) = Φ ( z ) where Φ ( z ) \Phi(z) Φ ( z ) is the CDF of the standard Normal distribution N ( 0 , 1 ) N(0, 1) N ( 0 , 1 ) .
Approximation for Sample Mean $\bar{X}_n$:
$P(\bar{X}_n \le x) \approx \Phi\left(\frac{x - \mu}{\sigma/\sqrt{n}}\right)$
CLT for Sums ($S_n = \sum_{i=1}^{n} X_i$):
$E[S_n] = n\mu$, $\mathrm{Var}(S_n) = n\sigma^2$.
$\frac{S_n - n\mu}{\sqrt{n}\,\sigma} \xrightarrow{d} N(0, 1)$
Normal Approximation to Binomial Distribution:
For $X \sim \text{Binomial}(n, p)$:
Mean: $E[X] = np$
Variance: $\mathrm{Var}(X) = np(1-p)$
Approximation: $X \approx N(np, np(1-p))$ (common rule of thumb: $np \ge 5$ and $n(1-p) \ge 5$).
Continuity Correction (for approximating a discrete random variable $X$ by its continuous Normal approximation $Y$):
To approximate $P(X \le k)$, use $P(Y \le k + 0.5)$.
To approximate $P(X \ge k)$, use $P(Y \ge k - 0.5)$.
To approximate $P(X = k)$, use $P(k - 0.5 \le Y \le k + 0.5)$.
To approximate $P(a \le X \le b)$, use $P(a - 0.5 \le Y \le b + 0.5)$.
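The continuity correction can be compared against the exact Binomial CDF; a minimal Python sketch (the parameters $n = 40$, $p = 0.5$ are illustrative):

```python
import math

# Normal approximation to Binomial(n=40, p=0.5) with continuity correction:
# P(X <= k) ~ Phi((k + 0.5 - np) / sqrt(np(1-p))).
n, p = 40, 0.5
mu, sd = n * p, math.sqrt(n * p * (1 - p))

def phi(z):  # standard Normal CDF
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def binom_cdf(k):  # exact P(X <= k)
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

for k in (15, 20, 25):
    exact = binom_cdf(k)
    approx = phi((k + 0.5 - mu) / sd)
    print(f"k={k}: exact {exact:.4f}, approx {approx:.4f}")
    assert abs(exact - approx) < 0.01
```

Dropping the $+0.5$ term noticeably worsens the approximation, which is the point of the correction.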
Chapter 15: Introduction to Bayesian Inference ¶ Bayes’ Theorem for Distributions:
$p(\theta \mid D) = \frac{p(D \mid \theta)\, p(\theta)}{p(D)}$ Where:
$p(\theta \mid D)$ is the posterior probability of parameter $\theta$ given data $D$.
$p(D \mid \theta)$ is the likelihood of data $D$ given parameter $\theta$.
$p(\theta)$ is the prior probability of parameter $\theta$.
$p(D)$ is the evidence (or marginal likelihood of the data).
Proportionality Form:
$\text{Posterior} \propto \text{Likelihood} \times \text{Prior}$ Evidence Calculation:
For continuous $\theta$: $p(D) = \int p(D \mid \theta)\, p(\theta)\, d\theta$
For discrete $\theta$: $p(D) = \sum_{\theta} p(D \mid \theta)\, p(\theta)$
Beta-Binomial Conjugate Prior Update:
If the prior is $\text{Beta}(\alpha_{prior}, \beta_{prior})$ and the data is $k$ successes in $n$ trials (Binomial likelihood):
The posterior is $\text{Beta}(\alpha_{posterior}, \beta_{posterior}) = \text{Beta}(\alpha_{prior} + k, \beta_{prior} + n - k)$.
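The update rule is a one-line computation; a minimal Python sketch (the prior and data values are illustrative, and `update_beta` is a hypothetical helper name):

```python
# Beta-Binomial conjugate update: a Beta(a, b) prior plus k successes in
# n trials yields a Beta(a + k, b + n - k) posterior.
def update_beta(a_prior, b_prior, k, n):
    return a_prior + k, b_prior + (n - k)

# Example: uniform prior Beta(1, 1), then observe 7 successes in 10 trials.
a_post, b_post = update_beta(1, 1, 7, 10)
print(a_post, b_post)              # Beta(8, 4) posterior
print(a_post / (a_post + b_post))  # posterior mean = 8/12
```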
Point Estimates from Posterior:
Maximum a Posteriori (MAP) Estimate:
$\hat{\theta}_{MAP} = \arg\max_{\theta}\, p(\theta \mid D)$
For a Beta$(\alpha, \beta)$ posterior (if $\alpha > 1$, $\beta > 1$):
$\hat{\theta}_{MAP} = \frac{\alpha - 1}{\alpha + \beta - 2}$
Posterior Mean:
$\hat{\theta}_{Mean} = E[\theta \mid D] = \int \theta\, p(\theta \mid D)\, d\theta$
For a Beta$(\alpha, \beta)$ posterior:
$\hat{\theta}_{Mean} = \frac{\alpha}{\alpha + \beta}$
Credible Interval:
An interval $[L, U]$ such that:
$P(L \le \theta \le U \mid D) = \int_L^U p(\theta \mid D)\, d\theta = 1 - \gamma$ (where $1 - \gamma$ is the credibility level, e.g., 95%)
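A credible interval can be approximated by sampling from the posterior and taking sample quantiles; a minimal Python sketch (the Beta(8, 4) posterior and the 95% level, i.e. $\gamma = 0.05$, are illustrative choices):

```python
import random

# Monte Carlo credible interval: draw from a Beta(8, 4) posterior
# (e.g. a Beta(1, 1) prior updated with 7 successes in 10 trials) and
# take the 2.5% and 97.5% sample quantiles for a 95% interval.
random.seed(7)
N = 200_000
draws = sorted(random.betavariate(8, 4) for _ in range(N))
L, U = draws[int(0.025 * N)], draws[int(0.975 * N)]
print(f"95% credible interval for theta: [{L:.3f}, {U:.3f}]")
assert L < 8 / 12 < U  # the posterior mean lies inside the interval
```

With a library that exposes the Beta quantile function, the interval could instead be computed exactly from the inverse CDF.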
Chapter 16: Introduction to Markov Chains ¶ Transition Probability (from state $i$ to state $j$):
$P_{ij} = P(X_{t+1} = s_j \mid X_t = s_i)$
n-Step Transition Probability:
The $(i, j)$-th entry of the matrix $P^n$ (the transition matrix $P$ raised to the power $n$):
$P^{(n)}_{ij} = P(X_{t+n} = s_j \mid X_t = s_i) = (P^n)_{ij}$
Stationary Distribution ($\pi$):
A row vector $\pi = [\pi_1, \pi_2, \dots, \pi_k]$ such that:
$\pi P = \pi$
and
$\sum_{j=1}^{k} \pi_j = 1$
Chapter 17: Monte Carlo Methods ¶ Estimating Probability $P(A)$:
$P(A) \approx \frac{N_A}{N}$ (where $N_A$ is the number of times event $A$ occurred in $N$ simulations)
Estimating Expected Value $E[g(X)]$:
$E[g(X)] \approx \frac{1}{N} \sum_{i=1}^{N} g(X_i)$ (where $X_i$ are samples from the distribution of $X$)
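Both estimators are straightforward to implement; a minimal Python sketch (the event, the function $g$, and the sample size are illustrative choices):

```python
import random

# Monte Carlo estimates: P(A) ~ N_A / N and E[g(X)] ~ average of g(X_i).
# Example: X ~ Uniform(0, 1), A = {X > 0.8} (so P(A) = 0.2), and
# g(x) = x^2 (so E[g(X)] = 1/3).
random.seed(3)
N = 100_000
xs = [random.random() for _ in range(N)]

p_hat = sum(x > 0.8 for x in xs) / N
e_hat = sum(x**2 for x in xs) / N
print(p_hat, e_hat)
assert abs(p_hat - 0.2) < 0.01
assert abs(e_hat - 1 / 3) < 0.01
```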
Monte Carlo Integration (Hit-or-Miss for Area):
$\text{Area}(A) \approx \text{Area}(B) \times \frac{N_{hit}}{N}$
Monte Carlo Integration (Using Expected Values for $I = \int_a^b g(x)\, dx$):
If $X \sim \text{Uniform}(a, b)$:
$I \approx (b - a) \times \frac{1}{N} \sum_{i=1}^{N} g(X_i)$
Inverse Transform Method for Generating Random Variables:
If $U \sim \text{Uniform}(0, 1)$, then $X = F^{-1}(U)$ has CDF $F(x)$.
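As a concrete example, the Exponential distribution has a closed-form inverse CDF; a minimal Python sketch (the rate $\lambda = 2$ and sample size are illustrative choices):

```python
import random
import math

# Inverse transform method: for Exponential(lam), F(x) = 1 - exp(-lam*x),
# so F^{-1}(u) = -ln(1 - u) / lam. Feeding U ~ Uniform(0, 1) through
# F^{-1} yields Exponential(lam) samples.
random.seed(5)
lam = 2.0
N = 100_000
samples = [-math.log(1 - random.random()) / lam for _ in range(N)]

mean = sum(samples) / N
print(mean)                 # should be close to 1/lam = 0.5
assert abs(mean - 1 / lam) < 0.01
```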
Acceptance-Rejection Method:
To sample from a target PDF $f(x)$ using a proposal PDF $g(x)$ where $f(x) \le c \cdot g(x)$:
Sample $y$ from $g(x)$.
Sample $u$ from $\text{Uniform}(0, 1)$.
Accept $y$ if $u \le \frac{f(y)}{c \cdot g(y)}$.
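The three steps above can be sketched in Python; as an illustrative example, take the target $f(x) = 2x$ on $[0, 1]$ with a Uniform(0, 1) proposal and $c = 2$:

```python
import random

# Acceptance-rejection: sample from target f(x) = 2x on [0, 1] using a
# Uniform(0, 1) proposal g(x) = 1 with envelope constant c = 2, so that
# f(x) <= c * g(x). The acceptance test u <= f(y)/(c*g(y)) reduces to u <= y.
random.seed(11)

def sample_f():
    while True:
        y = random.random()   # step 1: draw y from g
        u = random.random()   # step 2: draw u from Uniform(0, 1)
        if u <= y:            # step 3: accept if u <= f(y)/(c*g(y))
            return y

N = 100_000
mean = sum(sample_f() for _ in range(N)) / N
print(mean)                   # E[X] for f(x) = 2x is 2/3
assert abs(mean - 2 / 3) < 0.01
```

On average one draw in $c = 2$ is accepted, so tighter envelopes make the method more efficient.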
Buffon’s Needle Problem ($L \le D$):
Probability of needle crossing a line:
$P(\text{cross}) = \frac{2L}{\pi D}$
For $L = 1$, $D = 2$:
$\pi \approx \frac{1}{P(\text{cross})}$
Chapter 18: (Optional) Further Explorations ¶ Entropy $H(X)$ (for a discrete random variable $X$ with PMF $p(x)$):
$H(X) = -\sum_{x} p(x) \log_b p(x)$ (base $b$ is often 2 for bits, or $e$ for nats)
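Entropy is a direct translation of the formula; a minimal Python sketch (the `entropy` helper name and the example PMFs are illustrative):

```python
import math

# Entropy in base b: H(X) = -sum p(x) log_b p(x), skipping zero-probability
# outcomes (their contribution is 0 by convention).
def entropy(pmf, base=2):
    return -sum(p * math.log(p, base) for p in pmf if p > 0)

print(entropy([0.5, 0.5]))   # fair coin: 1 bit
print(entropy([0.9, 0.1]))   # biased coin: less than 1 bit
assert abs(entropy([0.5, 0.5]) - 1.0) < 1e-12
assert entropy([0.9, 0.1]) < 1.0
```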
Kullback-Leibler (KL) Divergence (for discrete distributions $P$ and $Q$):
$D_{KL}(P \,\|\, Q) = \sum_{x} P(x) \log_b \frac{P(x)}{Q(x)}$
Geometric Brownian Motion (GBM) $S(t)$:
Stochastic Differential Equation: $dS(t) = \mu S(t)\, dt + \sigma S(t)\, dW(t)$
Solution: $S(t) = S(0) \exp\left(\left(\mu - \frac{\sigma^2}{2}\right)t + \sigma W(t)\right)$
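The closed-form solution makes GBM easy to simulate; a minimal Python sketch (parameter values are illustrative; it uses the facts that $W(T) \sim N(0, T)$ and that $E[S(T)] = S(0)e^{\mu T}$ under GBM):

```python
import random
import math

# Simulate GBM terminal values via the exact solution
# S(T) = S(0) * exp((mu - sigma^2/2) * T + sigma * W(T)), W(T) ~ N(0, T),
# and check the Monte Carlo mean against E[S(T)] = S(0) * exp(mu * T).
random.seed(4)
S0, mu, sigma, T = 100.0, 0.05, 0.2, 1.0
N = 200_000

def terminal():
    w = random.gauss(0, math.sqrt(T))
    return S0 * math.exp((mu - sigma**2 / 2) * T + sigma * w)

est = sum(terminal() for _ in range(N)) / N
print(est, S0 * math.exp(mu * T))
assert abs(est - S0 * math.exp(mu * T)) / (S0 * math.exp(mu * T)) < 0.01
```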
Probability Generating Function (PGF) $G_X(z)$ (for a non-negative integer-valued RV $X$):
$G_X(z) = E[z^X] = \sum_{k=0}^{\infty} P(X = k)\, z^k$
Moment Generating Function (MGF) $M_X(t)$:
$M_X(t) = E[e^{tX}]$ and $E[X^n] = M_X^{(n)}(0)$ (the $n$-th derivative evaluated at $t = 0$)
For independent $X, Y$: $M_{X+Y}(t) = M_X(t)\, M_Y(t)$
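The product property can be checked empirically; a minimal Python sketch (independent fair die rolls and $t = 0.3$ are illustrative choices):

```python
import random
import math

# Empirical check that M_{X+Y}(t) = M_X(t) * M_Y(t) for independent X, Y.
# X and Y are independent fair die rolls; M_{X+Y}(t) is estimated as the
# sample average of exp(t * (X + Y)) and compared to the exact product.
random.seed(9)
t = 0.3
N = 200_000
xs = [random.randint(1, 6) for _ in range(N)]
ys = [random.randint(1, 6) for _ in range(N)]

def mgf_exact(t):  # exact MGF of one fair die: (1/6) * sum_k exp(t*k)
    return sum(math.exp(t * k) for k in range(1, 7)) / 6

m_sum = sum(math.exp(t * (x + y)) for x, y in zip(xs, ys)) / N
print(m_sum, mgf_exact(t) ** 2)
assert abs(m_sum - mgf_exact(t) ** 2) / mgf_exact(t) ** 2 < 0.02
```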