
This appendix provides a summary of the key formulas introduced in the preceding chapters.

Chapter 2: The Language of Probability: Sets, Sample Spaces, and Events

Axioms of Probability

Let $S$ be a sample space, and let $P(A)$ denote the probability of an event $A$.

  1. Non-negativity: For any event $A$, the probability of $A$ is greater than or equal to zero: $P(A) \ge 0$

  2. Normalization: The probability of the entire sample space $S$ is equal to 1: $P(S) = 1$

  3. Additivity for Disjoint Events: If $A_1, A_2, A_3, \dots$ is a sequence of mutually exclusive (disjoint) events (i.e., $A_i \cap A_j = \emptyset$ for all $i \ne j$), then the probability of their union is the sum of their individual probabilities.

    $$P(A_1 \cup A_2 \cup A_3 \cup \cdots) = P(A_1) + P(A_2) + P(A_3) + \cdots$$
    • For a finite number of disjoint events, say $A$ and $B$: if $A \cap B = \emptyset$, then $P(A \cup B) = P(A) + P(B)$

Basic Probability Rules

  1. Probability Range: For any event $A$: $0 \le P(A) \le 1$

  2. Complement Rule: The probability that event $A$ does not occur is 1 minus the probability that it does occur: $P(A^c) = 1 - P(A)$

  3. Addition Rule (General): For any two events $A$ and $B$ (not necessarily disjoint), the probability that $A$ or $B$ (or both) occurs is:

    $$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$

Empirical Probability

The empirical probability of an event $A$ is estimated from simulations:

$$P_{\text{empirical}}(A) = \frac{\text{number of times event } A \text{ occurred}}{\text{total number of trials}}.$$
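This estimate can be checked by simulation. A minimal sketch (the event, a fair die showing 5 or 6 with true probability $1/3$, and the trial count are illustrative choices):

```python
import random

random.seed(0)

# Event A: a fair six-sided die shows 5 or 6 (true probability 1/3).
trials = 100_000
hits = sum(1 for _ in range(trials) if random.randint(1, 6) >= 5)

p_empirical = hits / trials  # should be close to 1/3
```

By the Law of Large Numbers (Chapter 13), the estimate approaches the true probability as the number of trials grows.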

Chapter 3: Counting Techniques: Permutations and Combinations

The Multiplication Principle

If a procedure can be broken down into a sequence of $k$ steps, with $n_1$ ways for the first step, $n_2$ for the second, $\dots$, $n_k$ for the $k$-th step, then the total number of ways to perform the entire procedure is:

$$\text{Total ways} = n_1 \times n_2 \times \cdots \times n_k.$$
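A small illustration: enumerating the outcomes explicitly with `itertools.product` and comparing against the product $n_1 \times n_2 \times n_3$ (the step counts 4, 3, and 2 are arbitrary example values):

```python
from itertools import product

# Three steps with 4, 3, and 2 choices respectively (illustrative counts).
shirts, pants, shoes = range(4), range(3), range(2)

outfits = list(product(shirts, pants, shoes))  # enumerate every combination
total = len(outfits)  # equals 4 * 3 * 2 = 24
```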

Permutations (Order Matters)

  1. Permutations without Repetition: The number of permutations of $n$ distinct objects taken $k$ at a time:

    $$P(n,k) = \frac{n!}{(n-k)!}$$
    • Special Case: Arranging all $n$ distinct objects: $P(n,n) = n!$

  2. Permutations with Repetition (Multinomial Coefficients): The number of distinct permutations of $n$ objects where there are $n_1$ identical objects of type 1, $n_2$ of type 2, $\dots$, $n_k$ of type $k$ (such that $n_1 + n_2 + \cdots + n_k = n$):

    $$\frac{n!}{n_1!\, n_2! \cdots n_k!}$$

Combinations (Order Doesn’t Matter)

  1. Combinations without Repetition: The number of combinations of $n$ distinct objects taken $k$ at a time (also read “$n$ choose $k$”):

    $$C(n,k) = \binom{n}{k} = \frac{n!}{k!\,(n-k)!}$$
    • Relationship to permutations:

      $$C(n,k) = \frac{P(n,k)}{k!}$$
  2. Combinations with Repetition: The number of combinations with repetition of $n$ types of objects taken $k$ at a time:

    $$\binom{n+k-1}{k} = \frac{(n+k-1)!}{k!\,(n-1)!}$$
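Python’s standard library implements these counts directly (`math.perm` and `math.comb`, available since Python 3.8); a quick check with the illustrative values $n = 10$, $k = 3$:

```python
import math

n, k = 10, 3  # illustrative values

perm = math.perm(n, k)   # P(n, k) = n! / (n - k)! = 720
comb = math.comb(n, k)   # C(n, k) = n! / (k! (n - k)!) = 120

# Relationship C(n, k) = P(n, k) / k!
assert comb == perm // math.factorial(k)

# Combinations with repetition: C(n + k - 1, k)
comb_rep = math.comb(n + k - 1, k)  # C(12, 3) = 220
```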

Probability with Equally Likely Outcomes

The probability of an event $E$ when all outcomes in the sample space $S$ are equally likely:

$$P(E) = \frac{\text{number of outcomes favorable to } E}{\text{total number of possible outcomes in } S} = \frac{|E|}{|S|}.$$
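A sketch of this counting approach for an assumed example, the probability that two fair dice sum to 7:

```python
from fractions import Fraction
from itertools import product

# Sample space: all ordered rolls of two fair dice (36 equally likely outcomes).
S = list(product(range(1, 7), repeat=2))
E = [roll for roll in S if sum(roll) == 7]  # favorable outcomes

p = Fraction(len(E), len(S))  # |E| / |S| = 6/36 = 1/6
```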

Chapter 4: Conditional Probability

Definition of Conditional Probability

For any two events $A$ and $B$ from a sample space $S$, where $P(B) > 0$, the conditional probability of $A$ given $B$ is defined as:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}.$$

The Multiplication Rule for Conditional Probability

Rearranging the definition of conditional probability gives:

$$P(A \cap B) = P(A \mid B)\,P(B).$$

Similarly, if $P(A) > 0$:

$$P(A \cap B) = P(B \mid A)\,P(A).$$

For three events $A, B, C$:

$$P(A \cap B \cap C) = P(C \mid A \cap B)\,P(B \mid A)\,P(A).$$

The Law of Total Probability

Let $B_1, B_2, \dots, B_n$ be a partition of the sample space $S$. Then, for any event $A$ in $S$:

$$P(A) = \sum_{i=1}^{n} P(A \mid B_i)\,P(B_i).$$

Expanded form:

$$P(A) = P(A \mid B_1)P(B_1) + P(A \mid B_2)P(B_2) + \cdots + P(A \mid B_n)P(B_n).$$
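A numeric sketch of the expanded form, using an assumed three-part partition (the shares and conditional probabilities are made-up illustrative numbers):

```python
# Assumed partition: three sources B1, B2, B3 with illustrative shares
# P(B_i) (which sum to 1) and conditional probabilities P(A | B_i).
P_B = [0.5, 0.3, 0.2]
P_A_given_B = [0.01, 0.02, 0.05]

# Law of Total Probability: P(A) = sum_i P(A | B_i) P(B_i)
P_A = sum(pa * pb for pa, pb in zip(P_A_given_B, P_B))
# 0.01*0.5 + 0.02*0.3 + 0.05*0.2 = 0.021
```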

Chapter 5: Bayes’ Theorem and Independence

Bayes’ Theorem

Bayes’ Theorem provides a way to “reverse” conditional probabilities. If $P(B) > 0$:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}.$$

Here $P(B)$ can often be calculated using the Law of Total Probability (e.g., with the partition $\{A, A^c\}$):

$$P(B) = P(B \mid A)P(A) + P(B \mid A^c)P(A^c).$$
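A worked sketch with assumed diagnostic-test numbers (the prior, sensitivity, and false-positive rate below are illustrative, not from the text):

```python
# Assumed diagnostic-test numbers (illustrative only):
P_D = 0.01                # prior P(disease)
P_pos_given_D = 0.95      # sensitivity P(positive | disease)
P_pos_given_not_D = 0.05  # false-positive rate P(positive | no disease)

# Denominator via the Law of Total Probability with partition {D, D^c}:
P_pos = P_pos_given_D * P_D + P_pos_given_not_D * (1 - P_D)

# Bayes' Theorem: the posterior is far below the sensitivity.
P_D_given_pos = P_pos_given_D * P_D / P_pos  # about 0.16
```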

Independence of Events

  1. Formal Definition: Events $A$ and $B$ are independent if and only if:

    $$P(A \cap B) = P(A)\,P(B)$$
  2. Alternative Definition (using conditional probability):

    • If $P(B) > 0$, $A$ and $B$ are independent if and only if: $P(A \mid B) = P(A)$

    • Similarly, if $P(A) > 0$, independence means: $P(B \mid A) = P(B)$
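The formal definition can be verified exactly by counting, for an assumed example with two fair dice:

```python
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))  # two fair dice, 36 outcomes

A = {s for s in S if s[0] % 2 == 0}  # first die is even
B = {s for s in S if s[1] >= 5}      # second die shows 5 or 6

def P(event):
    return Fraction(len(event), len(S))

# P(A ∩ B) = P(A) P(B), so A and B are independent.
assert P(A & B) == P(A) * P(B)  # 1/6 == 1/2 * 1/3
```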

Conditional Independence

Notation:

$$A \perp B \mid C \quad \text{means “$A$ and $B$ are conditionally independent given $C$.”}$$

Definition (with $P(C) > 0$):

$$A \perp B \mid C \iff P(A \cap B \mid C) = P(A \mid C)\,P(B \mid C).$$

Equivalent “no extra information” form: if $P(B \cap C) > 0$, then

$$A \perp B \mid C \iff P(A \mid B \cap C) = P(A \mid C).$$

Likewise, if $P(A \cap C) > 0$, then

$$P(B \mid A \cap C) = P(B \mid C).$$

Chapter 6: Discrete Random Variables

Probability Mass Function (PMF)

For a discrete random variable $X$, the PMF $p_X(x)$ is:

$$p_X(x) = P(X = x)$$

Properties of a PMF:

  1. $p_X(x) \ge 0$ for all possible values $x$.

  2. $\sum_x p_X(x) = 1$ (sum over all possible values $x$).

Cumulative Distribution Function (CDF)

For a random variable $X$, the CDF $F_X(x)$ is:

$$F_X(x) = P(X \le x)$$

For a discrete random variable $X$:

$$F_X(x) = \sum_{k \le x} p_X(k)$$

Properties of a CDF:

  1. $0 \le F_X(x) \le 1$ for all $x$.

  2. If $a < b$, then $F_X(a) \le F_X(b)$ (non-decreasing).

  3. $\lim_{x \to -\infty} F_X(x) = 0$
  4. $\lim_{x \to +\infty} F_X(x) = 1$
  5. $P(X > x) = 1 - F_X(x)$

  6. $P(a < X \le b) = F_X(b) - F_X(a)$ for $a < b$.

  7. $P(X = x) = F_X(x) - \lim_{y \to x^-} F_X(y)$ (for a discrete RV, this is the jump at $x$).

Expected Value (Mean)

For a discrete random variable $X$:

$$E[X] = \mu_X = \sum_x x \cdot p_X(x)$$

Variance

For a random variable $X$ with mean $\mu_X$:

$$\operatorname{Var}(X) = \sigma_X^2 = E[(X - \mu_X)^2]$$

For a discrete random variable $X$:

$$\operatorname{Var}(X) = \sum_x (x - \mu_X)^2 \cdot p_X(x)$$

Computational formula for variance:

$$\operatorname{Var}(X) = E[X^2] - (E[X])^2$$

where $E[X^2]$ for a discrete random variable is:

$$E[X^2] = \sum_x x^2 \cdot p_X(x)$$

Standard Deviation

The positive square root of the variance:

$$SD(X) = \sigma_X = \sqrt{\operatorname{Var}(X)}$$

Functions of a Random Variable

If $Y = g(X)$:

  1. PMF of $Y$ (for discrete $X$):

    $$p_Y(y) = P(Y = y) = P(g(X) = y) = \sum_{x:\, g(x) = y} p_X(x)$$
  2. Expected Value of $Y = g(X)$ (LOTUS, the Law of the Unconscious Statistician): For a discrete random variable $X$:

    $$E[Y] = E[g(X)] = \sum_x g(x) \cdot p_X(x)$$
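The formulas for $E[X]$, $E[X^2]$ (via LOTUS with $g(x) = x^2$), and $\operatorname{Var}(X)$ can be checked exactly for a fair six-sided die:

```python
from fractions import Fraction

# PMF of a fair six-sided die.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

mean = sum(x * p for x, p in pmf.items())    # E[X] = 7/2
ex2 = sum(x**2 * p for x, p in pmf.items())  # LOTUS with g(x) = x^2: E[X^2] = 91/6
var = ex2 - mean**2                          # E[X^2] - (E[X])^2 = 35/12
```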

Chapter 7: Common Discrete Distributions

Bernoulli Distribution

Models a single trial with two outcomes (success = 1, failure = 0). Parameter: $p$ (probability of success).

Binomial Distribution

Models the number of successes in $n$ independent Bernoulli trials. Parameters: $n$ (number of trials), $p$ (probability of success on each trial).

Geometric Distribution

Models the number of trials ($k$) needed to get the first success. Parameter: $p$ (probability of success on each trial).

Negative Binomial Distribution

Models the number of trials ($k$) needed to achieve $r$ successes. Parameters: $r$ (target number of successes), $p$ (probability of success on each trial).

Poisson Distribution

Models the number of events occurring in a fixed interval of time or space. Parameter: $\lambda$ (average number of events in the interval).

Hypergeometric Distribution

Models the number of successes in a sample of size $n$ drawn without replacement from a finite population of size $N$ containing $K$ successes. Parameters: $N$ (population size), $K$ (total successes in population), $n$ (sample size).
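As a sketch of how these PMFs are built from the counting formulas, the binomial PMF can be written with `math.comb` and checked against its known properties ($\sum_k p(k) = 1$ and mean $np$; the parameters $n = 10$, $p = 0.3$ are illustrative):

```python
import math

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), built from the counting formula."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.3  # illustrative parameters
probs = [binom_pmf(k, n, p) for k in range(n + 1)]

total = sum(probs)                              # sums to 1
mean = sum(k * q for k, q in enumerate(probs))  # equals n * p = 3
```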

Chapter 8: Continuous Random Variables

Probability Density Function (PDF)

For a continuous random variable $X$, the PDF $f_X(x)$ describes the relative likelihood of $X$. Properties of a PDF:

  1. $f_X(x) \ge 0$ for all $x$.

  2. $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$ (the total area under the curve is 1).

  3. $P(a \le X \le b) = \int_a^b f_X(x)\,dx$.

  4. For any specific value $c$: $P(X = c) = \int_c^c f_X(x)\,dx = 0$.

Cumulative Distribution Function (CDF)

For a continuous random variable $X$, the CDF $F_X(x)$ is:

$$F_X(x) = P(X \le x) = \int_{-\infty}^{x} f_X(t)\,dt$$

Properties of a CDF:

  1. $F_X(x)$ is non-decreasing.

  2. $\lim_{x \to -\infty} F_X(x) = 0$
  3. $\lim_{x \to \infty} F_X(x) = 1$
  4. $P(a < X \le b) = F_X(b) - F_X(a)$.

  5. $f_X(x) = \frac{d}{dx} F_X(x)$ (where the derivative exists).

Expected Value (Mean)

For a continuous random variable $X$:

$$E[X] = \mu = \int_{-\infty}^{\infty} x\, f_X(x)\,dx$$

Variance

For a continuous random variable $X$ with mean $\mu$:

$$\operatorname{Var}(X) = \sigma^2 = E[(X - \mu)^2] = \int_{-\infty}^{\infty} (x - \mu)^2 f_X(x)\,dx$$

Computational formula:

$$\operatorname{Var}(X) = E[X^2] - (E[X])^2$$

where

$$E[X^2] = \int_{-\infty}^{\infty} x^2 f_X(x)\,dx.$$

Standard Deviation

The positive square root of the variance:

$$\sigma = \sqrt{\operatorname{Var}(X)}$$

Percentiles and Quantiles

The $p$-th percentile $x_p$ is the value such that $F_X(x_p) = P(X \le x_p) = p$. The quantile function $Q(p)$ is the inverse of the CDF:

$$Q(p) = F_X^{-1}(p) = x_p.$$
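For distributions with a closed-form CDF, the quantile function can be inverted by hand. A sketch for an assumed Exponential($\lambda$) distribution, whose CDF is $F(x) = 1 - e^{-\lambda x}$ for $x \ge 0$:

```python
import math

lam = 2.0  # assumed rate parameter

def cdf(x):
    # CDF of Exponential(lam): F(x) = 1 - exp(-lam * x) for x >= 0.
    return 1 - math.exp(-lam * x) if x >= 0 else 0.0

def quantile(p):
    # Inverse CDF: solve 1 - exp(-lam * x) = p for x.
    return -math.log(1 - p) / lam

median = quantile(0.5)  # ln(2) / lam
```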

Functions of a Continuous Random Variable

If $Y = g(X)$:

  1. CDF of $Y$:

    $$F_Y(y) = P(Y \le y) = P(g(X) \le y)$$
  2. PDF of $Y$ (Change of Variables Formula): If $g(x)$ is monotonic with inverse $x = g^{-1}(y)$, then:

    $$f_Y(y) = f_X(g^{-1}(y)) \left| \frac{dx}{dy} \right|$$
  3. Expected Value of $Y = g(X)$ (LOTUS):

    $$E[Y] = E[g(X)] = \int_{-\infty}^{\infty} g(x)\, f_X(x)\,dx$$
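The continuous LOTUS integral can be approximated numerically. A sketch for an assumed Exponential(1) distribution and $g(x) = x^2$, where the exact answer is $E[X^2] = 2/\lambda^2 = 2$:

```python
import math

def f(x):
    # PDF of an Exponential(1) random variable on [0, infinity).
    return math.exp(-x)

def g(x):
    return x**2  # transformation of interest

# LOTUS via a midpoint-rule approximation of the integral of g(x) f(x)
# over [0, 40]; the tail beyond 40 is negligible for Exponential(1).
N, hi = 40_000, 40.0
dx = hi / N
ex2 = sum(g((i + 0.5) * dx) * f((i + 0.5) * dx) for i in range(N)) * dx
# Exact value: E[X^2] = 2 for Exponential(1).
```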

Chapter 9: Common Continuous Distributions

1. Uniform Distribution

$X \sim U(a, b)$

2. Exponential Distribution

$T \sim \operatorname{Exp}(\lambda)$

3. Normal (Gaussian) Distribution

$X \sim N(\mu, \sigma^2)$

4. Gamma Distribution

$X \sim \operatorname{Gamma}(k, \lambda)$ (using shape $k$ and rate $\lambda$) or $X \sim \operatorname{Gamma}(k, \theta)$ (using shape $k$ and scale $\theta = 1/\lambda$).
The Gamma function is $\Gamma(k) = \int_0^{\infty} x^{k-1} e^{-x}\,dx$. For positive integers $k$, $\Gamma(k) = (k-1)!$.

5. Beta Distribution

$X \sim \operatorname{Beta}(\alpha, \beta)$
The Beta function is $B(\alpha, \beta) = \int_0^1 t^{\alpha-1}(1-t)^{\beta-1}\,dt = \dfrac{\Gamma(\alpha)\,\Gamma(\beta)}{\Gamma(\alpha+\beta)}$.
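Python’s `math.gamma` makes the Gamma–Beta identity easy to spot-check (the arguments $\alpha = 2$, $\beta = 3$ are illustrative; the known value is $B(2,3) = 1/12$):

```python
import math

def beta(a, b):
    # B(alpha, beta) = Gamma(alpha) Gamma(beta) / Gamma(alpha + beta)
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

val = beta(2, 3)  # Gamma(2) Gamma(3) / Gamma(5) = 1 * 2 / 24 = 1/12
```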

Chapter 10: Joint Distributions

Joint Probability Mass Functions (PMFs)

For two discrete random variables $X$ and $Y$, the joint PMF is

$$p_{X,Y}(x, y) = P(X = x,\, Y = y).$$

Joint Probability Density Functions (PDFs)

For two continuous random variables $X$ and $Y$, the joint PDF $f_{X,Y}(x, y)$ satisfies

$$P((X, Y) \in A) = \iint_A f_{X,Y}(x, y)\,dx\,dy.$$

Marginal Distributions

Discrete: $p_X(x) = \sum_y p_{X,Y}(x, y)$. Continuous: $f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dy$ (and symmetrically for $Y$).

Conditional Distributions

For $p_Y(y) > 0$: $p_{X \mid Y}(x \mid y) = \dfrac{p_{X,Y}(x, y)}{p_Y(y)}$; analogously for densities, $f_{X \mid Y}(x \mid y) = \dfrac{f_{X,Y}(x, y)}{f_Y(y)}$ when $f_Y(y) > 0$.

Joint Cumulative Distribution Functions (CDFs)

$$F_{X,Y}(x, y) = P(X \le x,\, Y \le y)$$

Chapter 11: Independence, Covariance, and Correlation

Independence of Random Variables

Two random variables $X$ and $Y$ are independent if for any sets $A$ and $B$:

$$P(X \in A,\, Y \in B) = P(X \in A)\,P(Y \in B)$$

This is equivalent to the joint distribution factoring into the marginals: $p_{X,Y}(x, y) = p_X(x)\,p_Y(y)$ in the discrete case, and $f_{X,Y}(x, y) = f_X(x)\,f_Y(y)$ in the continuous case.

Covariance

The covariance between two random variables $X$ and $Y$:

$$\operatorname{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)] = E[XY] - E[X]\,E[Y]$$

Correlation Coefficient

The Pearson correlation coefficient between two random variables $X$ and $Y$:

$$\rho_{X,Y} = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \sigma_Y}, \qquad -1 \le \rho_{X,Y} \le 1$$

Variance of Sums of Random Variables

For any two random variables $X$ and $Y$, and constants $a$ and $b$:

$$\operatorname{Var}(aX + bY) = a^2 \operatorname{Var}(X) + b^2 \operatorname{Var}(Y) + 2ab\,\operatorname{Cov}(X, Y)$$

If $X$ and $Y$ are independent, $\operatorname{Cov}(X, Y) = 0$ and the cross term vanishes.

Chapter 12: Functions of Multiple Random Variables

Sums of Independent Random Variables (Convolution)

Let $X$ and $Y$ be two independent random variables, and $Z = X + Y$. The distribution of $Z$ is the convolution of the distributions of $X$ and $Y$: in the discrete case $p_Z(z) = \sum_x p_X(x)\,p_Y(z - x)$, and in the continuous case $f_Z(z) = \int_{-\infty}^{\infty} f_X(x)\,f_Y(z - x)\,dx$.

General Transformations (Jacobian Method for PDFs)

If $Y_1 = g_1(X_1, X_2)$ and $Y_2 = g_2(X_1, X_2)$ are transformations of random variables $X_1, X_2$, and these transformations are invertible such that $X_1 = h_1(Y_1, Y_2)$ and $X_2 = h_2(Y_1, Y_2)$, then

$$f_{Y_1,Y_2}(y_1, y_2) = f_{X_1,X_2}\big(h_1(y_1, y_2),\, h_2(y_1, y_2)\big)\,\lvert J \rvert,$$

where $J = \det\begin{pmatrix} \partial h_1/\partial y_1 & \partial h_1/\partial y_2 \\ \partial h_2/\partial y_1 & \partial h_2/\partial y_2 \end{pmatrix}$ is the Jacobian determinant of the inverse transformation.

Order Statistics

Let $X_1, X_2, \dots, X_n$ be $n$ independent and identically distributed (i.i.d.) random variables with CDF $F_X(x)$ and PDF $f_X(x)$, and let $X_{(1)}, X_{(2)}, \dots, X_{(n)}$ be the order statistics (the sorted values). In particular, for the minimum and maximum:

$$F_{X_{(n)}}(x) = [F_X(x)]^n, \qquad F_{X_{(1)}}(x) = 1 - [1 - F_X(x)]^n.$$
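The CDF of the maximum of i.i.d. samples, $F_{X_{(n)}}(x) = [F_X(x)]^n$, can be checked by simulation; a sketch for assumed Uniform(0,1) samples with $n = 5$:

```python
import random

random.seed(1)

n, reps, x = 5, 100_000, 0.7  # illustrative sample size and test point

# Empirical P(max of n i.i.d. Uniform(0,1) draws <= x)
hits = sum(1 for _ in range(reps)
           if max(random.random() for _ in range(n)) <= x)
p_emp = hits / reps

p_theory = x ** n  # [F_X(x)]^n = 0.7^5 = 0.16807
```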

Chapter 13: The Law of Large Numbers (LLN)

Chebyshev’s Inequality

For a random variable $X$ with mean $\mu$ and finite variance $\sigma^2$, for any $k > 0$:

$$P(|X - \mu| \ge k\sigma) \le \frac{1}{k^2}.$$

Weak Law of Large Numbers (WLLN)

For a sequence of i.i.d. random variables $X_1, X_2, \dots, X_n$ with common mean $E[X_i] = \mu$ and common finite variance $\operatorname{Var}(X_i) = \sigma^2$, let $\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i$ be the sample mean. Then $\bar{X}_n$ converges to $\mu$ in probability: for every $\varepsilon > 0$,

$$\lim_{n \to \infty} P\big(|\bar{X}_n - \mu| \ge \varepsilon\big) = 0.$$
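A simulation sketch of this convergence, using assumed i.i.d. Uniform(0,1) draws (true mean $\mu = 0.5$): the absolute error of the sample mean shrinks as $n$ grows.

```python
import random

random.seed(2)

def sample_mean(n):
    # Mean of n i.i.d. Uniform(0,1) draws; the true mean is 0.5.
    return sum(random.random() for _ in range(n)) / n

# Absolute error of the sample mean for increasing n.
errors = {n: abs(sample_mean(n) - 0.5) for n in (10, 1_000, 100_000)}
```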

Strong Law of Large Numbers (SLLN)

For a sequence of i.i.d. random variables $X_1, X_2, \dots, X_n$ with common mean $E[X_i] = \mu$, the sample mean converges to $\mu$ almost surely:

$$P\Big(\lim_{n \to \infty} \bar{X}_n = \mu\Big) = 1.$$

Chapter 14: The Central Limit Theorem (CLT)

Chapter 15: Introduction to Bayesian Inference

Chapter 16: Introduction to Markov Chains

Chapter 17: Monte Carlo Methods

Chapter 18: (Optional) Further Explorations