Set \( \Omega \) of all outcomes that can occur
'Measurable' subset of the sample space \(E \subseteq \Omega\)
\(E \text{ is an event } \iff E \subseteq \Omega\)
\(E \text{ occurs } \iff \exists e \in E ( e \text{ occurs} )\)
Quantity stating how likely an event is. A probability function \(\text{Pr}\) maps events to probabilities, typically obeying the following properties
Ratio representing how often an event occurs relative to the number of repetitions of the experiment, called the relative frequency
\( \text{Pr}(E) = \mathrm{plim}_{n \to \infty} \frac{\sum^{n}_{k=1} E_{k} }{n}\)
Assuming all outcomes have equal probability, we have the following for the discrete case
\( \text{Pr}(E) = \frac{|E|}{|\Omega|}\)
Probability of an event given that some other event is known to occur.
\( \text{Pr}(E_1|E_2) = \frac{\text{Pr}(E_1 \cap E_2)}{\text{Pr}(E_2)} \)
Mathematically, conditional probability means restricting the sample space to the known event.
\( \text{Pr}(E_1|E_2) = \frac{\text{Pr}(E_2|E_1) \text{Pr}(E_1)}{\text{Pr}(E_2)} \)
For situations such as circular arrangements without repetition, and situations where the first event doesn't matter (like 'what is the probability of rolling the same number on a six-sided die three times in a row?'; it doesn't matter what we get on the first throw, but the second and third throws must match it), the degree of freedom is \(n-1\). Circular arrangements therefore use the formula \((n-1)!\), and the dice question yields \(\frac{1}{6^2}\). In effect we arbitrarily fix the placement of one restricted element.
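A small added illustration of the circular case: seating four people around a round table, one seat can be fixed arbitrarily and the remaining three permuted, giving
\( (4-1)! = 3! = 6 \text{ distinct arrangements} \)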
Relationship between a pair of events such that one event occurring has no effect on whether the other event occurs
\( A,B \text{ are independent } \iff \text{Pr}(A\cap B)=\text{Pr}(A) \times \text{Pr}(B) \iff \text{Pr}(A|B)=\text{Pr}(A)\)
Process that results in some unpredictable outcome
Function that maps each outcome in the sample space to a result in the state space, the state space being a set of numerical representations of the outcomes.
Essentially, RVs map outcomes to numerical representations called results that can be analyzed mathematically.
\(X : \Omega \to R\)
\(X(\omega)\)
Probability functions should technically map events to probabilities; however, when dealing with random variables, a common abuse of notation uses a predicate on the random variable to stand for the event of all outcomes that satisfy it
\( \text{Pr}[P(X )] = \text{Pr}( \{ \omega \in \Omega : P[X(\omega)] \} ) \)
Function \(f\) that maps outcomes of a discrete RV \(X\) to its probability
\(f_{X}(n) = \text{Pr}(X=n)\)
Function that returns the probability that a discrete random variable takes a value less than or equal to some \(n\)
\( F_{X}(n) = \text{Pr}(X \leq n) = \sum_{k = -\infty}^{n} f_{X}(k) \)
Function \(f\) that maps outcomes of a continuous random variable \(X\) to a 'probability density'
Note that this 'density' is NOT a probability (for a continuous random variable, \(\text{Pr}(X=x)=0\) for any single outcome \(x\)). Probabilities for sets of outcomes must be handled by the CDF (see below), i.e. by integrating the density.
\(f_{X}(x)\)
Function that returns the probability that a continuous random variable takes a value less than or equal to \(x\)
\( \displaystyle F_{X}(x) = \text{Pr}(X \leq x) = \int_{-\infty}^{x} f_{X}(t) dt \)
Quantity representing the expected outcome for a random variable as a weighted sum, where outcomes are weighted by their probabilities.
\( \displaystyle \text{E}(X) = \int_{\Omega} X(\omega) d\text{Pr}(\omega)\)
\( \displaystyle \text{E}(X) = \sum_{n \in \text{Im}(X)} n f_{X}(n)\)
\( \displaystyle \text{E}(X) = \int_{\mathbb{R}} xf_{X}(x)\, dx\)
RV of the expectation of \(X\) given the result of \(Y\)
\( \text{E}(X|Y) : \Omega \to R\)
\( \text{E}(X|Y=y) \)
Expected squared distance from the mean for a random variable
\( \text{Var}(X) = \text{E}([X- \text{E}(X)]^2) = \text{E}(X^2) - \text{E}(X)^2 \)
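A one-line added expansion shows why the two forms agree:
\( \text{E}([X-\text{E}(X)]^{2}) = \text{E}(X^{2} - 2X\text{E}(X) + \text{E}(X)^{2}) = \text{E}(X^{2}) - 2\text{E}(X)^{2} + \text{E}(X)^{2} = \text{E}(X^{2}) - \text{E}(X)^{2} \)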
Class of random variable defined by a certain structure of PMF/PDF
Discrete distribution of an event with binary outcomes; success or failure
\(X \sim \text{Bern} (p)\)
\( f_{X}(n) = \begin{cases} p & n=1 \\ 1-p & n=0 \\ 0 \end{cases} \)
Discrete distribution of the number of successes in a series of \(n\) independent Bernoulli trials (equivalently, the sum of \(n\) independent Bernoulli random variables with the same \(p\))
\(X \sim \text{Bin}(n,p)\)
\(\displaystyle f_{X}(k)= \begin{cases} \binom{n}{k} p^{k} (1-p)^{n-k} & k \in [0,n] \cap \mathbb{N} \\ 0 \end{cases}\)
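A quick added numeric check: the probability of exactly two heads in three fair coin tosses is
\( \binom{3}{2} \left(\frac{1}{2}\right)^{2} \left(\frac{1}{2}\right)^{1} = \frac{3}{8} \)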
Discrete distribution of the number of independent Bernoulli trials needed for the first success
\(X \sim \text{Geo}(p)\)
\( f_{X}(n) = \begin{cases} (1-p)^{n-1}p & n \in \mathbb{N} \\ 0 \end{cases} \)
Continuous distribution where equally sized sub-ranges of the support all have equal probability
\(X \sim \text{U}(a,b)\)
\(f_{X}(x) = \begin{cases} \frac{1}{b-a} & x \in [a,b] \\ 0 & x \notin [a,b] \end{cases}\)
Law describing how to calculate the probability of \(A\) through conditional probabilities, given that the events \(E_n\) partition the sample space (their disjoint union equals \(\Omega\))
\( \text{Pr}(A) = \sum_{n} \text{Pr}(A|E_n) \text{Pr}(E_n) \)
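The identity follows in one step from the partition assumption (a brief added derivation):
\( \text{Pr}(A) = \text{Pr}\left(A \cap \bigcup_{n} E_n\right) = \sum_{n} \text{Pr}(A \cap E_n) = \sum_{n} \text{Pr}(A|E_n)\text{Pr}(E_n) \)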
Law asserting that the expected value of \(X\) is the expected value of the conditional expected value of \(X|Y=y\)
\( \text{E}(X) = \text{E}(\text{E}(X|Y)) \)
\( \text{E}(\text{E}(X|Y)) = \sum_{n \in \text{Im}(Y) } \text{E}(X|Y=n) \text{Pr}(Y=n) \)
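Expanding the sum (an added derivation for the discrete case) shows why the law holds:
\( \sum_{n} \text{E}(X|Y=n)\text{Pr}(Y=n) = \sum_{n}\sum_{k} k\,\text{Pr}(X=k|Y=n)\text{Pr}(Y=n) = \sum_{k} k\,\text{Pr}(X=k) = \text{E}(X) \)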
Law describing how to calculate the variance of \(X\) through the conditional variance and conditional expectation of \(X\) given \(Y\)
\( \text{Var}(X) = \text{E}(\text{Var}(X|Y)) + \text{Var}(\text{E}(X|Y)) \)
A function \(u\) of a random variable \(X\) is itself a random variable \(Y=u(X)\), and its PMF/PDF can be found most directly when:
\(u: X \to Y \text{ is injective}\)
When \(X\) is discrete, form the PMF of \(Y=u(X)\) by assigning each original probability to the corresponding transformed value; when \(u\) is not injective, sum the probabilities of all values of \(X\) that map to the same result (second formula below)
\(u: X \to Y \text{ is injective} \implies f_{Y}(u(k)) = f_{X}(k) \)
\(f_{Y}(y) = \sum_{i \,:\, u(i)=y} f_{X}(i) \)
When \(X\) is continuous the concept remains the same; however, the PDF is obtained via integration by substitution (change of variables)
\(u: X \to Y \text{ is injective and increasing} \implies \text{Pr}(Y \in [u(a),u(b)]) = \text{Pr}(X \in [a,b]) = \int^{b}_{a} f_{X}(x)\,dx = \int^{u(b)}_{u(a)} f_{X}(u^{-1}(y))\,(u^{-1})'(y)\, dy \implies f_{Y}(y) = f_{X}(u^{-1}(y))\,(u^{-1})'(y)\)
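A short added example under these assumptions: take \(X \sim \text{U}(0,1)\) and \(u(x)=x^{2}\), which is injective and increasing on \([0,1]\); then
\( f_{Y}(y) = f_{X}(\sqrt{y})\,\frac{d}{dy}\sqrt{y} = \frac{1}{2\sqrt{y}} \text{ for } y \in (0,1] \)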
Discrete distribution counting the number of events per interval of time or space, where there is a known average rate of events and events occur independently of one another
\(X \sim \text{Pois}(\lambda)\)
\( f_{X}(n) = \begin{cases} \frac{e^{-\lambda}\lambda^{n}}{n!} & n \in \mathbb{N} \\ 0 \end{cases} \)
The Poisson distribution arises by taking the infinite limit of a binomial distribution, \(\lim_{N \to \infty} \text{Bin}(N,\frac{\lambda}{N})\): we split the interval of time or space into \(N\) sub-intervals, each with success probability \(\frac{\lambda}{N}\), and let \(N\) approach infinity to capture the continuous nature of time and space while still counting discrete events
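Spelling the limit out (an added derivation): since \(\frac{N!}{(N-k)!\,N^{k}} \to 1\) and \((1-\frac{\lambda}{N})^{N} \to e^{-\lambda}\),
\( \lim_{N\to\infty} \binom{N}{k}\left(\frac{\lambda}{N}\right)^{k}\left(1-\frac{\lambda}{N}\right)^{N-k} = \frac{\lambda^{k}}{k!}\lim_{N\to\infty}\frac{N!}{(N-k)!\,N^{k}}\left(1-\frac{\lambda}{N}\right)^{N}\left(1-\frac{\lambda}{N}\right)^{-k} = \frac{e^{-\lambda}\lambda^{k}}{k!} \)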
\( X \sim \text{Pois}(\lambda) \land Y \sim \text{Pois}(\mu) \land X,Y \text{ independent} \implies X+Y \sim \text{Pois}(\lambda + \mu)\)
To get a PMF for \(X+Y\), interpret the probability of \(k\) as the sum over all the ways the two variables can jointly produce \(k\): \(\text{Pr}(X+Y = k) = \sum_{j=0}^{k} \text{Pr}(X=j)\text{Pr}(Y=k-j) \), using independence. Substituting the Poisson PMFs of \(X\) and \(Y\) into this sum shows that the result is again of Poisson form, hence \( X+Y \sim \text{Pois}(\lambda + \mu) \)
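Carrying out the substitution (added for completeness), the binomial theorem collapses the sum:
\( \sum_{j=0}^{k} \frac{e^{-\lambda}\lambda^{j}}{j!}\cdot\frac{e^{-\mu}\mu^{k-j}}{(k-j)!} = \frac{e^{-(\lambda+\mu)}}{k!}\sum_{j=0}^{k}\binom{k}{j}\lambda^{j}\mu^{k-j} = \frac{e^{-(\lambda+\mu)}(\lambda+\mu)^{k}}{k!} \)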
Assume \(X \sim \text{Pois}(\lambda)\) counts events of two types, counted separately by \(X_1\) and \(X_2\), where each event is independently of type 1 with probability \(p\) and of type 2 with probability \(1-p\). This implies \(X_1 \sim \text{Pois}(\lambda p)\)
We want to find \(\text{Pr}(X_1 = n)\), which we can do by conditioning on the total count: \(\text{Pr}(X_1 = n) = \sum_{m=0}^{\infty} \text{Pr}(X_1 = n \mid X = n+m)\, \text{Pr}(X = n+m)\). Given \(X = n+m\) events in total, each is independently of type 1 with probability \(p\), so \(X_1 \mid X = n+m \sim \text{Bin}(n+m,p)\). Recalling that \(\text{Pr}(X = n+m) = \frac{e^{-\lambda}\lambda^{n+m}}{(n+m)!}\), substituting these into the sum confirms the claim
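Carrying out this substitution (added for completeness), the sum over \(m\) is an exponential series:
\( \sum_{m=0}^{\infty} \binom{n+m}{n} p^{n}(1-p)^{m}\,\frac{e^{-\lambda}\lambda^{n+m}}{(n+m)!} = \frac{e^{-\lambda}(\lambda p)^{n}}{n!}\sum_{m=0}^{\infty}\frac{(\lambda(1-p))^{m}}{m!} = \frac{e^{-\lambda p}(\lambda p)^{n}}{n!} \)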
We can take \(X(t) \sim \text{Pois}(\lambda t)\) to extend a Poisson distribution to larger intervals; for instance, if \(\lambda\) represents the average number of events per hour, then \(X(t)\) counts the events over \(t\) hours
Distribution property such that conditioning on a lower bound \(a\) for the random variable's value simply translates the distribution \(a\) units backwards; given that it has already exceeded \(a\), the variable 'forgets' how long it has already waited
\(X \text{ is memoryless} \iff \text{Pr}(X\gt b|X \gt a)=\text{Pr}(X\gt b-a)\)
Distribution property such that all outcomes within the same distance of some 'symmetric outcome' \(x_0\) have the same probability
\(X \text{ is symmetric } \iff \exists x_0 \forall \delta \, ( \text{Pr}(X = x_0 + \delta)=\text{Pr}(X = x_0 - \delta) )\)
Continuous, memoryless distribution representing the waiting time (distance) between Poisson events
\(X \sim \text{exp}(\lambda)\)
\(f_{X}(x) = \begin{cases} \lambda e^{-\lambda x} & x \in [0,\infty ) \\ 0 \end{cases}\)
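A quick added verification that the exponential satisfies the memoryless property above: using \(\text{Pr}(X > x) = e^{-\lambda x}\) for \(x \geq 0\), we have for \(b \geq a \geq 0\)
\( \text{Pr}(X>b \mid X>a) = \frac{\text{Pr}(X>b)}{\text{Pr}(X>a)} = \frac{e^{-\lambda b}}{e^{-\lambda a}} = e^{-\lambda(b-a)} = \text{Pr}(X>b-a) \)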
Generating functions are ways of encoding a sequence as the coefficients of a power series. For a random variable \(X\), we take the following generating function:
\(G(z)= \text{E}(z^{X})\)
\(\text{E}(X) = G'(1) \)
\(\text{Var}(X) = G''(1)+G'(1)-G'(1)^2 \)
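These two identities follow by differentiating under the expectation (an added derivation):
\( G'(z) = \text{E}(X z^{X-1}) \implies G'(1) = \text{E}(X), \qquad G''(1) = \text{E}(X(X-1)) = \text{E}(X^{2}) - \text{E}(X) \implies \text{Var}(X) = G''(1)+G'(1)-G'(1)^{2} \)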
For a sum of independent random variables \(X = X_1 + X_2\), the generating function splits into a product
\(G_X(z)= G_{X_{1}}(z) G_{X_{2}}(z)\)
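For instance (an added note), since \(\text{Bin}(n,p)\) is the sum of \(n\) independent \(\text{Bern}(p)\) variables, its generating function is the \(n\)-fold product of the Bernoulli one, matching the formulas listed below:
\( G_{\text{Bin}}(z) = \left[ G_{\text{Bern}}(z) \right]^{n} = (p(z-1)+1)^{n} \)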
For a random sum \(S_N = \sum_{i=1}^{N} X_i\) of i.i.d. variables \(X_i\) independent of \(N\), applying conditional expectation with probability generating functions implies
\(G_{S_{N}}(z)= G_{N}(G_{X_{i}}(z))\)
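An added illustration: take \(N \sim \text{Pois}(\lambda)\) and \(X_i \sim \text{Bern}(p)\); then \(S_N\) is the thinned Poisson count from earlier, and indeed
\( G_{S_{N}}(z) = G_{N}(G_{X_{i}}(z)) = e^{\lambda(p(z-1)+1-1)} = e^{\lambda p(z-1)} \implies S_N \sim \text{Pois}(\lambda p) \)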
\(X \sim \text{exp}(\lambda) \implies G(z) = \frac{\lambda}{\lambda - \ln (z)}\)
\(X \sim \text{Bern}(p) \implies G(z) = p(z-1)+1\)
\(X \sim \text{Bin}(n,p) \implies G(z) = (p(z-1)+1)^n \)
\(X \sim \text{Geo}(p) \implies G(z) = \frac{pz}{1-(1-p)z} \)
\(X \sim \text{Pois}(\lambda) \implies G(z) = e^{\lambda(z-1)} \)
Discrete series of datapoints recording numerical observations at regular time intervals.
Sequence of random variables \(X : \Omega \times D \to R\) (where \(R\) must be countable) is a Markov chain if the probabilities for the subsequent random variable in the sequence are determined solely by the previous random variable
\( (X_t) \text{ is a Markov chain } \iff \text{Pr}(X_n = k_n | \bigwedge_{i=1}^{n-1} X_i = k_i) = \text{Pr}(X_n = k_n | X_{n-1}=k_{n-1})\)
Markov chain defined on the integers \(\mathbb{Z}\) such that \(\text{Pr}_{ij}\) represents the probability of the \((n+1)\)th state being \(j\), given that the current state is \(i\)
\( \text{Pr}_{ij} = \text{Pr}(X_{n+1}= j | X_{n} = i) \)
Random walk where a step in one direction is more probable than a step in the other
A recursive sequence that relates the present term to previous terms. It is desirable to find a closed-form solution for these
A difference equation such that the highest and lowest order terms differ by 1
\( D_n = aD_{n-1}\)
\( D_n = D_{0} a^{n}\)
Use the recurrence relation to find a closed form for \(D_n\)
\( D_n = aD_{n-1} + bD_{n-2}\)
\( D_n = k^{n} \implies k^{2} = ak + b \implies D_n = A k_1^{n} + B k_2^{n} \), where \(k_1, k_2\) are the roots of the characteristic equation and \(A,B\) are determined by the initial values
A difference equation such that the highest and lowest order terms differ by 2
For difference equations where the highest and lowest order terms differ by two, substitute the trial solution \(D_n = k^{n}\) and solve the resulting quadratic characteristic equation (see the worked example below)
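A small added example of the procedure (with made-up coefficients): for \(D_n = 5D_{n-1} - 6D_{n-2}\) with \(D_0 = 0, D_1 = 1\),
\( k^{2} = 5k - 6 \implies k \in \{2,3\} \implies D_n = A\,2^{n} + B\,3^{n} \implies (A,B)=(-1,1) \implies D_n = 3^{n} - 2^{n} \)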
Given \(X_s = x_i\), the probability of being in state \(x_j\) at time \(t\)
\(\text{Pr} _{ij}(t,s) = \text{Pr} (X_{t} = x_j | X_{s} = x_i)\)
Transition probability that depends only on the time difference \(h\) between the conditioning time and the observation time
\(\text{Pr} _{ij}(h) = \text{Pr} (X_{t+h} = x_j | X_{t} = x_i)\)
Vector representing each of the probabilities of a Markov chain realizing each value at time \(t\)
\( \textbf{p}(t) = \begin{bmatrix} \text{Pr}(X_t =x_1 ) \\ \text{Pr}(X_t =x_2 ) \\ \vdots \\ \text{Pr}(X_t =x_n ) \end{bmatrix} \)
\( \sum_{i=1}^{n} p_i = 1\)
Square matrix of all homogeneous transition probabilities
\( \textbf{P}(s) = \begin{bmatrix} \text{Pr}_{11}(s) & \text{Pr}_{12}(s) & \cdots & \text{Pr}_{1n}(s) \\ \text{Pr}_{21}(s) & \text{Pr}_{22}(s) & \cdots & \text{Pr}_{2n}(s) \\ \vdots & \vdots & \ddots & \vdots \\ \text{Pr}_{n1}(s) & \text{Pr}_{n2}(s) & \cdots & \text{Pr}_{nn}(s) \end{bmatrix} \)
\(\sum_{j=1}^{n} \text{Pr}_{ij}(s) = 1 \)
\( \textbf{P}^{T}(s) \textbf{p}(t) = \textbf{p}(t+s)\)
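A small added numerical illustration with a hypothetical two-state chain:
\( \textbf{P} = \begin{bmatrix} 0.9 & 0.1 \\ 0.5 & 0.5 \end{bmatrix}, \quad \textbf{p}(t) = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix} \implies \textbf{p}(t+1) = \textbf{P}^{T}\textbf{p}(t) = \begin{bmatrix} 0.9 \cdot 0.5 + 0.5 \cdot 0.5 \\ 0.1 \cdot 0.5 + 0.5 \cdot 0.5 \end{bmatrix} = \begin{bmatrix} 0.7 \\ 0.3 \end{bmatrix} \)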
Representing possible values as state nodes with directed edges weighted by the corresponding elements of the transition matrix
State quality such that the Markov chain will never leave that state once that state is reached
\( x_i \text{ is absorbing } \iff \text{Pr}_{ii}(s)=1\)
Pair of states where one may transition to the other given some time
\( x_j \leftarrow x_i \iff \exists s ( \text{Pr}_{ij}(s) \gt 0 ) \)
States that can access one another
\( x_j \leftrightarrow x_i \iff x_j \leftarrow x_i \land x_i \leftarrow x_j\)
Markov chain where all states communicate
\((X_t) \text{ is irreducible } \iff \forall i,j ( x_j \leftrightarrow x_i )\)
Markov chain with all states aperiodic and persistent (recurrent); alternatively, in the limit the probability of being in each state approaches an equilibrium distribution that is independent of the initial probabilities
\( (X_t) \text{ is ergodic } \iff \forall i [ x_i \text{ is aperiodic and persistent} ] \)
\( (X_t) \text{ is ergodic } \iff \exists \boldsymbol{\pi} ( \forall \textbf{p}(0) [ \lim_{t \to \infty} \textbf{p}(t) = \boldsymbol{\pi} ] ) \)
\( (X_t) \text{ is ergodic } \implies \exists! \boldsymbol{\pi} ( \lim_{t \to \infty} \textbf{p}(t) = \boldsymbol{\pi} ) \)
\( (X_t) \text{ is a Markov chain with finitely many states } \land \exists s_0 [ \min_{i,j} \text{Pr}_{ij}(s_0) \gt 0 ] \implies (X_t) \text{ is ergodic} \)
State quality such that the chain is certain to eventually return (cycle back) to that state, given enough time
\( x_i \text{ is recurrent } \iff \text{Pr}(\exists s [ X_{t+s}=X_{t}=x_i]) = 1\)
State quality such that there exists a non-zero probability that the Markov chain will never return to that state
\( x_i \text{ is transient } \iff \text{Pr}(\exists s [ X_{t+s}=X_{t}=x_i]) \lt 1\)
A state has period \(d\) when \(d\) is the GCD of all possible numbers of transitions after which the chain can return to that state
\( x_i \text{ has period } k_i \iff k_i =\gcd \{ n \gt 0 : \text{Pr}( X_n = x_i | X_0 = x_i ) \gt 0 \} \)
\( x_i \text{ has period } k_i \implies [ \neg ( k_i \mid s ) \implies \text{Pr}_{ii}(s)=0 ]\)
\( x_i \text{ is aperiodic } \iff k_i = 1 \)
After infinitely many steps, each element of the probability vector converges to an equilibrium value
\(\lim_{t \to \infty} \textbf{p}(t) = \boldsymbol{\pi}\)
\( \textbf{P}^{T} \boldsymbol{\pi} = \boldsymbol{\pi} \)
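Continuing the hypothetical two-state chain from above (an added example), solving \( \textbf{P}^{T} \boldsymbol{\pi} = \boldsymbol{\pi} \) together with \(\pi_1 + \pi_2 = 1\):
\( 0.9\pi_1 + 0.5\pi_2 = \pi_1 \implies \pi_1 = 5\pi_2 \implies \boldsymbol{\pi} = \begin{bmatrix} 5/6 \\ 1/6 \end{bmatrix} \)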