Here we redefine the environment in which we understand aspects of probability theory.
Moment generating function of an RV (assuming it exists)
\(M_X (r) = \text{E}(e^{rX}) \)
Characteristic function of an RV, a generating function that exists for all RVs
\(\varphi_X (r) = \text{E}(e^{irX}) \)
\( \text{E}(X^{k}) = \frac{1}{i^k} \varphi_{X}^{(k)}(0) \)
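As a sanity check on this moment formula, a minimal sympy sketch (assuming, for illustration, the known Gaussian characteristic function \(\varphi(u) = e^{i\mu u - \sigma^2 u^2 / 2}\)):

```python
# Recover E(X^k) = phi^(k)(0) / i^k for X ~ N(mu, sigma^2), whose
# characteristic function exp(i*mu*u - sigma^2*u^2/2) is assumed known.
import sympy as sp

u, mu, sigma = sp.symbols('u mu sigma', real=True)
phi = sp.exp(sp.I * mu * u - sigma**2 * u**2 / 2)

for k in range(1, 4):
    moment = sp.simplify(sp.diff(phi, u, k).subs(u, 0) / sp.I**k)
    print(k, moment)  # mu; mu**2 + sigma**2; mu**3 + 3*mu*sigma**2
```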
\( M_{\textbf{x}} ( \textbf{u}) = \text{E}( e^{\textbf{u} \cdot \textbf{x}} ) \)
\( \varphi_{\textbf{x}} ( \textbf{u}) = \text{E}( e^{i\textbf{u} \cdot \textbf{x}} ) \)
On a probability space \( (\Omega, \mathcal{F}, \text{Pr}) \),
\( \textbf{X} = \begin{bmatrix} X_{1,1} & X_{1,2} & \cdots & X_{1,n} \\ X_{2,1} & X_{2,2} & \cdots & X_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ X_{m,1} & X_{m,2} & \cdots & X_{m,n} \end{bmatrix}\)
\(\text{E}(X^2) \lt \infty \implies \text{arg min}_{g(Y)} ( \text{E} ( [ X - g(Y) ]^2)) = \text{E}(X|Y) \)
\( \textbf{x} \sim \text{N}(\textbf{m} , \textbf{L}\textbf{L}^{T} ) \iff \textbf{x} = \textbf{m} + \textbf{L}\boldsymbol{\xi}, \quad \boldsymbol{\xi} \sim \text{N}(\textbf{0},\textbf{I})\)
\( \textbf{x} \sim \text{N}(\textbf{m} , \textbf{L}\textbf{L}^{T} ) \iff M_\textbf{x} (\textbf{u}) = e^{\textbf{m} \cdot \textbf{u} + \frac{1}{2}\textbf{u} \cdot \textbf{Q}\textbf{u} }, \quad \textbf{Q} = \textbf{L}\textbf{L}^{T} \)
\( \mathbf{x} \sim \text{N}(\textbf{m} , \textbf{L}\textbf{L}^{T} ) \iff \varphi_\textbf{x} (\textbf{u}) = e^{ i\textbf{m} \cdot \textbf{u} - \frac{1}{2}\textbf{u} \cdot \textbf{Q}\textbf{u} } \)
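A minimal numpy sketch of the first characterization above, sampling \(\textbf{x} = \textbf{m} + \textbf{L}\boldsymbol{\xi}\) with \(\textbf{L}\) the Cholesky factor of \(\textbf{Q}\); the particular \(\textbf{m}\) and \(\textbf{Q}\) are illustrative assumptions:

```python
# Sample x ~ N(m, Q) via x = m + L xi, Q = L L^T, xi ~ N(0, I);
# the particular m and Q are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
m = np.array([1.0, -2.0])
Q = np.array([[2.0, 0.5],
              [0.5, 1.0]])
L = np.linalg.cholesky(Q)               # Q = L L^T

xi = rng.standard_normal((100_000, 2))  # rows of xi ~ N(0, I)
x = m + xi @ L.T                        # rows are m + L xi
print(x.mean(axis=0))                   # approx m
print(np.cov(x, rowvar=False))          # approx Q
```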
\(\textbf{x},\textbf{y} \text{ are Gaussian vector RVs} \land \text{cov}(\textbf{x},\textbf{y}) \text{ is positive definite} \implies\)
\( \text{plim}_{n \to \infty} X_n = X \iff \forall \varepsilon \in \mathbb{R}_{+} \setminus \{0\}[ \lim_{n \to \infty} \text{Pr}( |X_n -X| \geq \varepsilon ) = 0 ] \)
\( \text{plim}_{n \to \infty} X_n = X \iff \forall \varepsilon \in \mathbb{R}_{+} \setminus \{0\} [ \lim_{n \to \infty} \text{Pr}( |X_n -X| \lt \varepsilon ) = 1 ] \)
\( \lim_{n \to \infty} X_n = X \text{ almost surely }\iff \text{Pr}( \lim_{n \to \infty} X_n = X ) = 1 \)
\( \lim_{n \to \infty} X_n = X \text{ in mean square } \iff \lim_{n \to \infty} \text{E}(|X_n - X|^2) = 0 \)
See Mathematical Statistics
MC estimator that uses the LLN to form a basic estimator of \(\text{E}(f(X))\)
\(F_n = \frac{1}{n}\sum^{n}_{i=1} f(x_i)\)
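A minimal sketch of this crude MC estimator, assuming for illustration \(f(x) = e^x\) and \(X \sim \text{U}(0,1)\), so the exact value \(e - 1\) is available for comparison:

```python
# Crude Monte Carlo: F_n = (1/n) sum f(x_i) with x_i sampled from X.
# f(x) = exp(x), X ~ U(0,1) are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(size=100_000)
F_n = np.exp(x).mean()
print(F_n, np.e - 1)   # estimate vs exact value e - 1
```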
MC estimator that uses a control variate to minimize variance of the estimator (leading to higher accuracy)
\( G_n = F_n + a(Q_n - Q) \)
Setting \(\frac{d}{da}[\text{Var}(G_n)]=0\) gives
\( \text{arg min}_{a \in \mathbb{R} } [ \text{Var}(F_n +a(Q_n -Q)) ] = -\frac{\text{cov}(f(U),q(U))}{\text{Var}(q(U))}\)
With this optimal \(a\), the variance of the control variate estimator can be shown to be
\( \text{Var}(G_n) = \frac{1 - \rho_{f(U),q(U)}^2 }{n}\text{Var}(f(U))\)
Since \( 1 - \rho_{f(U),q(U)}^2 \in [0,1] \), this allows for the variance to be reduced!
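A minimal sketch of the control variate estimator with the optimal \(a\) estimated from the sample; \(f(u) = e^u\), \(q(u) = u\), and \(Q = \frac{1}{2}\) are illustrative assumptions:

```python
# Control-variate estimator G_n = F_n + a(Q_n - Q) for E(f(U)),
# U ~ U(0,1); f(u) = exp(u), q(u) = u, Q = 1/2 are assumptions.
import numpy as np

rng = np.random.default_rng(0)
u = rng.uniform(size=100_000)
fu, qu = np.exp(u), u

cv = np.cov(fu, qu)                      # sample covariance matrix
a = -cv[0, 1] / cv[1, 1]                 # a* = -cov(f,q)/Var(q)
G_n = fu.mean() + a * (qu.mean() - 0.5)  # F_n + a(Q_n - Q)
print(G_n, np.e - 1)                     # estimate vs exact value
```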
MC estimator that averages two crude MC estimators for the expectations of two identically distributed RVs with negative covariance; this condition reduces the variance
\( G_n = \frac{F_{1,n} + F_{2,n}}{2}\)
\( \text{cov}( f (X) , g (Y) ) \lt 0 \)
\( X \sim Y \)
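A minimal sketch of this antithetic estimator, assuming for illustration \(f(u) = e^u\) with \(U \sim \text{U}(0,1)\) paired against \(1 - U\) (identically distributed and negatively correlated):

```python
# Antithetic-variate estimator for E(f(U)), U ~ U(0,1): average the
# crude estimators on U and 1 - U (same distribution, cov < 0).
# f(u) = exp(u) is an illustrative assumption.
import numpy as np

rng = np.random.default_rng(0)
u = rng.uniform(size=50_000)
F1 = np.exp(u).mean()          # crude estimator on U
F2 = np.exp(1.0 - u).mean()    # crude estimator on 1 - U
G_n = 0.5 * (F1 + F2)
print(G_n, np.e - 1)           # estimate vs exact value
```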
See Mathematical Statistics
Collection of scalar RVs indexed by time, on a probability space \( (\Omega, \mathcal{F}, \text{Pr}) \)
\(X : D \times \Omega \to \mathbb{R}\)
\( ( X_{t} )_{t \in D}, \quad X_t(\omega) = X(t,\omega)\)
\(F_{X_{t_1} , \cdots , X_{t_n}} (\textbf{x}) = \text{Pr}( \bigwedge^{n}_{i=1} X_{t_i} \leq x_i )\)
\( \lim_{x_1 \to \infty} F_{X_{t_1} , \cdots , X_{t_n}} (x_1 , \cdots , x_n) = F_{X_{t_2} , \cdots , X_{t_n}} (x_2 , \cdots , x_n) \)
\( \lim_{x_n \to \infty} F_{X_{t_1} , \cdots , X_{t_n}} (x_1 , \cdots , x_n) = F_{X_{t_1} , \cdots , X_{t_{n-1}}} (x_1 , \cdots , x_{n-1}) \)
\( (X_t) \text{ is a Gaussian process } \iff \textbf{x} = (X_{t_1} , \ldots , X_{t_n}) \text{ is jointly Gaussian for all finite } \{t_i\} \subset D\)
\( (X_t) \text{ is a Gaussian process } \iff \textbf{x} \sim \text{N}( \textbf{m} , \textbf{Q} ) \)
Also called a Brownian motion from a physical perspective
\( (W_t)_{t \in [0,\infty)} \text{ is a Wiener process } \iff (W_t) \text{ is a Gaussian process } \land W_0 = 0 \land \text{E}(W_t) = 0 \land \text{cov}(W_s , W_t) = \min(s,t) \)
\( (W_t) \text{ is a standard Brownian motion} \iff (W_t) \text{ is a Gaussian process } \land W_0 = 0 \land \text{E}(W_t) = 0 \land \text{cov}(W_s , W_t) = \min(s,t) \)
\( X : [0,T] \times \Omega \to \mathbb{R} \text{ is an SP} \land \exists \alpha,\varepsilon,c \gt 0 \, \forall u,t \in [0,T] [ \text{E}(|X_t - X_u|^{\alpha}) \leq c|t-u|^{1+\varepsilon} ] \implies (X_t) \text{ has a modification that is Hölder continuous of order } h \lt \frac{\varepsilon}{\alpha}\)
\( (B_t) \text{ is a standard Wiener process} \implies (B_t) \text{ is Hölder continuous of order } h \lt \frac{1}{4} \)
\(\forall t_0 [ (B_t) \text{ is not differentiable at }t_0 ] \text{ almost surely} \)
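A minimal sketch simulating standard Wiener paths by cumulative sums of independent \(\text{N}(0, \Delta t)\) increments (grid size and path count are illustrative assumptions):

```python
# Simulate many standard Wiener paths on [0, T] by cumulative sums of
# independent N(0, dt) increments; check E(W_T) = 0 and Var(W_T) = T.
import numpy as np

rng = np.random.default_rng(0)
n, T, n_paths = 1_000, 1.0, 5_000   # grid size / horizon / paths (assumed)
dt = T / n
paths = rng.normal(scale=np.sqrt(dt), size=(n_paths, n)).cumsum(axis=1)
print(paths[:, -1].mean())   # approx 0
print(paths[:, -1].var())    # approx T = cov(W_T, W_T) = min(T, T)
```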
Variant of Brownian motion in which the Hurst exponent can introduce correlation between the indexed scalar RVs
\( (B^{H}_t)_{t \in [0,\infty)} \text{ is a fractional Brownian motion with Hurst exponent } H \iff (B^{H}_t) \text{ is a Gaussian process } \land B^{H}_0 = 0 \land \text{E}(B^{H}_t) = 0 \land \text{cov}(B^{H}_s , B^{H}_t) = \frac{1}{2}( t^{2H} + s^{2H} - |t-s|^{2H} ) \)
Variant of Brownian motion that initiates and ends with the value 0 (hence why it is a 'bridge')
\( (B^{b}_t)_{t \in [0,1]} \text{ is a standard Brownian bridge} \iff (B^{b}_t) \text{ is a Gaussian process } \land \text{E}(B^{b}_t) = 0 \land \text{cov}(B^{b}_s , B^{b}_t) = \min(s,t) - st \)
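A minimal sketch of one standard construction of the bridge from a Wiener path, \(B^{b}_t = W_t - t W_1\), which pins the value 0 at both endpoints:

```python
# Build a standard Brownian bridge on [0, 1] from a Wiener path via
# B^b_t = W_t - t * W_1 (a standard construction).
import numpy as np

rng = np.random.default_rng(0)
n = 1_000
t = np.linspace(0.0, 1.0, n + 1)
dW = rng.normal(scale=np.sqrt(1.0 / n), size=n)
W = np.concatenate([[0.0], np.cumsum(dW)])
B_bridge = W - t * W[-1]
print(B_bridge[0], B_bridge[-1])   # both exactly 0
```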
\( (X_t) \text{ is strictly stationary } \iff \forall n \in \mathbb{N} \, \forall h \)
\(\forall t_i \in D [ F_{X_{t_1} , \cdots , X_{t_n}} (x_1 , \cdots , x_n) = F_{X_{t_1 + h} , \cdots , X_{t_n + h}} (x_1 , \cdots , x_n) ] \)
\( (X_t) \text{ is weakly stationary } \iff \text{E}(X_t)=c \land \text{cov}(X_t , X_{t+h}) = q(h)\)
\( (X_t) \text{ is a Markov process } \iff \forall t_i [ \text{Pr}(X_{t_n +s} \leq y | \bigwedge^{n}_{i=1} X_{t_i} = x_i ) = \text{Pr}( X_{t_n+s} \leq y | X_{t_{n}} = x_{n} ) ] \)
\( \text{E}(e^{i u X_{t_n+s}} | X_{t_1}, \ldots , X_{t_n} ) = \text{E}(e^{i u X_{t_n+s}} | X_{t_n} ) \)
Homogeneous transition probabilities occur when \(\text{Pr}(X_{t+s} \leq y | X_t = x)\) is independent of \(t\)
\( X_{t} = g_{t-1}(X_{t-1}, Y_{t}) \)
\( (X_t) \text{ is a Markov chain} \iff (X_t) \text{ is a Markov process } \land \text{im}(X_t) \text{ is countable}\)
Stationary Markov chain: a Markov chain such that the conditional distribution of \(X_{t+h} | X_{t}\) depends only on \(h\)
\( (X_t) \text{ is a Gaussian process } \implies [ (X_t) \text{ is a Markov process } \iff \text{E}(X_{t_3} | X_{t_1} , X_{t_2}) = \text{E}(X_{t_3} | X_{t_2}) \iff \text{cov}(X_{t_1},X_{t_2}) \text{cov}(X_{t_2},X_{t_3}) = \text{cov}(X_{t_1},X_{t_3}) \text{cov}(X_{t_2},X_{t_2}) ] \text{ for all } t_1 \lt t_2 \lt t_3 \)
\( \text{cov}(X_{t}, X_{t+h}) = \text{Var}(X_t) e^{-\alpha |h|}\)
\( (X_t) \text{ is a continuous Markov process } \implies \)
\(f(x_1 ,t_1 ; \cdots ; x_n , t_n) = f(x_1,t_1) \prod^{n}_{i=2} f(x_i , t_i | x_{i-1} , t_{i-1}) \)
\( f(x_3 , t_3 | x_1 , t_1 ) = \int^{\infty}_{-\infty} f(x_2 , t_2 | x_1 , t_1) f(x_3 , t_3 | x_2 , t_2) dx_2\)
Conditional PDF (transition density) of a Markov process \( (X_t) \)
\(f_{X}(y,t|x,s)\)
Scalar RV \(T_i\) representing the time taken for a continuous-time Markov chain to change its state starting from time \(t\), where \(X_t =x_i\) is realized
\(T_i = \min \{ s \gt 0 : X_{t+s} \neq x_i \} \)
\((X_{t}) \text{ is a homogeneous continuous-time Markov chain } \implies T_i \sim \text{exp}(\nu_i) \)
\(\nu_i \geq 0 \text{ is the intensity of } T_i \)
Matrix giving the probability of the state to which a continuous-time Markov chain jumps next, regardless of waiting time
\(\textbf{P}^{\text{jump}}\)
\( p^{\text{jump}}_{ij} = \text{Pr}(X_{t+T_i} = x_j | X_t = x_i)\)
\( \sum_{j} p^{\text{jump}}_{ij} =1\)
\( x_i \text{ is an absorbing state } \implies p^{\text{jump}}_{ij} = \delta_{ij} \)
\( x_i \text{ is not an absorbing state } \implies p^{\text{jump}}_{ii} = 0 \)
Matrix of transition intensities (the generator) of a continuous-time Markov chain, combining waiting-time intensities and jump probabilities
\(\textbf{A}\)
\( x_i \text{ is not absorbing } \implies a_{ij} = \begin{cases} \nu_i p^{\text{jump}}_{ij} & i\neq j \\ -\nu_i & i=j \end{cases} \)
\( x_i \text{ is absorbing } \implies \nu_i = 0 \)
\((X_t) \text{ is a Birth-Death process } \iff\)
\(D_i \sim \text{exp}(\mu_i ) \) is the departure waiting time
\(A_i \sim \text{exp}(\lambda_i )\) is the arrival waiting time
\(T_i = \text{min}(A_i , D_i) \) is the waiting time for an event
\(D_i \lt A_i \land X_t \gt 0 \implies X_{t+D_i} = i-1\)
\(A_i \leq D_i \implies X_{t+A_i} = i+1\)
\(T_i \sim\text{exp}(\nu_i), \nu_i =\begin{cases} \lambda_0 & i=0 \\ \lambda_i + \mu_i & i\neq 0 \end{cases}\)
\(p^{\text{jump}}_{i,i-1}= \frac{\mu_i}{\lambda_i + \mu_i}\)
\(p^{\text{jump}}_{i,i+1}= \frac{\lambda_i}{\lambda_i + \mu_i}\)
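A minimal sketch simulating a birth-death process from the waiting times and jump probabilities above; the constant rates \(\lambda\) and \(\mu\) are illustrative assumptions:

```python
# Simulate a birth-death process: exponential waiting time with
# intensity nu_i, then jump up/down according to P^jump.
# Constant rates lam (arrivals) and mu (departures) are assumptions.
import numpy as np

rng = np.random.default_rng(0)
lam, mu = 1.0, 0.8
t, x, T_end = 0.0, 0, 50.0
times, states = [0.0], [0]
while t < T_end:
    nu = lam if x == 0 else lam + mu        # total intensity nu_i
    t += rng.exponential(1.0 / nu)          # waiting time ~ exp(nu_i)
    if x == 0 or rng.uniform() < lam / nu:  # up w.p. lam/(lam+mu)
        x += 1
    else:                                   # down w.p. mu/(lam+mu)
        x -= 1
    times.append(t); states.append(x)
print(states[-5:])   # last few visited states
```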
ODEs (Kolmogorov backward and forward equations) for transition probabilities of jump processes
\( \frac{d}{ds}\textbf{P}(s) = \textbf{A}\textbf{P}(s)\)
\( \frac{d}{ds}\textbf{P}(s) = \textbf{P}(s) \textbf{A} \)
\( \textbf{P}(s) = \textbf{I} +\sum^{\infty}_{n=1}\frac{s^n \textbf{A}^n}{n!} \)
\( \textbf{P}(s) = e^{s\mathbf{A}} \)
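A minimal sketch evaluating \( \textbf{P}(s) = e^{s\textbf{A}} \) for a small three-state birth-death generator (rates chosen as illustrative assumptions), using scipy's matrix exponential:

```python
# P(s) = exp(sA) for a three-state birth-death generator A
# (rates are illustrative assumptions); rows of P(s) sum to 1.
import numpy as np
from scipy.linalg import expm

lam, mu = 1.0, 0.8
A = np.array([[-lam,        lam,  0.0],
              [  mu, -(lam+mu),   lam],
              [ 0.0,         mu,  -mu]])
P = expm(0.5 * A)        # P(s) at s = 0.5
print(P.sum(axis=1))     # each row sums to 1
```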
Processes such that \(X_{u} -X_0 , X_{t}-X_{s}\) are independent for any \(u \leq s \leq t\)
Processes with independent increments are Markov processes
Processes such that increments \(X_{t} -X_s\) over disjoint intervals are independent and the distribution of \(X_t - X_s\) depends only on \(t-s\)
Stochastic process \( (N_t) \) that counts 'events' that have occurred up to time \(t\)
Counting process with independent increments modelled by a Poisson distribution
\( (N_t) \text{ is a Poisson process with intensity } \lambda \iff N_0 = 0 \land (N_t) \text{ has independent increments} \land N_t - N_s \sim \text{Poisson}(\lambda (t-s)) \)
\( (N_t) \text{ is a Poisson process with intensity } \lambda \iff N_0 = 0 \land \text{the interarrival times are iid } \text{exp}(\lambda) \)
\( (N_t) \text{ is a Poisson process with intensity } \lambda \iff N_0 = 0 \land (N_t) \text{ has independent increments} \land \text{Pr}(N_{t+h} - N_t = 1) = \lambda h + \text{o}(h) \land \text{Pr}(N_{t+h} - N_t \geq 2) = \text{o}(h) \)
\( (N_t) \text{ is a Poisson process with intensity } \lambda_N \land (M_t) \text{ is an independent Poisson process with intensity } \lambda_M \implies (N_t + M_t) \text{ is a Poisson process with intensity } \lambda_N + \lambda_M \)
Compound Poisson process: stochastic process that accumulates \(N_t\) iid jumps \(Y_k\) by time \(t\)
\(X_t = \sum^{N_t}_{k=1}Y_k\)
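A minimal sketch simulating \(X_t\) at a fixed \(t\); \(\lambda = 2\), \(t = 3\), and \(Y_k \sim \text{N}(0,1)\) are illustrative assumptions, so \(\text{E}(X_t) = 0\) and \(\text{Var}(X_t) = \lambda t \, \text{E}(Y^2) = 6\):

```python
# Simulate the compound Poisson value X_t = sum_{k=1}^{N_t} Y_k at a
# fixed time t, with lambda = 2, t = 3, Y_k ~ N(0, 1) (assumptions).
import numpy as np

rng = np.random.default_rng(0)
lam, t = 2.0, 3.0
N_t = rng.poisson(lam * t, size=20_000)                # event counts
X_t = np.array([rng.normal(size=n).sum() for n in N_t])
print(X_t.mean(), X_t.var())   # approx 0 and lam*t*E(Y^2) = 6
```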
Right-continuous function with left limits (càdlàg)
Continuous-time Markov process with continuous trajectories such that there exist functions \(a,b\) such that
\(\text{E} (|X_{t}|^{2+\delta}) \lt \infty\)
\(X_{s+h} -X_{s} = a(s,X_s)h + b(s,X_s)(W_{s+h}-W_{s}) + \text{o}(h)\)
Numerically approximating an SDE by a Markov chain
\(dX_t = a(X_t , t)dt +b(X_t,t)dW_t\)
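A minimal Euler-Maruyama sketch of this discretization, stepping \(X_{t+h} = X_t + a(X_t,t)h + b(X_t,t)\Delta W\); the drift and diffusion of a geometric Brownian motion are illustrative assumptions:

```python
# Euler-Maruyama: approximate the SDE by the Markov chain
#   X_{k+1} = X_k + a(X_k, t_k) dt + b(X_k, t_k) dW_k,  dW_k ~ N(0, dt).
# GBM drift/diffusion a(x,t) = 0.1x, b(x,t) = 0.2x are assumptions.
import numpy as np

rng = np.random.default_rng(0)
a = lambda x, t: 0.1 * x
b = lambda x, t: 0.2 * x
n, T, X = 1_000, 1.0, 1.0      # steps, horizon, X_0
dt = T / n
for k in range(n):
    dW = rng.normal(scale=np.sqrt(dt))
    X += a(X, k * dt) * dt + b(X, k * dt) * dW
print(X)   # one realization of the approximation to X_T
```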
PDEs (Kolmogorov backward and forward equations) for transition densities of diffusion processes
\( \frac{\partial f(y,t|x,s)}{\partial s} +a(x,s) \frac{\partial f(y,t|x,s)}{\partial x} +\frac{1}{2}b^2 (x,s) \frac{\partial^2 f(y,t|x,s)}{\partial x^2} =0\)
\( \frac{\partial f(y,t|x,s)}{\partial t} + \frac{\partial [ a(y,t)f(y,t|x,s) ]}{\partial y} -\frac{1}{2} \frac{\partial^2 [ b^2 (y,t)f(y,t|x,s) ]}{\partial y^2} =0\)
\(f(y,s|x,s) =\delta(y-x)\)
Integral defined for functions adapted to a filtered probability space with respect to a standard Wiener process
\( \mathcal{P} = \{s_0 , s_1 , \ldots , s_n \}, \quad 0 = s_0 \lt s_1 \lt \cdots \lt s_n = t \)
\[ \int^{t}_{0} f_s dB_s = \lim_{n\to \infty} \sum^{n}_{i=1}f_{s_{i-1}} (B_{s_i} -B_{s_{i-1}}) \]
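A minimal sketch of this left-endpoint sum for \(f_s = B_s\), compared against the closed form \(\int^{t}_{0} B_s dB_s = \frac{1}{2}(B_t^2 - t)\):

```python
# Left-endpoint Riemann sum for the Ito integral of f_s = B_s,
# versus the closed form (B_t^2 - t)/2.
import numpy as np

rng = np.random.default_rng(0)
n, t = 100_000, 1.0
dB = rng.normal(scale=np.sqrt(t / n), size=n)
B = np.concatenate([[0.0], np.cumsum(dB)])
ito_sum = np.sum(B[:-1] * dB)          # integrand at left endpoints
print(ito_sum, 0.5 * (B[-1]**2 - t))   # the two agree as n grows
```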
Adapted process
\( (X_t) \text{ is an Itō process} \iff X_t = X_0 +\int^{t}_{0} \mu_s ds + \int^{t}_{0} \sigma_s dB_s \)
\( dX_t = \mu_t dt + \sigma_t dB_t \land f \in C^{1,2} \implies df(t,X_t) = (f_t + \mu_t f_x + \frac{\sigma^{2}_{t}}{2}f_{xx})dt + \sigma_t f_x dB_t \)
Weakly stationary SPs that obey the 'AR formula'
\( (X_t)_{t \in \mathbb{Z}} \text{ is an } \mathrm{AR}(p) \text{ process } \iff X_t = \sum^{p}_{i=1} \phi_i X_{t-i} + Z_t, \quad Z_t \sim \mathrm{WN}(0,\sigma^2) \)
\( (X_{t}) \text{ is an } \mathrm{AR}(1) \text{ process } \)
SPs that obey the 'MA formula'; any SP drawn from this formula will be weakly stationary
\( (X_t)_{t \in \mathbb{Z}} \text{ is an } \mathrm{MA}(q) \text{ process } \iff X_t = Z_t + \sum^{q}_{i=1} \theta_i Z_{t-i}, \quad Z_t \sim \mathrm{WN}(0,\sigma^2) \)
\( (X_{t}) \text{ is weakly stationary }\)
Weakly stationary SPs that obey the 'ARMA formula'
\( (X_t)_{t \in \mathbb{Z}} \text{ is an } \mathrm{ARMA}(p,q) \text{ process } \iff X_t = c + \sum^{p}_{i=1} \phi_i X_{t-i} + Z_t + \sum^{q}_{j=1} \theta_j Z_{t-j}, \quad Z_t \sim \mathrm{WN}(0,\sigma^2) \)
\(c=0 \implies \phi(B)X_t = \theta(B)Z_t \)
If \(\phi(z)\) has no roots on the complex unit circle, the \(\mathrm{ARMA}(p,q)\) equation has a unique stationary solution
Backshift operator that maps an RV in the SP to the RV at the previous time index, \(BX_t = X_{t-1}\)
\(B\)
\(\phi(B)X = \theta(B)Z\)
\(X=\psi(B)Z \)
Coefficients of \( \psi \), which are calculated by the following recursive formula
\( \psi_0 = 1 \)
\( \psi_j = \sum^{j}_{k=1} \phi_k \psi_{j-k} + \theta_j, \quad \phi_k = 0 \text{ for } k \gt p, \; \theta_j = 0 \text{ for } j \gt q\)
\( \gamma_{X}(h) = \sum_{j \in \mathbb{Z}} \sum_{k \in \mathbb{Z}} \psi_j \psi_k \gamma_Z (h+j-k)\)
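A minimal sketch of the \(\psi_j\) recursion for an \(\mathrm{ARMA}(1,1)\) process; \(\phi_1 = 0.5\) and \(\theta_1 = 0.4\) are illustrative assumptions:

```python
# psi_0 = 1, psi_j = sum_{k=1}^{min(j,p)} phi_k psi_{j-k} + theta_j
# (theta_j = 0 for j > q); ARMA(1,1) with phi=0.5, theta=0.4 assumed.
import numpy as np

phi = np.array([0.5])     # phi_1..phi_p
theta = np.array([0.4])   # theta_1..theta_q
J = 20
psi = np.zeros(J + 1)
psi[0] = 1.0
for j in range(1, J + 1):
    s = sum(phi[k - 1] * psi[j - k] for k in range(1, min(j, len(phi)) + 1))
    psi[j] = s + (theta[j - 1] if j <= len(theta) else 0.0)
print(psi[:5])   # 1.0, 0.9, 0.45, 0.225, 0.1125
```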
ARMA process with solution \(X_t = \sum^{\infty}_{j=0} \psi_j Z_{t-j}\)
\(X_{t} \text{ is an ARMA process } \implies [ X_{t} \text{ is causal } \iff \forall z [ |z| \leq 1 \implies \phi(z) \neq 0 ] ]\)
\(\psi(z) = \frac{\theta (z)}{\phi(z)}\)
ARMA process such that \(Z_t = \sum^{\infty}_{j=0} \pi_j X_{t-j}\)
\(X_{t} \text{ is an ARMA process } \implies [ X_{t} \text{ is invertible } \iff \forall z [ |z| \leq 1 \implies \theta(z) \neq 0 ] ]\)