37363 - Introduction to Optimization
Maximum
\( c = \max_{u \in U} [f(u)] \iff \exists u \in U [ f(u) = c ] \land \forall u \in U [ f(u) \leq c ]\)
- \(\max\) is the maximum notation
- \(f : X \to Y\) is a function
- \(U \subseteq X\) is a subset of \(\text{dom}(f)\)
Argument maximum
\( c = \text{arg max}_{u \in U} [f(u)] \iff c \in U \land \forall u \in U [ f(u) \leq f(c) ]\)
- \(\text{arg max}\) is the argument-maximum notation
- \(f : X \to Y\) is a function
- \(U \subseteq X\) is a subset of \(\text{dom}(f)\)
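As a concrete contrast between the two notations, here is a minimal Python sketch over a finite set (the choices of \(U\) and \(f\) are assumed examples):

```python
# Contrast max and arg max on a finite set; U and f are assumed examples.
U = range(6)

def f(u):
    return -(u - 3) ** 2

c_max = max(f(u) for u in U)   # maximum: the largest value f attains on U
c_arg = max(U, key=f)          # arg max: a point of U attaining that value

print(c_max, c_arg)            # 0 3
assert f(c_arg) == c_max
```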
Moment generating function
Generating function of an RV (it need not exist for every RV)
\(M_X (r) = \text{E}(e^{rX}) \)
Characteristic function
Generating function that exists for all RVs
\(\varphi_X (r) = \text{E}(e^{irX}) \)
\( \text{E}(X^{k}) = \frac{1}{i^k} \varphi_{X}^{(k)}(0) \) (provided \( \text{E}(|X|^{k}) \lt \infty \))
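A quick symbolic check of this moment formula, as a sketch assuming sympy is available, for \(X \sim \text{N}(0,1)\) with \( \varphi_X(r) = e^{-r^2/2} \):

```python
# Verify E(X^k) = φ_X^(k)(0) / i^k for the standard normal (sympy assumed).
import sympy as sp

r = sp.symbols("r", real=True)
phi = sp.exp(-r**2 / 2)  # characteristic function of N(0,1)

for k in range(1, 5):
    moment = sp.simplify(sp.diff(phi, r, k).subs(r, 0) / sp.I**k)
    print(k, moment)  # 0, 1, 0, 3: the first four moments of N(0,1)
```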
Moment generating function (Vector RV)
\( M_{\textbf{x}} ( \textbf{u}) = \text{E}( e^{\textbf{u} \cdot \textbf{x}} ) \)
Characteristic function (Vector RV)
\( \varphi_{\textbf{x}} ( \textbf{u}) = \text{E}( e^{i\textbf{u} \cdot \textbf{x}} ) \)
Multivariable
- Vector RV
- Joint Probability Mass Function (JPMF)
- Joint Probability Density Function (JPDF)
- Joint Cumulative Distribution Function (JCDF)
- Conditional distribution
- Conditional density function
- Marginal distribution
- Marginal Probability Mass Function (MPMF)
- Marginal Probability Density Function (MPDF)
- Marginal Cumulative Distribution Function (MCDF)
Random matrix (Matrix RV)
On a probability space \( (\Omega, \mathcal{F}, \text{Pr}) \),
\( \textbf{X} = \begin{bmatrix} X_{1,1} & X_{1,2} & \cdots & X_{1,n} \\ X_{2,1} & X_{2,2} & \cdots & X_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ X_{m,1} & X_{m,2} & \cdots & X_{m,n} \end{bmatrix}\)
- Expectation
- Variance
- Moment generating function
- Characteristic function
Conditional expectation is best mean squared estimator
\(\text{E}(X^2) \lt \infty \implies \text{arg min}_{g} ( \text{E} ( [ X - g(Y) ]^2 ) ) = \text{E}(X|Y) \)
- \(g : \text{codom}(Y) \to \mathbb{R}\) is a nonrandom function such that \( \text{E}(g^2 (Y)) \lt \infty\)
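A simulation sketch of this property under an assumed model \(X = Y^2 + \varepsilon\), for which \( \text{E}(X|Y) = Y^2 \); the conditional expectation should achieve a lower mean squared error than the best linear competitor:

```python
# Compare MSE of g(Y) = E(X|Y) = Y² against the best linear g(Y) = bY + a.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
Y = rng.standard_normal(n)
X = Y**2 + rng.standard_normal(n)      # assumed model, so E(X | Y) = Y²

mse_cond = np.mean((X - Y**2) ** 2)    # conditional-expectation predictor
b, a = np.polyfit(Y, X, 1)             # least-squares linear predictor
mse_lin = np.mean((X - (b * Y + a)) ** 2)

print(mse_cond, mse_lin)               # ≈ 1 (the noise variance) vs ≈ 3
```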
Gaussian Random Vector (Gaussian Vector RV)
\( \textbf{x} \text{ is a Gaussian Vector RV } \iff \textbf{x} = \textbf{m} + \textbf{L}\boldsymbol{\xi}\)
- \(\textbf{L}\) is a nonrandom matrix
- \(\boldsymbol{\xi} : \forall i (\xi_i \sim \text{N}(0,1)) \) is a vector RV of independent standard normal RVs
- \(\textbf{m}\) is a nonrandom vector
\( \textbf{x} \text{ is a Gaussian Vector RV } \iff M_\textbf{x} (\textbf{u}) = e^{\textbf{m} \cdot \textbf{u} + \frac{1}{2}\textbf{u} \cdot \textbf{Q}\textbf{u} } \)
\( \textbf{x} \text{ is a Gaussian Vector RV } \iff \varphi_\textbf{x} (\textbf{u}) = e^{ i\textbf{m} \cdot \textbf{u} - \frac{1}{2}\textbf{u} \cdot \textbf{Q}\textbf{u} } \)
- \( \textbf{Q} = \textbf{L}\textbf{L}^{T} \) is the covariance matrix
Properties
- \( \text{E}( \textbf{x} ) = \textbf{m} \)
- \( \text{cov}( \textbf{x} , \textbf{x} ) = \textbf{L}\textbf{L}^{T} \)
- \( \textbf{x} \sim \text{N}(\textbf{m} , \textbf{L}\textbf{L}^{T} ) \)
- \( \textbf{A}\textbf{x} + \textbf{b} \sim \text{N}(\textbf{A}\textbf{m}+\textbf{b} , \textbf{A}\textbf{Q}\textbf{A}^{T} ) \)
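A minimal sampling sketch of the definition \( \textbf{x} = \textbf{m} + \textbf{L}\boldsymbol{\xi} \), with \(\textbf{m}\) and \(\textbf{Q}\) set to assumed example values and \(\textbf{L}\) taken as a Cholesky factor of \(\textbf{Q}\):

```python
# Sample x = m + Lξ and check E(x) = m and cov(x, x) = LLᵀ = Q empirically.
import numpy as np

rng = np.random.default_rng(0)
m = np.array([1.0, -2.0])                 # assumed mean vector
Q = np.array([[2.0, 0.6],
              [0.6, 1.0]])                # assumed covariance matrix
L = np.linalg.cholesky(Q)                 # Q = L Lᵀ

xi = rng.standard_normal((100_000, 2))    # rows of independent N(0,1) entries
x = m + xi @ L.T                          # each row is one draw of the vector RV

print(x.mean(axis=0))                     # ≈ m
print(np.cov(x, rowvar=False))            # ≈ Q
```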
Theorem on normal correlation
\(\textbf{x},\textbf{y} \text{ are jointly Gaussian vector RVs} \land \text{cov}(\textbf{y},\textbf{y}) \text{ is positive definite} \implies\)
- \( \text{cov}(\textbf{x},\textbf{y}) = \textbf{0} \implies \textbf{x},\textbf{y} \text{ are independent}\)
- \(\text{E}(\textbf{x} | \textbf{y}) = \text{E}(\textbf{x}) + \text{cov}(\textbf{x},\textbf{y})\text{cov}(\textbf{y},\textbf{y})^{-1} [\textbf{y} - \text{E}(\textbf{y})] \)
- \(\text{cov}(\textbf{x}, \textbf{x} | \textbf{y}) = \text{cov}(\textbf{x},\textbf{x}) - \text{cov}(\textbf{x},\textbf{y})\text{cov}(\textbf{y},\textbf{y})^{-1}\text{cov}(\textbf{x},\textbf{y})^{T} \)
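A numeric sketch of the last two identities for an assumed scalar jointly Gaussian pair; since \( \text{E}(x|y) \) is linear in \(y\), its coefficients should match a least-squares regression of \(x\) on \(y\):

```python
# Check E(x|y) and cov(x, x|y) formulas on a simulated jointly Gaussian pair.
import numpy as np

rng = np.random.default_rng(0)
mx, my = 1.0, -1.0
Sxx, Sxy, Syy = 2.0, 0.8, 1.0            # assumed cov(x,x), cov(x,y), cov(y,y)

S = np.array([[Sxx, Sxy], [Sxy, Syy]])
L = np.linalg.cholesky(S)
z = np.array([mx, my]) + rng.standard_normal((200_000, 2)) @ L.T
x, y = z[:, 0], z[:, 1]

slope, intercept = np.polyfit(y, x, 1)   # regression of x on y
print(slope, Sxy / Syy)                  # ≈ cov(x,y)/cov(y,y)
print(intercept, mx - (Sxy / Syy) * my)  # ≈ E(x) − cov(x,y)cov(y,y)⁻¹E(y)
print(np.var(x - slope * y), Sxx - Sxy**2 / Syy)  # conditional variance ≈ 1.36
```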
Convergence in probability
\( \text{plim}_{n \to \infty} X_n = X \iff \forall \varepsilon \in \mathbb{R}_{+} \setminus \{0\}[ \lim_{n \to \infty} \text{Pr}( |X_n -X| \geq \varepsilon ) = 0 ] \)
\( \text{plim}_{n \to \infty} X_n = X \iff \forall \varepsilon \in \mathbb{R}_{+} \setminus \{0\} [ \lim_{n \to \infty} \text{Pr}( |X_n -X| \lt \varepsilon ) = 1 ] \)
Almost sure convergence
\( \lim_{n \to \infty} X_n = X \text{ almost surely }\iff \text{Pr}( \lim_{n \to \infty} X_n = X ) = 1 \)
Convergence in mean square
\( \lim_{n \to \infty} X_n = X \text{ in mean square } \iff \lim_{n \to \infty} \text{E}(|X_n - X|^2) = 0 \)
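An empirical sketch of convergence in probability for the sample mean of \( \text{U}(0,1) \) draws, where \( \text{plim}_{n \to \infty} \bar{X}_n = 1/2 \) by the weak law of large numbers (\(\varepsilon\) and the sample sizes are assumed illustration values):

```python
# Estimate Pr(|X̄_n − 1/2| ≥ ε) for growing n; it should shrink toward 0.
import numpy as np

rng = np.random.default_rng(0)
eps, reps = 0.02, 5_000

for n in (10, 100, 1_000):
    means = rng.uniform(0, 1, size=(reps, n)).mean(axis=1)
    print(n, np.mean(np.abs(means - 0.5) >= eps))
```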
Crude Monte-Carlo estimator (Crude MC estimator)
\(F_n = \frac{1}{n}\sum^{n}_{i=1} f(U_i)\)
- \( U_i \sim \text{U}(0,1)\) are independent, with \(U \sim \text{U}(0,1)\) a generic copy
- \( f : [0,1] \to \mathbb{R} \) is a function with \( \text{E}(f^2(U)) \lt \infty \)
- \( \text{plim}_{n \to \infty} F_n = \text{E}(f(U)) =F \)
- \( \text{bias}( F_n, F ) =0\)
- \( \lim_{n \to \infty} \frac{F_n -F}{\sqrt{\text{Var}[f(U)]/n}} \sim \text{N}(0,1) \)
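A runnable sketch of the crude MC estimator for the assumed example \( f(u) = e^u \), whose exact value is \( F = e - 1 \), together with a large-sample interval from the CLT bullet above:

```python
# Crude Monte-Carlo estimate of F = E(e^U) = e − 1 with a 95% CLT interval.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
U = rng.uniform(0, 1, n)            # U_i i.i.d. U(0,1)
F_n = np.exp(U).mean()              # the crude MC estimator

s = np.exp(U).std(ddof=1)           # sample standard deviation of f(U)
half = 1.96 * s / np.sqrt(n)
print(F_n, np.e - 1)                # F_n ≈ 1.7183
print(F_n - half, F_n + half)       # ≈ 95% confidence interval for F
```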
Control variate
Introduction of a second estimator with known mean \(Q\), correlated with the crude Monte-Carlo estimator, to reduce its variance
\( Q_n = \frac{1}{n}\sum^{n}_{i=1} q(U_i), \quad \text{plim}_{n \to \infty} Q_n = \text{E}[q(U)] = Q \)
MC variance reduction
\( G_n = F_n - a(Q_n - Q) \)
- \( F_n \) is the crude MC estimator
- \( Q_n \) is the control variate
- \( a \in \mathbb{R}\)
- \( \text{plim}_{n \to \infty} G_n = \text{E}(f(U)) =F \)
- \( \text{bias}( G_n, F ) =0\)
- \( \lim_{n \to \infty} \frac{G_n -F}{\sqrt{\text{Var}[f(U) - a\,q(U)]/n}} \sim \text{N}(0,1) \)
Optimizing \(a\)
By considering \(\frac{d}{da}[\text{Var}(G_n)]=0\),
\( \text{arg min}_{a \in \mathbb{R} } [ \text{Var}(F_n - a(Q_n - Q)) ] = \frac{\text{cov}(f(U),q(U))}{\text{Var}(q(U))}\)
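A sketch combining the control variate with this optimal \(a\) for the same assumed target \( F = \text{E}(e^U) = e - 1 \), taking \( q(u) = u \) with known mean \( Q = 1/2 \):

```python
# Control-variate MC for F = E(e^U) with q(U) = U, Q = 1/2, and a = a*.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
U = rng.uniform(0, 1, n)
fU, qU = np.exp(U), U

a = np.cov(fU, qU)[0, 1] / np.var(qU, ddof=1)   # estimate of cov(f,q)/Var(q)
G = fU - a * (qU - 0.5)                          # per-sample controlled values
print(G.mean(), np.e - 1)                        # G_n ≈ F, still unbiased
print(np.var(fU, ddof=1), np.var(G, ddof=1))     # variance drops ≈ 60×
```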