33230 - Mathematics 2


Linear Algebra

Eigenequation Equazione agli autovalori 固有方程式

Matrix transformation equation such that:

\(\textbf{M}\textbf{v}=\lambda \textbf{v}\)

Geometrically, this represents inputs to the transformation \(\textbf{M}\) such that the output is the input \(\textbf{v}\) times a scalar; the direction of the output vector is the same as (or opposite to) that of the input vector

Characteristic polynomial

Polynomial whose roots are the eigenvalues of a transform

\( \text{det} (\textbf{M}-\lambda \textbf{I})=0 \iff \lambda \text{ is an eigenvalue of } \textbf{M}\)

Proof

\(\textbf{M} \textbf{v} =\lambda \textbf{v}\)

\(\implies \textbf{M} \textbf{v} - \lambda \textbf{v} = \textbf{0}\)

\(\implies (\textbf{M} - \lambda \textbf{I}) \textbf{v} = \textbf{0}\)

Since an eigenvector \(\textbf{v} \neq \textbf{0}\), the matrix \(\textbf{M}-\lambda \textbf{I}\) sends a nonzero vector to \(\textbf{0}\) and is therefore singular:

\( \textbf{v} \neq \textbf{0} \implies \text{det} (\textbf{M}-\lambda \textbf{I})=0\)

Diagonal matrix eigenvalues

\(\textbf{M} \text{ is diagonal} \implies \lambda_j = m_{jj}\)

Square matrix eigenvalues

\(\sum^{n}_{i=1} \lambda_i = \text{tr}(\textbf{M}) \)

\(\prod^{n}_{i=1} \lambda_i = \text{det}(\textbf{M}) \)
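
As a quick numerical sketch of both identities (numpy assumed; the matrix is an arbitrary example):

```python
import numpy as np

M = np.array([[4.0, 1.0],
              [2.0, 3.0]])

eigenvalues = np.linalg.eigvals(M)   # roots of det(M - lambda*I) = 0

# Sum of eigenvalues equals the trace; product equals the determinant.
print(np.isclose(eigenvalues.sum(), np.trace(M)))         # True
print(np.isclose(eigenvalues.prod(), np.linalg.det(M)))   # True
```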

Linear Transformation Trasformazione lineare 線形変換

Vector functions \(T : V \to W \) that take a vector from \(V\) (domain) as input and return another vector from \(W\) (codomain), representing scaling, rotating, and/or reflecting

\( T \text{ is a linear transformation } \iff T(k\textbf{x}_{1}+p\textbf{x}_{2})=kT(\textbf{x}_{1})+pT(\textbf{x}_{2})\)

Scaling transformation

Transform that 'stretches' components proportionally, formally following this definition:

\( T \text{ is a pure scaling transformation} \iff \exists k : T(\textbf{v})=k\textbf{v}\)

Rotation transformation

Turning a vector by a specific angle about the origin (or about an axis in 3D) while maintaining the vector's magnitude

For instance, multiplying a vector by the matrix \(\begin{bmatrix} \cos ( \theta) & -\sin ( \theta) \\ \sin ( \theta) & \cos ( \theta ) \end{bmatrix}\) rotates the vector by \(\theta \). This follows because rotating \(\hat{i}\) by \(\theta\) gives \( \begin{bmatrix} \cos ( \theta) \\ \sin ( \theta) \end{bmatrix}\) and rotating \(\hat{j}\) by \(\theta\) gives \( \begin{bmatrix} -\sin ( \theta) \\ \cos ( \theta) \end{bmatrix}\); these images of the basis vectors form the columns of the matrix

Proof

Let \(\textbf{v} = \begin{pmatrix} r \cos (k) \\ r\sin (k) \end{pmatrix}\), now to rotate by \(\theta\) implies \( T( \textbf{v}) = \begin{pmatrix} r \cos (k + \theta) \\ r \sin (k + \theta) \end{pmatrix} = \begin{pmatrix} r ( \cos (k) \cos ( \theta ) - \sin(k) \sin ( \theta )) \\ r( \sin (k) \cos ( \theta) + \sin ( \theta ) \cos ( k )) \end{pmatrix} = r \cos (k) \begin{pmatrix} \cos(\theta) \\ \sin(\theta) \end{pmatrix} + r \sin (k) \begin{pmatrix} - \sin (\theta) \\ \cos(\theta) \end{pmatrix}= \begin{bmatrix} \cos ( \theta ) & - \sin ( \theta ) \\ \sin( \theta ) & \cos ( \theta ) \end{bmatrix} \textbf{v}\)
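
A minimal numerical check of this result, with an arbitrary angle and vector (numpy assumed):

```python
import numpy as np

theta = np.pi / 3   # 60 degrees
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

v = np.array([2.0, 1.0])
w = R @ v

# Rotation preserves magnitude ...
print(np.isclose(np.linalg.norm(w), np.linalg.norm(v)))   # True
# ... and the columns of R are the rotated basis vectors.
print(R @ np.array([1.0, 0.0]))   # [cos(theta), sin(theta)]
```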

Reflection transformation

Transform that 'flips' a vector across a line (or plane) through the origin while maintaining the vector's magnitude; for instance, \(\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}\) reflects vectors across the x-axis

Orthogonal matrix

\(\textbf{A} : \textbf{A}^{-1}=\textbf{A}^{T}\)

Invertible transformation

A transformation \(T\) is invertible when there exists a function \(T^{-1} : \forall \textbf{x},\ T^{-1}(T(\textbf{x}))= \textbf{x} \)

Multivariable functions and differentiation

Multivariable input

As a matter of notational convenience, the input of a function can be represented as a vector \(\textbf{x}\) (as inputs are essentially vectors to the point of interest anyway)

Multivariable function Funzione multivariabile 多変数関数

Function \(f: X^n \to Y\) with \(n\) inputs and one output.

For instance, double-variable functions are defined as \(f: X \times Y \to Z\) with two inputs \(x,y\) and one output \(f(x,y)=z\)

Slicing Affettare 切り

A method of plotting 3D graphs using 2D graphs: one can fix \(f(x,y)=k\) for several values of \(k\) and draw the resulting curves as contour plots to visualise the graph in 3D

Contour map

2D map where curves represent 3D depth.

Level set

Set of all inputs for which a function attains a fixed value \(c\), i.e. all solutions of \(z=c\)

\( \{ \textbf{x} \in \mathbb{R}^n : f(\textbf{x})=c \} \)

Two dimensional limit Limite in due dimensioni 二次元極限

\(\lim_{(x,y) \to (a,b)} f(x,y)=L\)

Let us define two generic functions \( g,h : \lim_{t \to t_0} g(t) = a,\ \lim_{t \to t_0} h(t) = b\), so that \((g(t),h(t))\) traces a path approaching \((a,b)\)

\(L=\lim_{(x,y) \to (a,b)} f(x,y) \iff \forall g,h,\ L=\lim_{t \to t_0} f(g(t),h(t))\)

This states that the limit is only defined if approaching the point along all paths yields the same result

Multivariable continuity

\(f \text{ is continuous } \iff \lim_{(x,y) \to (a,b)}f(x,y)=f(a,b)\)

Division by zero or changes in a function's definition that induce 'jumps' make a function discontinuous; the above, however, is the rigorous definition that mathematically captures this intuitive notion. See real analysis for more information.

Vector function

Function \(f: X^n \to V\) with \(n\) variables as input and a vector \(\textbf{x} : m \times 1\) as output. The range of a vector function is known as a vector field

Partial derivative Derivata parziale 偏微分

For multivariable functions, derivatives are taken with respect to individual variables; for instance, \(f(x,y)\) has one partial derivative for each of its variables.

Partial derivatives are functions describing the gradient of a multivariable function 'on one axis', or more formally speaking, the rate of change with respect to only one variable while the others are held constant.

Definition

Bivariate case definition

\( f_{x}(x,y) = \frac{\partial f}{\partial x} = \lim_{h \to 0} \frac{f(x+h,y)-f(x,y)}{h} \)

\( f_{y}(x,y) = \frac{\partial f}{\partial y} = \lim_{h \to 0} \frac{f(x,y+h)-f(x,y)}{h} \)

Higher-order partial derivatives

Higher-order derivatives of a function are still taken with respect to a chosen variable; for instance, \(f(x,y)\) has a total of 4 second partial derivatives: \(f_{xx}, f_{xy}, f_{yx}, f_{yy}\)

Note how the mixed partial derivatives \(f_{xy},f_{yx}\) describe how the rate of change in one variable changes as the other varies, while \(f_{xx},f_{yy}\) describe how fast the rate of change is changing along the \(x\) and \(y\) axes respectively.

Clairaut's Theorem Teorema di Schwarz ヤングの定理

Theorem asserting the symmetry of second order derivatives

\(f_{xy},f_{yx} \text{ are continuous at } (x,y) \implies f_{xy} = f_{yx}\)
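
A symbolic sketch of the theorem on an arbitrary smooth example function (sympy assumed):

```python
import sympy as sp

x, y = sp.symbols('x y')
f = sp.exp(x * y) + x * sp.sin(y)   # an arbitrary smooth function

f_xy = sp.diff(f, x, y)   # differentiate in x, then in y
f_yx = sp.diff(f, y, x)   # differentiate in y, then in x

print(sp.simplify(f_xy - f_yx) == 0)   # True: the mixed partials agree
```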

Magnification Magnificazione 拡大

Functions can be modified by a magnifying sequence of functions which translates the point of interest (place one wants to magnify) to the origin, dilates the function out in all dimensions, and then translates the function back to its original point

\( f_m (x,y) = m (f( \frac{x-a}{m} + a, \frac{y-b}{m} + b ) - f(a,b)) + f(a,b) \)

\( \lim_{m \to \infty} f_m (x,y) = f_x(a,b)(x-a) + f_y(a,b)(y-b) + f(a,b) \)

One can check whether a critical point is a maximum, minimum, or saddle by using a magnification that makes the output variable quadratic, and hence the infinite limit of the following is used:

\( f_m (x,y) = m^2 (f( \frac{x-a}{m} + a, \frac{y-b}{m} + b ) - f(a,b)) + f(a,b) \)

Tangent plane

A linear approximation of a multivariable function at a specific point. For instance, at point \((x_{0},y_{0})\):

\(z-f(x_{0},y_{0})=f_{x}(x_{0},y_{0})(x-x_{0})+f_{y}(x_{0},y_{0})(y-y_{0})\)

Normal vector

For a multivariable function \(f\), the vector normal to the surface can be derived by rearranging the tangent plane formula:

\(\textbf{n} = \begin{pmatrix} -f_{x_1}(a,b) \\ -f_{x_2}(a,b) \\ \vdots \\ 1 \end{pmatrix}\)

An alternate expression is

\(\textbf{n} = \nabla F : F(\textbf{x},y) = y-f(\textbf{x})\)

Note how when \(y=f(\textbf{x})\) we have the level set \(F=0\), so \(\nabla F(\textbf{x},f(\textbf{x}))\) is normal to the level set \(F=0\)

Linear approximation Approssimazione lineare

To approximate a value of a function for sufficiently small \(\boldsymbol{\Delta}\textbf{x}\)

\(f(\textbf{x}+ \boldsymbol{\Delta}\textbf{x}) \approx f(\textbf{x}) + \sum_{j=1}^{n} f_{x_j}(\textbf{x})\Delta x_{j} \)


Chain rule Regola della catena 連鎖律

\(\frac{\partial f}{\partial t}=\frac{\partial f}{\partial x}\frac{\partial x}{\partial t}+\frac{\partial f}{\partial y}\frac{\partial y}{\partial t}\)

\(\frac{\partial f}{\partial t}= \sum_{i=1}^{n} \frac{\partial f}{\partial x_i}\frac{\partial x_i}{\partial t}\)

Differentials Differenziali

Since the unrigorous differential is immediately derived from its rigorous counterpart (the chain rule), differentials for multivariate functions are calculated via the total differential

\(\Delta f \approx \sum_{i=1}^{n} \frac{\partial f}{\partial x_i} \Delta x_i\)

\( df = \sum_{i=1}^{n} \frac{\partial f}{\partial x_i} d x_i\)

Gradient vector Vettore del gradiente

Vector pointing towards the greatest increase of a function at a designated point

\(\displaystyle \nabla f(x,y) = \begin{pmatrix} \frac{\partial f }{\partial x} \\ \frac{\partial f}{\partial y} \end{pmatrix}\)

\(\nabla f(x,y) = f_x \hat{i} + f_y \hat{j}\)

One can use the multivariable chain rule to derive the following form of the gradient in polar (cylindrical) coordinates

\(\nabla f(r,\theta) = f_r \hat{r} + \frac{f_{\theta}}{r} \hat{\theta}\)

Directional Derivative Derivata direzionale 方向微分

To find the derivative at a point \(\textbf{x}\) in the direction of a given unit vector \( \hat{u}\), one uses the directional derivative, which employs unit vectors in the Newton quotient to find the rate of change in any desired direction. The result is a scalar representing the gradient in the direction of \(\hat{u}\)

\(D_{\hat{u}}f(\textbf{x})=\lim_{t \to 0^{+}}\frac{f(\textbf{x}+t\hat{u})-f(\textbf{x})}{t}\)

One can prove different forms of this definition using geometry and calculus:

\(D_{\hat{u}}f(x,y)=\frac{\partial f}{\partial x}\hat{u}_x + \frac{\partial f}{\partial y}\hat{u}_y\)

\(D_{\hat{u}}f(\textbf{x})= \sum^{n}_{j=1}\hat{u}_j f_{x_j} \)

\(D_{\hat{u}}f(x,y)=\nabla f \cdot \hat{u}\)

\(D_{\hat{u}}f(x,y)=|\nabla f| \cos (\theta)\)
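
A numerical sketch comparing the Newton-quotient definition against the \(\nabla f \cdot \hat{u}\) form, for a hypothetical \(f\) and direction (numpy assumed):

```python
import numpy as np

def f(p):
    x, y = p
    return x**2 * y + np.sin(y)

def grad_f(p):                       # hand-computed gradient of f
    x, y = p
    return np.array([2 * x * y, x**2 + np.cos(y)])

p = np.array([1.0, 2.0])
u = np.array([3.0, 4.0]) / 5.0       # a unit vector

analytic = grad_f(p) @ u             # D_u f = grad(f) . u
t = 1e-6
numeric = (f(p + t * u) - f(p)) / t  # one-sided Newton quotient

print(analytic, numeric)             # the two agree to ~1e-5
```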

Maximum value Valore massimo

Largest attainable directional derivative, which occurs when \(\hat{u}\) has the same direction as \(\nabla f \), hence \(D_{\hat{u}}f = |\nabla f(\textbf{x})|\)

Minimum value Valore minimo

Smallest attainable directional derivative, which occurs when \(\hat{u}\) has the opposite direction to \(\nabla f \), hence \(D_{\hat{u}}f = -|\nabla f(\textbf{x})|\)

Quadratic approximation Approssimazione quadratica

To determine the concavity at a critical point, one can use the following:

\( \lim_{m \to \infty} f_m (x,y) = \frac{1}{2} \begin{bmatrix} x-a & y-b\end{bmatrix} \begin{bmatrix} f_{xx}(a,b) & f_{xy}(a,b) \\ f_{yx}(a,b) & f_{yy}(a,b) \end{bmatrix} \begin{bmatrix} x-a \\ y-b \end{bmatrix} + f(a,b) \)

This result can be found by approximating a 2-variable function by a 2nd order Taylor series and arranging into matrix form (this shows some reasoning behind the Hessian matrix).

Hessian Matrix

Matrix relating to the second derivatives of a multivariate function

\(H =\begin{bmatrix} \frac{\partial^2 f}{\partial x_{1}^{2}} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\ \frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_{2}^{2}} & \cdots & \frac{\partial^2 f}{\partial x_2 \partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1} & \frac{\partial^2 f}{\partial x_{n} \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_{n}^2 } \end{bmatrix}\)

Optimization

Under quadratic approximation, the function around a critical point translated to the origin has the form \( \frac{1}{2} \begin{bmatrix} x & y \end{bmatrix} H \begin{bmatrix} x \\ y \end{bmatrix} = \frac{1}{2} \begin{bmatrix} x' & y' \end{bmatrix}R( \theta) H R( -\theta) \begin{bmatrix} x' \\ y' \end{bmatrix} \) (where \(H\) is the Hessian matrix), and one sees that despite the rotated coordinates, \( \text{tr} (H)\) and \( \text{det} (H)\) do not depend on \(\theta\) and are hence invariant under rotation; this invariance of the determinant is why the Hessian's determinant reveals the character of critical points. For instance, at a critical point: \( \text{det}(H) \gt 0 \land f_{xx} \gt 0 \) gives a local minimum, \( \text{det}(H) \gt 0 \land f_{xx} \lt 0 \) gives a local maximum, \( \text{det}(H) \lt 0 \) gives a saddle point, and \( \text{det}(H) = 0 \) is inconclusive
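
A symbolic sketch of this second-derivative test on a hypothetical function with one minimum and one saddle (sympy assumed):

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**3 - 3*x + y**2        # critical points at (1, 0) and (-1, 0)

H = sp.hessian(f, (x, y))

for point in [(1, 0), (-1, 0)]:
    Hp = H.subs({x: point[0], y: point[1]})
    d = Hp.det()
    if d > 0:
        kind = 'minimum' if Hp[0, 0] > 0 else 'maximum'
    elif d < 0:
        kind = 'saddle'
    else:
        kind = 'inconclusive'
    print(point, kind)       # (1, 0) minimum, (-1, 0) saddle
```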

Constraint Vincolo 制約

A restricted domain of a function; for instance, \(f(x,y)=\sqrt{xy}\) has a natural domain \(D= \{ (x,y) : xy \geq 0 \}\), but a constraint could be applied, making the new domain \(D = \{ (x,y) : xy \geq 0 \land x\geq y \}\)

Critical points of a function under restricted domains can sometimes be found algebraically by composition of functions; however, when this is unfeasible, Lagrange multipliers are useful.

Lagrange multiplier

Critical points of \(f\) on the constraint \(g=0\) occur where \(f\) exhibits no change along the constraint curve (hence \(\nabla f\) is perpendicular to the curve, or equivalently \(\nabla f\) is parallel to \(\nabla g\))

\(\nabla f =\lambda \nabla g\)
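
A symbolic sketch solving \(\nabla f = \lambda \nabla g\) together with \(g = 0\), for a hypothetical objective and constraint (sympy assumed):

```python
import sympy as sp

x, y, lam = sp.symbols('x y lambda', real=True)
f = x * y                    # objective
g = x**2 + y**2 - 2          # constraint g = 0: circle of radius sqrt(2)

equations = [
    sp.diff(f, x) - lam * sp.diff(g, x),   # f_x = lambda * g_x
    sp.diff(f, y) - lam * sp.diff(g, y),   # f_y = lambda * g_y
    g,                                     # the constraint itself
]
print(sp.solve(equations, [x, y, lam]))
# Critical points (±1, ±1): f is maximal at (1,1) and (-1,-1), minimal at the others.
```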

Multivariable integration

Multivariate infinitesimal elements

Multivariable Riemann integral

\(\int_{c}^{d}\int_{a}^{b}f(x,y)dxdy=\lim_{m,n \to \infty}\sum^{m}_{j=1}\sum^{n}_{i=1}f(x_{i},y_{j})\Delta x \Delta y\)

\(\Delta x=\frac{b-a}{n}\)

\(\Delta y=\frac{d-c}{m}\)

\(x_{i}=a+i\Delta x\)

\(y_{j}=c+j\Delta y\)
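
A numerical sketch of the double Riemann sum (numpy assumed; midpoints are used instead of the endpoints above, purely for faster convergence):

```python
import numpy as np

def double_riemann(f, a, b, c, d, n=400, m=400):
    """Midpoint Riemann sum of f over [a, b] x [c, d]."""
    dx, dy = (b - a) / n, (d - c) / m
    x = a + (np.arange(n) + 0.5) * dx    # midpoints in x
    y = c + (np.arange(m) + 0.5) * dy    # midpoints in y
    X, Y = np.meshgrid(x, y)
    return np.sum(f(X, Y)) * dx * dy

# Integral of x*y over [0,1] x [0,2] is (1/2)(4/2) = 1.
print(double_riemann(lambda x, y: x * y, 0, 1, 0, 2))   # ~1.0
```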

Domain

In single-variable calculus an integral over a domain \(D = [a,b]\) may be written \( \int_{D} f(x)dx \)

Domains in \(n\)-variable calculus can become quite complex: rather than the standard form \([a,b] : a,b \in \mathbb{R} \), they can be expressed as an \(n\)-variable inequality such as \(D = \{ (x,y) : x^2 + y^2 \leq 9\}\) or as the cartesian product of two real intervals \(D = [a,b] \times [c,d]\). To ease notation in multivariable calculus, one can write \( \iint_{D} f(x,y)dxdy \)

Nesting

The act of integrating in one dimension while keeping the other variables constant, then integrating the result over the next dimension

\(\int_{c}^{d}\int_{a}^{b}f(x,y)dxdy=\int_{c}^{d}[\int_{a}^{b}f(x,y)dx]dy\)

Fubini's theorem

\(\int_{a}^{b}\int_{g_{l}(x)}^{g_{u}(x)}f(x,y)dydx=\int_{c}^{d}\int_{h_{l}(y)}^{h_{u}(y)}f(x,y)dxdy\)

\(\int_{X} (\int_{Y}f(x,y) dy) dx =\int_{Y} (\int_{X} f(x,y) dx) dy \iff \iint_{X \times Y} |f(x,y)|d(x,y) \lt \infty\)

Separation of variables

\(\int^{d}_{c}\int^{b}_{a}f(x)g(y)dxdy = \int^{d}_{c}g(y)dy \int^{b}_{a}f(x)dx\)

Applications

Area

\(\displaystyle A =\iint_{D} dA\)

Volume

\(\displaystyle V =\iiint_{D} dV\)

Average

\(\displaystyle \overline{f} = \frac{1}{A} \iint_{D} f(x,y) dA\)

\(\displaystyle \overline{f} = \frac{1}{V} \iiint_{D} f(x,y,z) dV\)

Average x-position

Centroid

\(\displaystyle \begin{pmatrix} \overline{x} \\ \overline{y} \end{pmatrix} = \begin{pmatrix} \frac{1}{A} \iint_{D} x\, dA \\ \frac{1}{A} \iint_{D} y\, dA \end{pmatrix} \)

\(\displaystyle \begin{pmatrix} \overline{x} \\ \overline{y} \\ \overline{z} \end{pmatrix} = \begin{pmatrix} \frac{1}{V} \iiint_{D} x\, dV \\ \frac{1}{V} \iiint_{D} y\, dV \\ \frac{1}{V} \iiint_{D} z\, dV \end{pmatrix} \)

Coordinate systems

Cartesian coordinates

Ordered pair \( (x,y) \) of a horizontal value \(x\) and vertical value \(y\) to represent a point in a 2D space

Ordered 3-tuple \( (x,y,z) \) of a depth value \(x\), a horizontal value \(y\), and a vertical value \(z\) to represent a point in a 3D space

Cylindrical coordinates

Ordered 3-tuple \( (r,\theta,z) \) of a distance from the z-axis (modulus) \(r\), an angle from the x-axis \(\theta\), and a vertical value \(z\) to represent a point in a 3D space

Spherical coordinates

Ordered 3-tuple \( (\rho,\theta,\phi) \) of a distance from the origin (modulus) \(\rho\), a vertical angle from the z-axis (inclination) \(\theta\), and a horizontal angle (azimuth) \(\phi\) to represent a point in a 3D space

Polar coordinates

Ordered pair \((r,\theta)\) of a modulus \(r\) and an angle from the x-axis \(\theta\) to represent a point in a 2D space

\(\displaystyle \iint_D f(x,y)dydx = \int_{\theta_{1}}^{\theta_{2}}\int_{r_{1}(\theta)}^{r_{2}(\theta)} f(r\cos (\theta),r\sin (\theta))\,rdrd\theta\)

Coordinate system

System of variables used to define a set of points in a space

Point representation

Each coordinate system has a set of equations to translate points in another system to said coordinate system

Vector representation

Each coordinate system has an orthonormal basis relative to a point in space that represents any vector from that point; see Linear Algebra

Note that some orthonormal bases may depend on some \(\theta\) or \(\phi\) relative to the vector's base point from the origin

\(\mathbb{R}^2\)

\(\mathbb{R}^3\)

Cartesian coordinates

Ordered 3-tuple \( (x,y,z) \) that represents a point in a 3D space

Volume element

\( dV = dxdydz \)

Polar coordinates

Ordered pair \( (r,\theta) \) that represents any point and vector in a 2D space

Point transform

\( x = r\cos (\theta), \quad y = r\sin (\theta) \)

Area element

\( dA = rdrd\theta \)

Cylindrical coordinates

Ordered 3-tuple \( (r,\theta,z) \) that represents any point and vector in a 3D space

Point transform

\( x = r\cos (\theta), \quad y = r\sin (\theta), \quad z = z \)

Volume element

\( dV = rdrd\theta dz \)

Spherical coordinates

Ordered 3-tuple \( (\rho,\theta,\phi) \) that represents any point and vector in a 3D space

Point transform

\( x = \rho \sin ( \theta ) \cos ( \phi ), \quad y = \rho \sin ( \theta ) \sin ( \phi ), \quad z = \rho \cos ( \theta ) \)

Volume element

\( dV = \rho^2 \sin ( \theta ) d\rho d\theta d\phi \)

Jacobian matrix

Matrix relating to the first derivatives of a vector-valued function

\( J = \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \frac{\partial f_1}{\partial x_2} & \dots & \frac{\partial f_1}{\partial x_n} \\ \frac{\partial f_2}{\partial x_1} & \frac{\partial f_2}{\partial x_2} & \dots & \frac{\partial f_2}{\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial f_n}{\partial x_1} & \frac{\partial f_n}{\partial x_2} & \dots & \frac{\partial f_n}{\partial x_n} \end{bmatrix}\)

General area element transform

For substitution of multiple variables, the integral differential must adequately reflect the infinitesimal change. This can be resolved directly geometrically, or formally through the determinant of the Jacobian matrix (determinants have a geometric interpretation as the signed volume of the parallelepiped spanned by the matrix's columns)

\(dxdy= \begin{vmatrix} \frac{\partial x}{\partial s} & \frac{\partial x}{\partial t} \\ \frac{\partial y}{\partial s} & \frac{\partial y}{\partial t} \end{vmatrix} dsdt\)

\(\prod_{i=1}^n dx_i = |\text{det}(J)| \prod_{i=1}^n ds_i \)
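
A symbolic sketch recovering the polar area element \(dA = rdrd\theta\) from the Jacobian determinant (sympy assumed):

```python
import sympy as sp

r, theta = sp.symbols('r theta', positive=True)
x = r * sp.cos(theta)
y = r * sp.sin(theta)

# Jacobian of the substitution (x, y) <- (r, theta)
J = sp.Matrix([x, y]).jacobian([r, theta])
print(sp.simplify(J.det()))   # r, hence dx dy = r dr dtheta
```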

Fundamental Statistics

Statistics Statistiche 統計

The science of collecting and analysing data, with two primary branches: descriptive and inferential statistics

Variation Variazione 相違

Variation from an expected model in a modelling equation, represented with \(\varepsilon\)

Data データ

Information about a variable from a sample; it can be summarised numerically and graphically

Population Popolazione 母集団

Set of all items of interest of an experiment

Sample Campione サンプル

Subset of the population

Parameter Parametro パラメータ

Quantity numerically evaluated from a population, often represented with Greek letters

Statistic Statistica 統計量

Quantity numerically evaluated from a sample, interpreted as a random outcome; often represented with Latin letters

Estimator

Statistic that estimates a parameter

Symmetry Simmetria 対称

Measures of central tendency Misure di tendenza centrale 中枢傾向の測定

Measures of what results are the most common, average, or otherwise just 'central' values of a variable

Mean

Parameter representing expected value of a population

\(\mu = \frac{\sum_{i=1}^{N} X_{i}}{N}\)

Sample mean Media 平均

Statistic that estimates the mean

\(\bar{X} = \frac{\sum_{i=1}^{n} x_{i}}{n}\)

Median Mediana 中央値

The center value in a set of observations in ascending order, less prone to skewness than the mean

The \(\frac{n+1}{2}^{\text{th}}\) value in said ordered set when the number of elements \(n\) is odd.

When \(n\) is even, it is the average of the \(\frac{n}{2}^{\text{th}}\) and the \(( \frac{n}{2}+1 )^{\text{th}}\) values.

Mode Moda 最頻値

Value(s) with highest frequency in a set, unaffected by extreme values

Data Analysis

Range Gamma 範囲

Difference between the highest and lowest values in a dataset; it ignores the data distribution

\(\text{Range} = \max_i (X_{i})-\min_i (X_{i})\)

Variance Varianza 分散

Parameter representing the mean square distance of all population elements from the mean

\( \sigma^2 = \frac{\sum_{i=1}^{N} (X_{i}-\mu)^{2}}{N} \)

Sample variance

Statistic for the variance

\( s^2 = \frac{\sum_{i=1}^{n} (X_{i}-\bar{X})^{2}}{n-1} \)
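
A minimal sketch contrasting the two formulas (numpy assumed; ddof is numpy's name for the subtracted degrees of freedom):

```python
import numpy as np

data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

print(np.var(data, ddof=0))   # population formula (divide by n):    4.0
print(np.var(data, ddof=1))   # sample formula (divide by n-1):     ~4.571
```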

Standard deviation Deviazione standard 標準偏差

Square root of the variance; the root-mean-square distance of all population elements from the mean

Covariance

Parameter representing how two RVs vary together

\( \text{cov}(X,Y) = \frac{\sum_{i=1}^{N} (X_{i}-\mu_{X})(Y_{i} - \mu_{Y})}{N} \)

Sample covariance

Statistic estimating the covariance

\( q_{X,Y} = \frac{\sum_{i=1}^{n} (X_{i}-\bar{X})(Y_{i} - \bar{Y})}{n-1} \)

Quartile Quartile 四分位

Three values (\(Q_1\), \(Q_2\), \(Q_3\); where \(Q_2\) is the median) that partition the ordered data into 4 quarters. The position of a quartile can be found with the following formula, with \(n\) representing the number of elements in the dataset. A quartile's position may not always be in \(\mathbb{Z}\), so round it to the nearest integer

\(Q_{i}=\frac{i(n+1)}{4}\)

Interquartile range (IQR)

Measure of spread that is less sensitive to outliers than the standard deviation, but more sensitive to distribution than the range

\(\text{IQR}=Q_{3}-Q_{1}\)
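
A small sketch of the position rule on a hypothetical ordered dataset (numpy assumed; note that numpy interpolates between order statistics, so its quartiles can differ slightly from the rounding rule above):

```python
import numpy as np

data = np.array([1, 3, 4, 6, 7, 8, 9, 12, 15])    # n = 9, already ordered

n = len(data)
positions = [i * (n + 1) / 4 for i in (1, 2, 3)]  # Q_i = i(n+1)/4
print(positions)                                  # [2.5, 5.0, 7.5]

q1, q3 = np.percentile(data, [25, 75])
print(q3 - q1)                                    # IQR
```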

Data displaying

Types of distribution Tipi di distribuzione 分布種類

Percentile パーセンタイル

The percent of data entries that are less than or equal to a specific entry; for instance, the median is the 50th percentile and \(Q_1\) is the 25th

Bivariate relationship Relazione bivariata 二変量関係

When two variables have a relation

Z-Score Standardizzazione 標準得点

Number of standard deviations a value is from the mean. Such a 'score' allows for easy comparison between different distributions

Sample

\(Z=\frac{X-\bar{X}}{s}\)

Population

\(Z=\frac{X-\mu }{\sigma }\)

Random variables and distributions

Experiment Esperimento 実験

Process carried out to obtain an outcome when the outcome is uncertain

Random variable Variabile casuale 確率変数

Random variable concepts

Probability function

This subject requires the following probability functions that can be accessed at Probability and Random Variables

Conditional probability

The probability of one event given that another event is guaranteed to occur. The symbol \(|\) in probability statements represents this.

Distributions

This subject requires the following distributions that can be accessed at Probability and Random Variables

Linear combinations of random variables

When random variables for two or more independent experiments are available, one can combine these experiments and their statistics into a single experiment. \(a_{i}\) is some weight constant such that \(\sum_{i=1}^{n} a_{i} = 1\) (in many cases, all the \(a_{i}\) are just reciprocals of \(n\))

\(Y=\sum_{i=1}^{n}a_{i}X_{i}\)

\(\text{E}(Y)=\sum_{i=1}^{n}a_{i}\mu_{i}\)

\(\text{Var}(Y)=\sum_{i=1}^{n}a_{i}^2\sigma_{i}^2\)
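
A simulation sketch checking both identities for two hypothetical independent normal RVs (numpy assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials = 200_000

X1 = rng.normal(3, 2, n_trials)   # X1 ~ N(3, 2^2)
X2 = rng.normal(7, 1, n_trials)   # X2 ~ N(7, 1^2)
a1, a2 = 0.5, 0.5                 # weights summing to 1
Y = a1 * X1 + a2 * X2

print(Y.mean())   # ~ a1*mu1 + a2*mu2         = 5.0
print(Y.var())    # ~ a1^2*var1 + a2^2*var2   = 1.25
```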

Kurtosis

Measure of the 'tailedness' of a random variable, that is, the distribution of values as one deviates from the mean

\(\text{Kurt}(X) = \frac{\mu_{4}}{\sigma^4} = \text{E}[ (\frac{X-\text{E}(X)}{\sqrt{\text{Var}(X)}})^4 ]\)

Skewness

Measure of the level of asymmetry of probability density/mass around a random variable's mean

\(\text{Skew}(X) = \frac{\mu_{3}}{\sigma^3} = \text{E}[ (\frac{X-\text{E}(X)}{\sqrt{\text{Var}(X)}})^3 ]\)
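
A sketch of both measures on a simulated right-skewed sample (numpy and scipy assumed; scipy's kurtosis defaults to excess kurtosis, so fisher=False recovers the \(\mu_4 / \sigma^4\) definition above):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
sample = rng.exponential(scale=1.0, size=100_000)   # right-skewed distribution

print(stats.skew(sample))                     # ~2 for the exponential
print(stats.kurtosis(sample, fisher=False))   # mu_4 / sigma^4, ~9 for the exponential
```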

Random sample

A collection of random variables with the same distribution

Central Limit Theorem (CLT)

See mathematical statistics

Theorem asserting that linear combinations of many independent random variables with the same distribution converge in distribution to the normal distribution as the number of terms grows.

Statistical inference

Statistical inference Inferenza statistica 統計推論

Using data analysis on samples to infer properties of a distribution of the population.

Hypothesis Ipotesi 仮説

A pair of disjoint statements \( ( H_0, H_1 )\) about some population parameter (only one statement can be true), stating an equality and an inequality respectively; inference on the parameter is made by deciding between the two

Error rates

Type I Error rate

Probability \(\alpha\) of rejecting \(H_{0}\) when it is true

\(\alpha = \text{Pr}(H_{0} \text{ is rejected} | H_{0} )\)

Type II Error rate

Probability \(\beta\) of retaining \(H_{0}\) when it is false

\(\beta = \text{Pr}(H_{0} \text{ is retained} | H_{1} ) \)

Test statistic

Statistic that measures the compatibility between the data and \(H_{0}\). Large test statistics indicate weak compatibility with \(H_{0}\), while test statistics close to 0 indicate strong compatibility with \(H_{0}\).

Confidence interval

Interval around an estimator (sample mean, variance, etc.) that contains the population parameter with \( (1-\alpha) \) confidence. Note that this is not a probability: population parameters are fixed values and not random outcomes, so an interval either covers the population parameter or it doesn't. Instead this is merely an assertion of confidence.

\(\text{CI} = [X-b , X+b ]\)

P-value

The probability \(p\) of a type I error occurring by inference on some test statistic

\(p = \text{Pr}(H_{0} \text{ is rejected} | H_{0} )\)

\(p = \begin{cases} \text{Pr}(z \gt Z) & H_{1} : \mu \gt \mu_{0} \\ \text{Pr}(z \lt Z) & H_{1} : \mu \lt \mu_{0} \\ 2 \text{Pr}(z \gt |Z|) & H_{1} : \mu \neq \mu_{0} \end{cases}\)

where \(z\) is a standard normal random variable and \(Z\) is the observed test statistic

Level of significance

Arbitrarily chosen value (often \(\alpha= 0.05\)) representing the maximum type I error a test statistic may show to 'safely' reject the null hypothesis

Note the difference between \(p\) and the level of significance \(\alpha\): \(\alpha\) is an arbitrary value representing the maximum probability of a type I error that the statistician is willing to accept, whereas \(p\) is a calculated value found by indexing a tabulated value chart backwards from the test statistic to its related type I error

Degree of freedom

The degree of freedom (often represented as \(\nu\)) represents the number of data values that are free to vary.

For instance, if \(\bar{X}=5\), then \(n-1\) data values (where \(n\) represents the number of data values) are free to vary, while the remaining one can be deduced from the others. For t-tables, the number of values is reduced by 1 and the result is located in the table

Inferential techniques

There are three equivalent methods of evaluating whether to reject \(H_{0}\) once the test statistic has been calculated; one can use any of the following:

Rejection region inference

Rejection regions are used to determine when to reject \(H_{0}\) based on different statements of \(H_{1}\). Let \(Z\) be the test statistic and \(\alpha\) be the level of significance (type I error one is willing to accept):

\( Z \gt z_{\alpha} \hookrightarrow H_{1} : \mu \gt \mu_{0} \)

\( Z \lt -z_{\alpha} \hookrightarrow H_{1} : \mu \lt \mu_{0} \)

\( |Z| \gt z_{\frac{\alpha}{2}} \hookrightarrow H_{1} : \mu \neq \mu_{0} \)

Note that the use of my notation \( A \hookrightarrow B\) means 'A infers B' (the truth of A suggests leniency towards the truth of B, but cannot ensure it), similar to how \(A \implies B\) means 'A implies B' (the truth of A ensures the truth of B)

P-value comparison inference

Equivalently, we reject \(H_{0}\) if the P-value is less than the level of significance. Let \(p\) be the P-value and \(\alpha\) be the level of significance (type I error one is willing to take on):

\( p \lt \alpha \hookrightarrow H_{1} \)

Note that these techniques are completely logically equivalent

Confidence interval inference

\( \mu_0 \notin \text{CI} \hookrightarrow H_1\)

Test statistics

Z-test

Test statistic to infer the population mean from a sample mean, given that the population variance is known. CLT ensures that Z-tests can be treated as standard normal RVs if the sample size is large enough

\(H_0 : \mu_0 = \mu \)

\( Z = \frac{\bar{X}-\mu_{0}}{\frac{\sigma}{\sqrt{n}}} \)

Confidence interval

\( \text{CI} = [\bar{X} - z_{\frac{\alpha}{2}}\frac{\sigma}{\sqrt{n}}, \bar{X} + z_{\frac{\alpha}{2}}\frac{\sigma}{\sqrt{n}} ] \)
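
A sketch of the full test with hypothetical numbers (scipy assumed):

```python
import numpy as np
from scipy import stats

# Hypothetical data summary: known sigma, H0: mu = 50, two-sided H1.
x_bar, mu0, sigma, n, alpha = 51.3, 50.0, 4.0, 64, 0.05

Z = (x_bar - mu0) / (sigma / np.sqrt(n))
p = 2 * stats.norm.sf(abs(Z))           # two-sided p-value

z_crit = stats.norm.ppf(1 - alpha / 2)
ci = (x_bar - z_crit * sigma / np.sqrt(n),
      x_bar + z_crit * sigma / np.sqrt(n))

# Z = 2.6, p ~ 0.009 < alpha, and mu0 = 50 falls outside the CI:
# all three inferential techniques agree on rejecting H0.
print(Z, p, ci)
```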

T-test (single sample)

Test statistic to infer a population mean from a sample mean, given that the population variance is unknown

\(H_0 : \mu_0 = \mu \)

\( T = \frac{\bar{X}-\mu_{0}}{\frac{s}{\sqrt{n}}} \)

Confidence interval

\( \text{CI} = [\bar{X} - t_{\frac{\alpha}{2},\nu}\frac{s}{\sqrt{n}}, \bar{X} + t_{\frac{\alpha}{2},\nu}\frac{s}{\sqrt{n}} ] \)

Paired Data

Paired data occurs when two populations have a bijection, so that each element in set A is related to exactly one element in set B

T-test (double sample)

Test statistic to infer whether two sample means for two different populations are significantly different, given unknown population variance

\(H_0 : \mu_1 = \mu_2 \)

\(T = \frac{(\bar{X}_{2}-\bar{X}_{1})-(\mu_{2}-\mu_{1})}{\sqrt{\frac{s_{1}^2}{n_1}+\frac{s_{2}^2}{n_2}}} \)

\(s^{2}_{p}=\frac{(n_{1}-1)s_{1}^2+(n_{2}-1)s_{2}^2}{n_{1}+n_{2}-2}\)

\( \nu = \begin{cases} n_{1}+n_{2}-2 & s_1 = s_2 \\ \frac{(\frac{s^{2}_{1}}{n_1}+\frac{s^{2}_{2}}{n_2})^2}{\frac{(\frac{s^2_1}{n_1})^2}{n_1-1}+\frac{(\frac{s^2_2}{n_2})^2}{n_2-1}} & s_1 \neq s_2 \end{cases}\)
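
A sketch with hypothetical samples (scipy assumed; equal_var=False selects Welch's version, matching the second \(\nu\) case above):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
sample1 = rng.normal(10.0, 2.0, 30)   # hypothetical samples from
sample2 = rng.normal(11.5, 3.0, 40)   # two different populations

t_stat, p_value = stats.ttest_ind(sample1, sample2, equal_var=False)
print(t_stat, p_value)
```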

F-test (ANOVA)

Test statistic to infer whether two population variances are significantly different, via the ratio of the sample variances.

\(H_0 : \sigma^2_1 = \sigma^2_2\)

\(F = \frac{s^2_1}{s^2_2} \)

\( \nu_j = n_j-1 \)