Matrix transformation equation such that:
\(\textbf{M}\textbf{v}=\lambda \textbf{v}\)
Geometrically, this represents inputs to transformation \(\textbf{M}\) such that the output is the input \( \textbf{v} \) times a scalar; the output vector lies along the same line as the input vector
Polynomial such that the roots are the eigenvalues of a transform
\( \text{det} (\textbf{M}-\lambda \textbf{I})=0 \iff \lambda \text{ is an eigenvalue of } \textbf{M}\)
\(\textbf{M} \textbf{v} =\lambda \textbf{v}\)
\(\implies \textbf{M} \textbf{v} - \lambda \textbf{v} = \textbf{0}\)
\(\implies (\textbf{M} - \lambda \textbf{I}) \textbf{v} = \textbf{0}\) (the scalar \(\lambda\) is written as \(\lambda \textbf{I}\) so that the subtraction from a matrix is well defined)
\( \textbf{v} \neq \textbf{0} \implies \text{det} (\textbf{M}-\lambda \textbf{I})=0\)
\(\textbf{M} \text{ is diagonal} \implies \lambda = m_{jj}\)
\(\sum^{n}_{i=1} \lambda_i = \text{tr}(\textbf{M}) \)
\(\prod^{n}_{i=1} \lambda_i = \text{det}(\textbf{M}) \)
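A minimal numerical check of these eigenvalue identities, assuming NumPy is available (the matrix below is an arbitrary example):

```python
import numpy as np

# Arbitrary example matrix (any square matrix would do here)
M = np.array([[4.0, 1.0],
              [2.0, 3.0]])

eigenvalues, eigenvectors = np.linalg.eig(M)

# Each eigenpair satisfies M v = lambda v
for lam, v in zip(eigenvalues, eigenvectors.T):
    assert np.allclose(M @ v, lam * v)

# Sum of eigenvalues equals the trace, product equals the determinant
assert np.isclose(eigenvalues.sum(), np.trace(M))
assert np.isclose(eigenvalues.prod(), np.linalg.det(M))
```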
Vector functions \(T : V \to W \) that take a vector from \(V\) (the domain) as input and return a vector from \(W\) (the codomain) as output, representing scaling, rotating, and/or reflecting
\( T \text{ is a linear transformation } \iff T(k\textbf{x}_{1}+p\textbf{x}_{2})=kT(\textbf{x}_{1})+pT(\textbf{x}_{2})\)
Transform that 'stretches' components proportionally, formally following this definition:
\( T \text{ is a pure scaling transformation} \iff \exists k : T(\textbf{v})=k\textbf{v}\)
Rotating a vector by a specific angle, or reflecting it about an axis, while maintaining the vector's magnitude
For instance, multiplying a vector by the matrix \(\begin{bmatrix} \cos ( \theta) & -\sin ( \theta) \\ \sin ( \theta) & \cos ( \theta ) \end{bmatrix}\) rotates the vector by \(\theta \). This can be seen from the basis vectors: rotating \(\hat{i}\) by \(\theta\) gives \( \begin{bmatrix} \cos ( \theta) \\ \sin ( \theta) \end{bmatrix}\), and rotating \(\hat{j}\) by \(\theta\) gives \( \begin{bmatrix} -\sin ( \theta) \\ \cos ( \theta) \end{bmatrix}\); these are the columns of the matrix
Let \(\textbf{v} = \begin{pmatrix} r \cos (k) \\ r\sin (k) \end{pmatrix}\), now to rotate by \(\theta\) implies \( T( \textbf{v}) = \begin{pmatrix} r \cos (k + \theta) \\ r \sin (k + \theta) \end{pmatrix} = \begin{pmatrix} r ( \cos (k) \cos ( \theta ) - \sin(k) \sin ( \theta )) \\ r( \sin (k) \cos ( \theta) + \sin ( \theta ) \cos ( k )) \end{pmatrix} = r \cos (k) \begin{pmatrix} \cos(\theta) \\ \sin(\theta) \end{pmatrix} + r \sin (k) \begin{pmatrix} - \sin (\theta) \\ \cos(\theta) \end{pmatrix}= \begin{bmatrix} \cos ( \theta ) & - \sin ( \theta ) \\ \sin( \theta ) & \cos ( \theta ) \end{bmatrix} \textbf{v}\)
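A short sketch of the rotation matrix acting on a vector, assuming NumPy is available (the angle and vector are arbitrary examples):

```python
import numpy as np

theta = np.pi / 2          # rotate by 90 degrees
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

v = np.array([1.0, 0.0])   # the basis vector i-hat
print(R @ v)               # approximately [0., 1.]: i-hat rotated onto j-hat

# Rotation preserves magnitude
assert np.isclose(np.linalg.norm(R @ v), np.linalg.norm(v))
```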
\(\textbf{A} : \textbf{A}^{-1}=\textbf{A}^{T}\)
A function \(T^{-1} : \forall \textbf{x}, T^{-1}(T(\textbf{x}))= \textbf{x} \)
As a matter of convenient notation, the input of a function can be represented as a vector \(\textbf{x}\) (as inputs are essentially vectors to the point of interest anyway)
Function \(f: X^n \to Y\) with \(n\) inputs and one output.
For instance, double-variable functions are defined as \(f: X \times Y \to Z\) with two inputs \(x,y\) and one output \(f(x,y)=z\)
A method of visualising 3D graphs using 2D graphs, done by the following:
Setting \(f(x,y)=k\) for several values of \(k\) produces contour curves, which together visualise the graph in 3D
2D map where curves represent 3D depth.
Sets representing all solutions of a function when \(z=c\)
\( \{ \textbf{x} \in \mathbb{R}^n : f(\textbf{x})=c \} \)
\(\lim_{(x,y) \to (a,b)} f(x,y)=L\)
Let us define two generic functions \( g(x) : \lim_{x \to a} g(x) = a,h(y) : \lim_{y \to b} h(y) = b\)
\(L=\lim_{(x,y) \to (a,b)} f(x,y) \iff \forall g,h, L=\lim_{(x,y) \to (a,b)} f(g(x),h(y))\)
This states that the limit is only defined if approaching the point along every path yields the same result
\(f \text{ is continuous } \iff \lim_{(x,y) \to (a,b)}f(x,y)=f(a,b)\)
Division by zero or piecewise definition changes that induce 'jumps' make a function discontinuous; the condition above is the rigorous definition of this intuitive notion. See real analysis for more information.
Function \(f: X^n \to V\) with \(n\) variables as input and a vector \(\textbf{x}\) of size \(m \times 1\) as output. The range of a vector function is known as a vector field
For multivariable functions, derivatives are taken with respect to individual variables. For instance, \(f(x,y)\) has the following partial derivatives for its respective variables:
Partial derivatives are functions describing the gradient of a multivariable function 'along one axis', or more formally, the rate of change with respect to only one variable.
Higher-order derivatives are still taken with respect to a chosen variable; for instance, \(f(x,y)\) has a total of 4 second partial derivatives:
Note how the mixed partial derivatives \(f_{xy},f_{yx}\) describe how the rate of change in one variable varies with the other, while \(f_{xx},f_{yy}\) describe how fast the rate of change changes along the \(x\) and \(y\) axes respectively.
Theorem asserting the symmetry of second order derivatives
\(f_{xy},f_{yx} \text{ are continuous at } (x,y) \implies f_{xy} = f_{yx}\)
Functions can be modified by a magnifying sequence of functions which translates the point of interest (place one wants to magnify) to the origin, dilates the function out in all dimensions, and then translates the function back to its original point
\( f_m (x,y) = m (f( \frac{x-a}{m} + a, \frac{y-b}{m} + b ) - f(a,b)) + f(a,b) \)
\( \lim_{m \to \infty} f_m (x,y) = f_x(a,b)(x-a) + f_y(a,b)(y-b) + f(a,b) \)
One can check whether a critical point is a maximum, minimum, or saddle by using a magnification that makes the output variable quadratic, and hence the infinite limit of the following is used:
\( f_m (x,y) = m^2 (f( \frac{x-a}{m} + a, \frac{y-b}{m} + b ) - f(a,b)) + f(a,b) \)
A linear approximation of a multivariable function at a specific point, for instance at the point \((x_{0},y_{0})\):
\(z-f(x_{0},y_{0})=f_{x}(x_{0},y_{0})(x-x_{0})+f_{y}(x_{0},y_{0})(y-y_{0})\)
For a multivariable function \(f\), the vector normal to the surface can be derived by rearranging the tangent plane formula:
\(\textbf{n} = \begin{pmatrix} -f_{x_1}(a,b) \\ -f_{x_2}(a,b) \\ \vdots \\ 1 \end{pmatrix}\)
An alternate expression is
\(\textbf{n} = \nabla F : F(\textbf{x},y) = y-f(\textbf{x})\)
Note how when \(y=f(\textbf{x})\) we have the level set \(F=0\), so \(\nabla F(\textbf{x},f(\textbf{x}))\) is normal to the level set \(F=0\)
To approximate a value of a function for sufficiently small \(\boldsymbol{\Delta}\textbf{x}\)
\(f(\textbf{x}+ \boldsymbol{\Delta}\textbf{x}) \approx f(\textbf{x}) + \sum_{j=1}^{n} f_{x_j}(\textbf{x})\Delta x_{j} \)
\( \lim_{m \to \infty} f_m (x,y) = f_x(a,b)(x-a) + f_y(a,b)(y-b) + f(a,b) \)
\(\frac{\partial f}{\partial t}=\frac{\partial f}{\partial x}\frac{\partial x}{\partial t}+\frac{\partial f}{\partial y}\frac{\partial y}{\partial t}\)
\(\frac{\partial f}{\partial t}= \sum_{i=1}^{n} \frac{\partial f}{\partial x_i}\frac{\partial x_i}{\partial t}\)
Since the unrigorous differential is immediately derived from its rigorous counterpart (the chain rule), differentials for multivariate functions are calculated by the total differential
\(\Delta f \approx \sum_{i=1}^{n} \frac{\partial f}{\partial x_i} \Delta x_i\)
\( df = \sum_{i=1}^{n} \frac{\partial f}{\partial x_i} d x_i\)
Vector pointing towards the greatest increase of a function at a designated point
\(\displaystyle \nabla f(x,y) = \begin{pmatrix} \frac{\partial f }{\partial x} \\ \frac{\partial f}{\partial y} \end{pmatrix}\)
\(\nabla f(x,y) = f_x \hat{i} + f_y \hat{j}\)
One can use the multivariable chain rule to derive the following expression for the gradient in polar (cylindrical) coordinates
\(\nabla f(r,\theta) = f_r \hat{r} + \frac{f_{\theta}}{r} \hat{\theta}\)
To find the derivative at a point \(\textbf{x}\) in the direction of a given unit vector \( \hat{u}\), one uses the directional derivative, which employs unit vectors in the Newton quotient to find the rate of change in any desired direction. The result is a scalar representing the rate of change in the direction of \(\hat{u}\)
\(D_{\hat{u}}f(\textbf{x})=\lim_{t \to 0^{+}}\frac{f(\textbf{x}+t\hat{u})-f(\textbf{x})}{t}\)
One can prove different forms of this definition using geometry and calculus:
\(D_{\hat{u}}f(x,y)=\frac{\partial f}{\partial x}\hat{u}_x + \frac{\partial f}{\partial y}\hat{u}_y\)
\(D_{\hat{u}}f(\textbf{x})= \sum^{n}_{j=1}\hat{u}_j f_{x_j} \)
\(D_{\hat{u}}f(x,y)=\nabla f \cdot \hat{u}\)
\(D_{\hat{u}}f(x,y)=|\nabla f| \cos (\theta)\)
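As a quick numerical check of the dot-product form against the limit definition, assuming NumPy is available (\(f(x,y)=x^2 y\), the point, and the direction are arbitrary examples):

```python
import numpy as np

f = lambda x, y: x**2 * y
grad_f = lambda x, y: np.array([2*x*y, x**2])   # (f_x, f_y)

point = np.array([1.0, 2.0])
u = np.array([3.0, 4.0]) / 5.0                  # unit direction vector

# Dot-product form: D_u f = grad(f) . u
analytic = grad_f(*point) @ u

# Limit definition with a small step t
t = 1e-6
numeric = (f(*(point + t*u)) - f(*point)) / t

print(analytic, numeric)   # both approximately 3.2
```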
Largest attainable directional derivative which occurs when \(\textbf{u}\) has the same direction to \(\nabla f \), hence when \(D_{\textbf{u}}f = |\nabla f(x)|\)
Smallest attainable directional derivative which occurs when \(\textbf{u}\) has the opposite direction to \(\nabla f \), hence when \(D_{\textbf{u}}f = -|\nabla f(x)|\)
To determine the concavity at a critical point, one can use the following:
\( \lim_{m \to \infty} f_m (x,y) = \frac{1}{2} \begin{bmatrix} x-a & y-b\end{bmatrix} \begin{bmatrix} f_{xx}(a,b) & f_{xy}(a,b) \\ f_{yx}(a,b) & f_{yy}(a,b) \end{bmatrix} \begin{bmatrix} x-a \\ y-b \end{bmatrix} + f(a,b) \)
This result can be found by approximating a 2-variable function by a 2nd order Taylor series and arranging into matrix form (this shows some reasoning behind the Hessian matrix).
Matrix relating to the second derivatives of a multivariate function
\(H =\begin{bmatrix} \frac{\partial^2 f}{\partial x_{1}^{2}} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\ \frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_{2}^{2}} & \cdots & \frac{\partial^2 f}{\partial x_2 \partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1} & \frac{\partial^2 f}{\partial x_{n} \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_{n}^2 } \end{bmatrix}\)
Under quadratic approximation, the function around the critical point, translated to the origin, has the form \( \frac{1}{2} \begin{bmatrix} x & y \end{bmatrix} H \begin{bmatrix} x \\ y \end{bmatrix} = \frac{1}{2} \begin{bmatrix} x' & y' \end{bmatrix}R( \theta) H R( -\theta) \begin{bmatrix} x' \\ y' \end{bmatrix} \) (where \(H\) is the Hessian matrix). Despite the rotated coordinates, \( \text{tr} (H)\) and \( \text{det} (H)\) do not depend on \(\theta\) and are hence unchanged under rotation; this invariance of the determinant under rotation is why the Hessian matrix's determinant reveals the characteristics of critical points. For instance:
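A minimal sketch of classifying a critical point from \(\det(H)\) and \(\text{tr}(H)\), assuming SymPy is available and taking \(f(x,y)=x^2-y^2\) as an arbitrary example (a saddle at the origin):

```python
import sympy as sp

x, y = sp.symbols('x y', real=True)
f = x**2 - y**2                       # arbitrary example: saddle at (0, 0)

H = sp.hessian(f, (x, y))
H0 = H.subs({x: 0, y: 0})             # Hessian evaluated at the critical point

det_H, tr_H = H0.det(), H0.trace()
if det_H < 0:
    print("saddle point")             # this branch is taken for x**2 - y**2
elif det_H > 0:
    print("minimum" if tr_H > 0 else "maximum")
else:
    print("inconclusive")
```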
A restricted domain of a function; for instance, \(f(x,y)=\sqrt{xy}\) has the natural domain \(D= \{ (x,y) : xy \geq 0 \}\), but a constraint could be applied, making the new domain \(D = \{ (x,y) : xy \geq 0 \land x\geq y \}\)
Finding critical points of a function under restricted domains can sometimes be solved algebraically by composition of functions, however when unfeasible to do so, Lagrange multipliers are useful.
Critical points of \(f\) on the constraint \(g=0\) occur where \(f\) exhibits no change along the constraint curve (hence \(\nabla f\) is perpendicular to the curve \(g=0\), or equivalently \(\nabla f\) is parallel to \(\nabla g\))
\(\nabla f =\lambda \nabla g\)
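A minimal sketch of solving \(\nabla f = \lambda \nabla g\) symbolically, assuming SymPy is available (the objective \(f=xy\) and constraint \(g=x+y-1\) are arbitrary examples):

```python
import sympy as sp

x, y, lam = sp.symbols('x y lambda', real=True)
f = x * y                 # arbitrary objective
g = x + y - 1             # arbitrary constraint, g = 0

# Stationarity: grad f = lambda * grad g, plus the constraint itself
eqs = [sp.diff(f, x) - lam * sp.diff(g, x),
       sp.diff(f, y) - lam * sp.diff(g, y),
       g]
print(sp.solve(eqs, [x, y, lam], dict=True))
# [{x: 1/2, y: 1/2, lambda: 1/2}]
```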
\(\int_{c}^{d}\int_{a}^{b}f(x,y)dxdy=\lim_{m,n \to \infty}\sum^{n}_{i=1}\sum^{m}_{j=1}f(x_{i},y_{j})\Delta x \Delta y\)
\(\Delta x=\frac{b-a}{n}\)
\(\Delta y=\frac{d-c}{m}\)
\(x_{i}=a+i\Delta x\)
\(y_{j}=c+j\Delta y\)
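A minimal numerical version of this Riemann sum, assuming NumPy is available (the integrand \(f(x,y)=xy\) over \([0,1]\times[0,2]\) is an arbitrary example whose exact integral is 1):

```python
import numpy as np

f = lambda x, y: x * y
a, b, c, d = 0.0, 1.0, 0.0, 2.0
n, m = 400, 400                      # number of subintervals in x and y

dx, dy = (b - a) / n, (d - c) / m
x = a + np.arange(1, n + 1) * dx     # sample points x_i
y = c + np.arange(1, m + 1) * dy     # sample points y_j

X, Y = np.meshgrid(x, y)
approx = np.sum(f(X, Y)) * dx * dy
print(approx)                        # approaches 1.0 as n, m grow
```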
Single variable calculus has integrals \( \int_{a}^{b} \) over an interval of the standard form \([a,b] : a,b \in \mathbb{R} \); domains in \(n\) variable calculus can become quite complex and may instead be expressed as an \(n\) variable inequality such as \(D = \{ (x,y) : x^2 + y^2 \leq 9\}\), or as the Cartesian product of two real intervals \(D = [a,b] \times [c,d]\). To ease notation for multivariable calculus, one can write \( \iint_{D} f(x,y)dxdy \)
The act of integrating in one dimension while keeping the other variable constant and integrating the result for the next dimension
\(\int_{c}^{d}\int_{a}^{b}f(x,y)dxdy=\int_{c}^{d}[\int_{a}^{b}f(x,y)dx]dy\)
\(\int_{a}^{b}\int_{g_{l}(x)}^{g_{u}(x)}f(x,y)dydx=\int_{c}^{d}\int_{h_{l}(y)}^{h_{u}(y)}f(x,y)dxdy\)
\(\iint_{X \times Y} |f(x,y)|d(x,y) \lt \infty \implies \int_{X} (\int_{Y}f(x,y) dy) dx =\int_{Y} (\int_{X} f(x,y) dx) dy\)
\(\int^{d}_{c}\int^{b}_{a}f(x)g(y)dxdy = \int^{d}_{c}g(y)dy \int^{b}_{a}f(x)dx\)
\(\displaystyle A =\iint_{D} dA\)
\(\displaystyle V =\iiint_{D} dV\)
\(\displaystyle \overline{f} = \frac{1}{A} \iint_{D} f(x,y) dA\)
\(\displaystyle \overline{f} = \frac{1}{V} \iiint_{D} f(x,y,z) dV\)
\(\displaystyle \begin{pmatrix} \overline{x} \\ \overline{y} \end{pmatrix} = ( \frac{1}{A} \iint_{D} x dA ) \hat{i} + ( \frac{1}{A} \iint_{D} y dA ) \hat{j} \)
\(\displaystyle \begin{pmatrix} \overline{x} \\ \overline{y} \\ \overline{z} \end{pmatrix} = ( \frac{1}{V} \iiint_{D} x dV ) \hat{i} + ( \frac{1}{V} \iiint_{D} y dV ) \hat{j} + ( \frac{1}{V} \iiint_{D} z dV ) \hat{k} \)
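As an illustration of the area, average, and centroid formulas, here is a sketch assuming SciPy is available (the rectangular region \([0,1]\times[0,2]\) is an arbitrary example):

```python
from scipy.integrate import dblquad

# dblquad integrates func(y, x), with x as the outer variable
a, b, c, d = 0.0, 1.0, 0.0, 2.0

area, _ = dblquad(lambda y, x: 1.0, a, b, c, d)   # A = integral of dA
x_moment, _ = dblquad(lambda y, x: x, a, b, c, d)
y_moment, _ = dblquad(lambda y, x: y, a, b, c, d)

print(area)                                  # 2.0
print(x_moment / area, y_moment / area)      # centroid (0.5, 1.0)
```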
Ordered pair \( (x,y) \) of a horizontal value \(x\) and vertical value \(y\) to represent a point in a 2D space
Ordered 3-tuple \( (x,y,z) \) of a depth value \(x\), a horizontal value \(y\), and a vertical value \(z\) to represent a point in a 3D space
Ordered 3-tuple \( (r,\theta,z) \) of an angle from the \(x\)-axis \(\theta\), a distance from the \(z\)-axis (modulus) \(r\), and a vertical value \(z\) to represent a point in a 3D space
Ordered 3-tuple \( (\rho,\theta,\phi) \) of a horizontal angle (azimuth) \(\phi\), a vertical angle (inclination from the \(z\)-axis) \(\theta\), and a distance from the origin (modulus) \(\rho\) to represent a point in a 3D space
Ordered pair \((r,\theta)\) of a modulus \(r\) and an angle from the horizontal axis (inclination) \(\theta\) to represent a point in a 2D space
\(\displaystyle \iint_D f(x,y)dydx = \int_{\theta_{1}}^{\theta_{2}}\int f(r\cos (\theta),r\sin (\theta))rdrd\theta\)
System of variables used to define a set of points in a space
Each coordinate system has a set of equations to translate points in another system to said coordinate system
Each coordinate system has an orthonormal basis relative to a point in space that represents any vector from that point; see Linear Algebra
Note that some orthonormal bases may be dependent on some \(\theta\) or \(\phi\) relative to the vector's base point from the origin
Ordered 3-tuple \( (x,y,z) \) that represents a point in a 3D space
\( dV = dxdydz \)
Ordered pair \( (r,\theta) \) that represents any point and vector in a 2D space
\( dA = rdrd\theta \)
Ordered 3-tuple \( (r,\theta,z) \) that represents any point and vector in a 3D space
\( dV = rdrd\theta dz \)
Ordered 3-tuple \( (\rho,\theta,\phi) \) that represents any point and vector in a 3D space
\( dV = \rho^2 \sin ( \theta ) d\rho d\theta d\phi \)
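A quick symbolic check of the spherical volume element, integrating \(dV\) over a ball of radius \(R\), assuming SymPy is available:

```python
import sympy as sp

rho, theta, phi, R = sp.symbols('rho theta phi R', positive=True)

# Integrate dV = rho^2 sin(theta) d rho d theta d phi over a ball of radius R
V = sp.integrate(rho**2 * sp.sin(theta),
                 (rho, 0, R), (theta, 0, sp.pi), (phi, 0, 2*sp.pi))
print(sp.simplify(V))   # 4*pi*R**3/3, the familiar sphere volume
```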
Matrix relating to the derivatives of a vector valued function
\( J = \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \frac{\partial f_1}{\partial x_2} & \dots & \frac{\partial f_1}{\partial x_n} \\ \frac{\partial f_2}{\partial x_1} & \frac{\partial f_2}{\partial x_2} & \dots & \frac{\partial f_2}{\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial f_n}{\partial x_1} & \frac{\partial f_n}{\partial x_2} & \dots & \frac{\partial f_n}{\partial x_n} \end{bmatrix}\)
For substitution of multiple variables, the integral differential must adequately reflect the infinitesimal change. This can be resolved directly geometrically, or formally through the determinant of the Jacobian matrix (determinants have a geometric interpretation as the signed volume of the parallelepiped spanned by the matrix's columns)
\(dxdy= \begin{vmatrix} \frac{\partial x}{\partial s} & \frac{\partial x}{\partial t} \\ \frac{\partial y}{\partial s} & \frac{\partial y}{\partial t} \end{vmatrix} dsdt\)
\(\prod_{i=1}^n dx_i = |\text{det}(J)| \prod_{i=1}^n ds_i \)
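A short sketch of the polar-coordinate case, recovering \(dxdy = r\,drd\theta\) from the Jacobian determinant, assuming SymPy is available:

```python
import sympy as sp

r, theta = sp.symbols('r theta', positive=True)
x = r * sp.cos(theta)
y = r * sp.sin(theta)

# Jacobian of (x, y) with respect to (r, theta)
J = sp.Matrix([x, y]).jacobian([r, theta])
print(sp.simplify(J.det()))   # r, so dx dy = r dr d(theta)
```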
The science of collecting and analysing data, with two primary models:
Variation from an expected model in a modelling equation, represented with \(\varepsilon\)
Information of a variable from a sample, can be summarised numerically and graphically
Set of all items of interest of an experiment
Subset of the population
Quantity numerically evaluated from the population, often represented with Greek letters
Quantity numerically evaluated from a sample, interpreted as a random outcome; often represented with Latin letters
Statistic that estimates a parameter
Measures of what results are the most common, average, or otherwise just 'central' values of a variable
Parameter representing expected value of a population
\(\mu = \frac{\sum_{i=1}^{N} X_{i}}{N}\)
Statistic that estimates the mean
\(\bar{X} = \frac{\sum_{i=1}^{n} x_{i}}{n}\)
The center value in a set of observations in ascending order, less prone to skewness than the mean
the \(\frac{(n+1)}{2}^{\text{th}}\) value in said ordered set when the number of elements is odd.
When there is an even number of elements, it is the average of the \(\frac{n}{2}^{\text{th}}\) and the \((\frac{n}{2}+1)^{\text{th}}\) values.
Value(s) with highest frequency in a set, unaffected by extreme values
Difference between the highest value and lowest in a dataset, it ignores data distribution
\(\text{Range} = \max (X_n)-\min (X_n)\)
Parameter representing the mean square distance of all population elements from the mean
\( \sigma^2 = \frac{\sum_{i=1}^{N} (X_{i}-\mu)^{2}}{N} \)
Statistic for the variance
\( s^2 = \frac{\sum_{i=1}^{n} (X_{i}-\bar{X})^{2}}{n-1} \)
Square root of the variance, the root-mean-square distance of population elements from the mean
Parameter representing the variation between two RVs
\( \text{cov}(X,Y) = \frac{\sum_{i=1}^{N} (X_{i}-\mu_{X})(Y_{i} - \mu_{Y})}{N} \)
Statistic estimating the covariance
\( q_{X,Y} = \frac{\sum_{i=1}^{n} (X_{i}-\bar{X})(Y_{i} - \bar{Y})}{n-1} \)
Three values (\(Q_1\), \(Q_2\), \(Q_3\); where \(Q_2\) is the median) that partition the ordered data into 4 quarters. The position of a quartile can be found with the following formula, with \(n\) representing the number of elements in the dataset. The position may not always be an integer, so round it to the nearest integer
\(Q_{i}=\frac{i(n+1)}{4}\)
Measure of spread that is less sensitive to outliers than the standard deviation, but more sensitive to distribution than the range
\(\text{IQR}=Q_{3}-Q_{1}\)
The percent of data entries that are less than or equal to a specific entry; for instance, the median is 50%, \(Q_1\) is 25%
When two variables have a relation
Amount of standard deviations a value is from the mean. Such a 'score' allows for ease of comparison between different distributions
\(Z=\frac{X-\bar{X}}{s}\)
\(Z=\frac{X-\mu }{\sigma }\)
A process used to find an outcome when the outcome is uncertain
This subject requires the following probability functions that can be accessed at Probability and Random Variables
The probability of one event given that another event is guaranteed to occur. The symbol \(|\) in probability statements represents this.
This subject requires the following distributions that can be accessed at Probability and Random Variables
When random variables for two or more independent experiments are available, one can combine these experiments and their statistics into a single experiment. \(a_{i}\) is some weight constant such that \(\sum_{i=1}^{n} a_{i} = 1\) (in many cases, all the \(a_{i}\) are simply reciprocals of \(n\))
\(Y=\sum_{i=1}^{n}a_{i}X_{i}\)
\(\text{E}(Y)=\sum_{i=1}^{n}a_{i}\mu_{i}\)
\(\text{Var}(Y)=\sum_{i=1}^{n}a_{i}^2\sigma_{i}^2\)
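A small simulation illustrating these two identities, assuming NumPy is available (the distributions and weights are arbitrary examples):

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 1_000_000

# Two independent RVs: X1 ~ N(3, 2^2), X2 ~ N(7, 1^2), weights a1 = a2 = 1/2
X1 = rng.normal(3.0, 2.0, n_samples)
X2 = rng.normal(7.0, 1.0, n_samples)
a1, a2 = 0.5, 0.5

Y = a1 * X1 + a2 * X2
print(Y.mean())   # close to a1*3 + a2*7 = 5.0
print(Y.var())    # close to a1^2*4 + a2^2*1 = 1.25
```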
Measure of the 'tailedness' of a random variable, that is, the distribution of values as one deviates from the mean
\(\text{Kurt}(X) = \frac{\mu_{4}}{\sigma^4} = \text{E}[ (\frac{X-\text{E}(X)}{\sqrt{\text{Var}(X)}})^4 ]\)
Measure of the level of asymmetry of probability density/mass around a random variable's mean
\(\text{Skew}(X) = \frac{\mu_{3}}{\sigma^3} = \text{E}[ (\frac{X-\text{E}(X)}{\sqrt{\text{Var}(X)}})^3 ]\)
A collection of random variables with the same distribution
See mathematical statistics
Theorem asserting that sums (or means) of many independent random variables of the same distribution converge to the normal distribution as the number of variables grows.
Using data analysis on samples to infer properties of a distribution of the population.
A pair of disjoint statements \( ( H_0, H_1 )\) (only one can be true) about some population parameter, declaring how inference is made on the parameter: \(H_0\) states an equality and \(H_1\) an inequality
Probability \(\alpha\) of rejecting \(H_{0}\) when it is true
\(\alpha = \text{Pr}(H_{0} \text{ is rejected} | H_{0} )\)
Probability \(\beta\) of retaining \(H_{0}\) when it is false
\(\beta = \text{Pr}(H_{0} \text{ is retained} | H_{1} ) \)
Statistic that measures the compatibility between the data and \(H_{0}\). Large test statistics indicate weak compatibility with \(H_{0}\), while test statistics close to 0 indicate strong compatibility with \(H_{0}\).
Interval around an estimator (sample mean, variance, etc.) that contains the population parameter with \( (1-\alpha) \) confidence. Note that this is not a probability; since population parameters are fixed values and not random outcomes, a given interval either covers the population parameter or it doesn't. Instead this is merely an assertion of confidence.
\(\text{CI} = [X-b , X+b ]\)
The probability \(p\), assuming \(H_{0}\) is true, of obtaining a test statistic at least as extreme as the one observed
\(p = \text{Pr}(\text{a test statistic at least as extreme as the observed one} \mid H_{0} )\)
\(p = \begin{cases} \text{Pr}(z \gt Z) & H_{1} : \mu \gt \mu_{0} \\ \text{Pr}(z \lt Z) & H_{1} : \mu \lt \mu_{0} \\ 2 \text{Pr}(z \gt |Z|) & H_{1} : \mu \neq \mu_{0} \end{cases}\)
Arbitrarily chosen value (often \(\alpha= 0.05\)) representing the maximum type I error a test statistic may show to 'safely' reject the null hypothesis
Note the difference between \(p\) and the level of significance \(\alpha\): \(\alpha\) is an arbitrary value representing the maximum probability of a type I error that the statistician is willing to accept, whereas \(p\) is a calculated value found by indexing a tabulated value chart backwards from the test statistic to its related type I error
The degrees of freedom (often represented as \(\nu\) ) represent the number of data values that are free to vary.
For instance, if \(\bar{X}=5\), then \(n-1\) data values (where \(n\) is the number of data values) are free to vary, while the remaining value can be deduced from the others. For t-tables, the number of values minus 1 gives the row to locate in the table
There are three equivalent methods of evaluating whether to reject \(H_{0}\) once the test statistic has been calculated. One can either:
Rejection regions are used to determine when to reject \(H_{0}\) based on different statements of \(H_{1}\). Let \(Z\) be the test statistic and \(\alpha\) be the level of significance (type I error willing to accept):
Note that my notation \( A \hookrightarrow B\) means 'A infers B' (the truth of A suggests leniency towards the truth of B, but cannot ensure it), similar to how \(A \implies B\) means 'A implies B' (the truth of A ensures the truth of B)
Equivalently, we reject \(H_{0}\) if the P-value is less than the type I error. Let \(p\) be the P-value and \(\alpha\) be the level of significance (type I error willing to take on):
\( p \lt \alpha \hookrightarrow H_{1} \)
Note that these techniques are completely logically equivalent
\( \mu_0 \notin \text{CI} \hookrightarrow H_1\)
Test statistic to infer the population mean from a sample mean, given that the population variance is known. The CLT ensures that Z-test statistics can be treated as standard normal RVs if the sample size is large enough
\(H_0 : \mu_0 = \mu \)
\( Z = \frac{\bar{X}-\mu_{0}}{\frac{\sigma}{\sqrt{n}}} \)
\( \text{CI} = [\bar{X} - z_{\frac{\alpha}{2}}\frac{\sigma}{\sqrt{n}}, \bar{X} + z_{\frac{\alpha}{2}}\frac{\sigma}{\sqrt{n}} ] \)
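A minimal worked Z-test, assuming SciPy is available (the summary values below are made up for illustration):

```python
from scipy.stats import norm
import numpy as np

# Hypothetical summary: n = 36, sample mean 52, known sigma = 6, H0: mu = 50
n, x_bar, sigma, mu_0, alpha = 36, 52.0, 6.0, 50.0, 0.05

Z = (x_bar - mu_0) / (sigma / np.sqrt(n))            # test statistic
p = 2 * norm.sf(abs(Z))                              # two-sided p-value
half_width = norm.ppf(1 - alpha / 2) * sigma / np.sqrt(n)
ci = (x_bar - half_width, x_bar + half_width)

print(Z, p, ci)   # Z = 2.0, p ~ 0.0455, CI ~ (50.04, 53.96)
```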
Test statistic to infer a population mean from a sample mean, given that the population variance is unknown
\(H_0 : \mu_0 = \mu \)
\( T = \frac{\bar{X}-\mu_{0}}{\frac{s}{\sqrt{n}}} \)
\( \text{CI} = [\bar{X} - t_{\frac{\alpha}{2},\nu}\frac{s}{\sqrt{n}}, \bar{X} + t_{\frac{\alpha}{2},\nu}\frac{s}{\sqrt{n}} ] \)
\( \nu = n-1 \)
Paired data occurs when two populations have a bijection so that each element in set A has some relation to one element in set B
Test statistic to infer whether two sample means for two different populations are significantly different, given unknown population variance
\(H_0 : \mu_1 = \mu_2 \)
\(T = \frac{(\bar{X}_{2}-\bar{X}_{1})-(\mu_{2}-\mu_{1})}{\sqrt{\frac{s_{1}^2}{n_1}+\frac{s_{2}^2}{n_2}}} \)
\(s^{2}_{p}=\frac{(n_{1}-1)s_{1}^2+(n_{2}-1)s_{2}^2}{n_{1}+n_{2}-2}\) (the pooled variance; when the population variances are assumed equal, \(s_{1}^2\) and \(s_{2}^2\) in the denominator of the test statistic are both replaced by \(s^{2}_{p}\))
\( \nu = \begin{cases} n_{1}+n_{2}-2 & s_1 = s_2 \\ \frac{(\frac{s^{2}_{1}}{n_1}+\frac{s^{2}_{2}}{n_2})^2}{\frac{(\frac{s^2_1}{n_1})^2}{n_1-1}+\frac{(\frac{s^2_2}{n_2})^2}{n_2-1}} & s_1 \neq s_2 \end{cases}\)
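A short sketch of this two-sample test using SciPy's ttest_ind routine (the samples are made up; equal_var=False corresponds to the unequal-variance degrees-of-freedom case above):

```python
from scipy.stats import ttest_ind
import numpy as np

rng = np.random.default_rng(1)
sample_1 = rng.normal(10.0, 2.0, 30)   # hypothetical sample from population 1
sample_2 = rng.normal(11.0, 3.0, 25)   # hypothetical sample from population 2

# Welch's t-test (population variances not assumed equal)
t_stat, p_value = ttest_ind(sample_1, sample_2, equal_var=False)
print(t_stat, p_value)
```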
Test statistic to infer whether the ratio of two sample variances are significantly different.
\(H_0 : \sigma^2_1 = \sigma^2_2\)
\(F = \frac{s^2_1}{s^2_2} \)
\( \nu_j = n_j-1 \)
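A minimal version of this variance-ratio test, assuming SciPy is available (the sample variances and sizes are made up):

```python
from scipy.stats import f

# Hypothetical sample variances and sample sizes
s1_sq, n1 = 9.5, 16
s2_sq, n2 = 4.2, 21

F = s1_sq / s2_sq                       # test statistic
nu1, nu2 = n1 - 1, n2 - 1               # degrees of freedom
p = 2 * min(f.sf(F, nu1, nu2), f.cdf(F, nu1, nu2))   # two-sided p-value
print(F, p)
```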