A 3-tuple \( (V,+,\cdot ) \) together with a field \(F\), closed according to the following properties
\((V,+,\cdot ) \)
\((V,+, \cdot) \text{ is a linear space over } F \iff \textbf{v},\textbf{u} \in V \land c,d \in F \implies\)
In the following axioms and corollaries, general linear elements will use vector notation, and a vector space will be represented merely by its set for conciseness
A set \(H\) is a subspace of \(V\), written \(H \leq V\), iff it is a subset \(H \subseteq V\) such that the elements of \(H\) also form a linear space. Intuitively, \(H\) must contain the same identity element as \(V\) and must be closed under addition and scalar multiplication
The subspace made by the intersection of two subspace sets, \(U \cap W \leq V\). This is closed under addition because every element of \(U \cap W\) belongs to both \(U\) and \(W\), and both subspaces are closed under addition; closure under scalar multiplication follows for the same reason
The subspace made by sums of elements from the two subspaces, \(U \oplus W \leq V\). This is a closed operation since it is closed under addition by construction, and \(U,W\) are already closed under scalar multiplication
The subset made by the union of two subspace sets, \(U \cup W\). Note that this is not in general a subspace: the sum of an element of \(U\) and an element of \(W\) is not necessarily in \(U \cup W\), so the union is not closed under addition.
Also called the null space, the set of all vectors that when passed through \(T\) return \(\textbf{0}\)
\( \text{ker}(T) = \{ \textbf{v} \in V : T(\textbf{v})=\textbf{0} \} = T^{-1}(\textbf{0}) \)
Or when considering just a matrix \(A\), the set of all vectors that when multiplied with \(A\) return \(\textbf{0}\)
\( \text{nul}(\textbf{A}) = \{ \textbf{v} \in V : \textbf{A}\textbf{v} = \textbf{0} \} \)
\( \text{nul} (\textbf{A}) \leq \mathbb{R}^n \)
Set of all vectors that can be formed by a linear combination of a matrix's columns.
\( \text{col} (\textbf{A}) = \text{span} \{\textbf{a}_1,\textbf{a}_2,...,\textbf{a}_n\} \)
\( \text{col} (\textbf{A}) \leq \mathbb{R}^m \)
Set of all vectors that can be formed by a linear combination of a matrix's rows.
\( \text{row} (\textbf{A}) = \text{col}(\textbf{A}^{T})\)
Note that the pivot columns of \(\textbf{A}\) (identified by reducing to REF) form a basis for its column space, since row operations in essence just manipulate linear combinations of rows and so preserve the dependence relations between columns
If \(\textbf{B}\) is \(\textbf{A}\) in EF, then the non-zero rows of \(\textbf{B}\) are the basis for the row space of \(\textbf{A}\) and \(\textbf{B}\)
The number of linearly independent columns of a matrix (reduce to EF and count the columns with pivots), or formally, the dimension of the column space of \(\textbf{A}\)
\( \text{rank} (\textbf{A}) = \text{dim}(\text{col}(\textbf{A}))\)
For a matrix \(\textbf{A}\) with dimensions \(m \times n\):
For the first statement, assume some matrix \(\textbf{A}\) in EF with \(p\) pivots; we know that \(\text{rank} (\textbf{A}) = \text{dim}(\text{col}(\textbf{A})) = p\). Considering \(\textbf{A}^{T}\), rows are translated into columns and the pivot positions are preserved, proving the statement
For the second, \( \text{rank}(\textbf{A}) \) can be interpreted as the number of columns with pivots, and \(\text{dim}(\text{nul}(\textbf{A}))\) as the number of free variables in a solution (the number of columns without pivots). Since a column either has a pivot or doesn't, the sum of these two counts returns the number of columns of \(\textbf{A}\).
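As a quick numerical sanity check of this counting argument, here is a minimal sketch using sympy (the matrix is an arbitrary example, not one from these notes):

```python
import sympy as sp

# Arbitrary 3x4 example matrix (hypothetical values, for illustration only)
A = sp.Matrix([[1, 2, 0, 3],
               [2, 4, 1, 7],
               [3, 6, 1, 10]])

rank = A.rank()               # dim(col(A)), the number of pivot columns
nullity = len(A.nullspace())  # dim(nul(A)), the number of free variables

# rank + nullity should equal the number of columns of A
assert rank + nullity == A.cols
```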
A linear element formed as a sum of elements from a subset of \(V\), each of which may be multiplied by a scalar
\( \textbf{y} = \sum_{i=1}^n c_i \textbf{v}_i\)
\( \textbf{A}\textbf{x}=\textbf{b} \iff \sum_{i=1}^{n} \textbf{x}_i \textbf{a}_i = \textbf{b}\)
\(\textbf{A}\) is a matrix where column \(i\) is \(\textbf{a}_i\). This \(\textbf{A}\) will be used throughout this entire page
The proof is by comparing the process of matrix multiplication to that of forming a linear combination
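A minimal sketch of this equivalence in numpy, using arbitrary example values:

```python
import numpy as np

# Arbitrary example matrix and vector (hypothetical values for illustration)
A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
x = np.array([2.0, -1.0])

# Matrix-vector product versus the explicit linear combination of the columns
b_matmul = A @ x
b_combo = x[0] * A[:, 0] + x[1] * A[:, 1]

assert np.allclose(b_matmul, b_combo)
```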
Set of all linear combinations formed by elements from a subset of a linear space. Linear spans are always subspaces
\(\text{span} (U) = \{ \textbf{y} : \exists c_i \in \mathbb{C}, \textbf{y} = \sum_{i=1}^{n} c_i \cdot \textbf{u}_i \} \)
\(\text{span} (\textbf{A}) = \{ \textbf{y} : \exists c_i \in \mathbb{C}, \textbf{y} = \sum_{i=1}^{n} c_i \cdot \textbf{a}_i \} \)
Since \(0 \cdot \textbf{v}=\textbf{0}\), the zero element can be created. The sum of two linear combinations is again a linear combination, since matching terms combine as \(c \cdot \textbf{v} + d \cdot \textbf{v} = (c+d) \cdot \textbf{v}\), and a scalar multiple of a linear combination is again a linear combination as \( c(d\textbf{v}) = (cd)\textbf{v} \); hence spans are closed under addition and scalar multiplication.
\( \forall \textbf{b}, \exists \textbf{x} : A\textbf{x}=\textbf{b} \iff \text{span} \{ \text{Columns of } A \} = \mathbb{R}^m \)
As a shortcut, this condition is satisfied for \(A\) when there are pivot positions in all rows (a row without a pivot reduces to a row of all zeroes, so for some \(\textbf{b}\) the system is inconsistent).
For some geometric intuition, \(\text{span} \{\textbf{v}\}\) for \(\textbf{v} \in \mathbb{R}^3\) is the set \(\{c \cdot \textbf{v}\}\), a line through the origin. \(\text{span} \{\textbf{v},\textbf{u}\}\) for \(\textbf{v},\textbf{u} \in \mathbb{R}^3\) with \(\nexists c : c \cdot \textbf{v} =\textbf{u}\) is a plane through the origin containing both vectors
A subset \(U \subset V\) is linearly independent iff its linear combinations only equal the zero element in the trivial case (all scalars set to zero); the lack of linear independence is linear dependence.
\(U \text{ is linearly independent } \iff \forall \textbf{c} \neq \textbf{0}, \sum_{i=1}^{n} c_i\textbf{u}_i \neq \textbf{0} \)
As a shortcut, this condition is satisfied for \(A\) when there are pivot positions in all columns (no free variables, so the only solution to \(A\textbf{x}=\textbf{0}\) is the trivial one).
The equation \(x^2=-1\) has no real solutions, yet has the complex solutions \(x = \pm i\). Interestingly, \(X^2=-I\) in the \(2 \times 2\) case has the solution \(X = \pm i I \) but also solutions using only real entries: \(X : x_{11} = - x_{22} \land x_{11}^2 +x_{12}x_{21} = -1 \)
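A minimal numerical check of one such real solution (the entries below are chosen to satisfy the stated conditions):

```python
import numpy as np

# A real 2x2 matrix with x11 = -x22 and x11^2 + x12*x21 = -1
X = np.array([[0.0, 1.0],
              [-1.0, 0.0]])

assert np.allclose(X @ X, -np.eye(2))   # X^2 = -I using only real entries
```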
A set \(\mathcal{B}\) of vectors that are linearly independent and have a span equal to the linear space (therefore all vectors can be represented through them)
\(\mathcal{B} \text { is a basis for } V \iff \mathcal{B} \text{ is linearly independent} \land \text{span}(\mathcal{B})=V\)
It can be thought of as a minimum spanning set
Matrix \(P_{\mathcal{B}}\) with each basis element as a column
Cardinality of a linear space's basis
\( \text{dim}(V) = |\mathcal{B}|\)
A standard basis \( \mathcal{E} \) is one such that all components equal 0 except for one component, which is set as 1.
For instance, the standard basis of \(\mathbb{R}^2\) is \( \mathcal{E} = \{ \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix} \}\)
Basis for a linear space that is most intuitive
Suppose \(b_i\) are column vectors of the basis matrix \(B\) with all other slots zeroed out, and \(c_i\) are the basis constants represented by a column vector \(\textbf{c}\), then for any \(\textbf{v} \in V\):
For some subset \(S\) of the linear space set \(V\) and the span \(H=\text{span}(S)\), if an element \(\textbf{v}_i \in S\) is a linear combination of the other elements, removing it gives the same span: \(\text{span}(S \setminus \{\textbf{v}_i\}) = \text{span}(S) \)
For a linear space \(V : \text{dim}(V)=n\), any linearly independent subset of \(n\) elements forms a basis, or equivalently, any \(n\) elements spanning the whole space form a basis
Structure preserving, bijective map between two linear spaces \(V \leftrightarrow W\) such that:
\( V \leftrightarrow W \iff ( \textbf{v}_i \leftrightarrow \textbf{w}_i \iff c \textbf{v}_1 + d \textbf{v}_2 \leftrightarrow c \textbf{w}_1 + d \textbf{w}_2 ) \)
\(V \leftrightarrow W \iff \text{dim}(V) = \text{dim}(W) \)
For some basis \(\mathcal{B}\), a coordinate \( [\textbf{v}]_{\mathcal{B}} \) is the vector of scalars that are used with the basis to form vector \(\textbf{v} \in V\)
\( \textbf{v} = \sum_{i=1}^{n} c_i \textbf{b}_i = P_{\mathcal{B}} \textbf{c} \iff [\textbf{v}]_{\mathcal{B}} = \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix} \)
Where \(\textbf{x}\) is a vector and \(P_{\mathcal{B}}\) is a matrix with each basis vector as each column:
\(P_{\mathcal{B}} : \textbf{x} = P_{\mathcal{B}} \times [\textbf{x}]_{\mathcal{B}}\)
Each column of \(P_{\mathcal{B}}\) is a basis element. The reasoning is the same as for the "linear combination equivalence" discussed above: viewing the process of matrix multiplication, for standard coordinate \(i\) one wants the sum of element \(i\) of each basis vector multiplied by the appropriate \(\mathcal{B}\)-coordinate. Making each basis element a column is how this is achieved
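A minimal sketch of recovering \(\mathcal{B}\)-coordinates by solving \(P_{\mathcal{B}} [\textbf{x}]_{\mathcal{B}} = \textbf{x}\), with an arbitrary example basis:

```python
import numpy as np

# Hypothetical basis for R^2, one basis vector per column of P_B
P_B = np.array([[1.0, 1.0],
                [0.0, 2.0]])
x = np.array([3.0, 4.0])

# [x]_B solves P_B c = x
x_B = np.linalg.solve(P_B, x)

assert np.allclose(P_B @ x_B, x)        # reconstruct x from its B-coordinates
```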
Coordinates defined by the standard basis \(\mathcal{E}\)
\([\textbf{x}]_{\mathcal{E}}: P_{\mathcal{E}} = \begin{bmatrix} 1 & 0 & \ldots & 0 \\ 0 & 1 & \ldots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \ldots & 1 \end{bmatrix} = I\)
\([\textbf{x}]_{\mathcal{E}} = \textbf{x}\)
For two bases \(\mathcal{B},\mathcal{C}\) of a linear space \(V\), there exist two transformation matrices such that:
Note that a bit of matrix algebra shows \( P_{\mathcal{B} \to \mathcal{C}} = ( P_{\mathcal{C} \to \mathcal{B}} )^{-1} \)
Since \(P_{\mathcal{B} \to \mathcal{C}} =P_{\mathcal{C}}^{-1} P_{\mathcal{B}}\) (first transforming from \(\mathcal{B} \to \mathcal{E} \) then \(\mathcal{E} \to \mathcal{C}\)), one can augment and row reduce as \( P_{\mathcal{C}}| P_{\mathcal{B}} \to I| P_{\mathcal{B} \to \mathcal{C}} \)
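A minimal sketch of this change of basis in numpy, with arbitrary example bases:

```python
import numpy as np

# Hypothetical bases B and C for R^2, basis vectors as columns
P_B = np.array([[1.0, 1.0],
                [0.0, 1.0]])
P_C = np.array([[2.0, 0.0],
                [1.0, 1.0]])

P_B_to_C = np.linalg.solve(P_C, P_B)    # P_C^{-1} P_B without forming the inverse

v_B = np.array([3.0, -2.0])             # coordinates of some vector in basis B
v = P_B @ v_B                           # the same vector in standard coordinates
v_C = P_B_to_C @ v_B                    # its coordinates in basis C

assert np.allclose(P_C @ v_C, v)
```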
An elementary row operation can be represented by a matrix \(E\) that, when multiplied onto a matrix \(A\) from the left as \(EA\), performs some row operation (or a sequence of row operations) on \(A\)
If a square matrix \(A\) can be reduced to echelon form without swapping rows, then it can be rewritten as the product of an upper triangular matrix \(U\) and lower triangular matrix \(L\)
\(A=LU\)
This provides an alternate algorithm to solving \(A\textbf{x}=\textbf{b}\) as \(A\textbf{x}=\textbf{b} \implies L(U\textbf{x})=\textbf{b} \implies \begin{cases} L\textbf{y} = \textbf{b} \\ U\textbf{x}=\textbf{y} \end{cases} \)
The algorithm, once \(L,U\) are known, is as follows:
Hence it runs in \(O(n^2)\). Gaussian elimination runs in \( O( n^3 ) \), demonstrating the power of LU factorisation.
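A minimal sketch of the two triangular solves, assuming \(L\) and \(U\) are already known (the factors below are arbitrary example values):

```python
import numpy as np

def forward_sub(L, b):
    """Solve L y = b for lower triangular L."""
    n = len(b)
    y = np.zeros(n)
    for i in range(n):
        y[i] = (b[i] - L[i, :i] @ y[:i]) / L[i, i]
    return y

def back_sub(U, y):
    """Solve U x = y for upper triangular U."""
    n = len(y)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

# Arbitrary example factors (hypothetical values)
L = np.array([[1.0, 0.0], [0.5, 1.0]])
U = np.array([[4.0, 2.0], [0.0, 3.0]])
b = np.array([6.0, 5.0])

x = back_sub(U, forward_sub(L, b))      # L(Ux) = b
assert np.allclose(L @ U @ x, b)
```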
For square matrix \(A\), use forward phase to get \(A\) into echelon form (which is also an upper triangular matrix) \(EA=U\) and record each operation used through the elementary matrix \(E\). Note that \(E\) is a lower triangular matrix, hence \(E=L^{-1}\) and \(L^{-1}A=U\)
\(A\) is LU factorisable iff all square submatrices from the top left corner are invertible. In practice, this is impractical to check by hand.
\(A=LU \iff \forall \Delta_k = \begin{bmatrix} a_{11} & \ldots & a_{1k} \\ \vdots & \ddots & \vdots \\ a_{k1} & \ldots & a_{kk} \end{bmatrix} : k \in \{1,\ldots,n\}, \text{det}(\Delta_k) \neq 0 \)
Algorithm based on the assumption that the diagonal entries of \(L\) are all 1. By performing the matrix multiplication \(A=LU\) under the assumptions \(l_{ii} = 1 \land (i \lt j \implies l_{ij} = 0)\) one can break down and see that:
\(a_{ij} = \begin{cases} u_{ij} + \sum^{i-1}_{k=1} l_{ik}u_{kj} & i \in [1,j] \\ \sum^{j}_{k=1} l_{ik}u_{kj} & i \in (j,n] \end{cases}\) and hence:
Note that the algorithm does not work for all matrixes, however permuting rows with some \(P\) mitigates this. This makes the equation \(A\textbf{x}=\textbf{b} \to PA\textbf{x}=P\textbf{b}\)
To get a strong feel for the algorithm, we must analyse why the equation for \(a_{ij}\) holds. Note that standard matrix multiplication is defined as \(a_{ij} = \sum^{n}_{k=1} l_{ik}u_{kj}\); this formula works for all matrices, however the assumptions of an upper triangular and a lower unitriangular matrix allow the formula to be simplified and, importantly, leave exactly enough degrees of freedom to solve for the unknown entries.
First consider the upper entries (\(i \le j\)): since \(l_{ik} = 0\) for \(k \gt i\) and \(l_{ii} = 1\), the sum truncates and isolates \(u_{ij}\); the lower entries (\(i \gt j\)) follow similarly, isolating \(l_{ij}\) once the corresponding \(u\) values are known
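A minimal sketch of the Doolittle recurrence above in numpy (no pivoting, so it assumes the leading principal submatrices are invertible; the example matrix is arbitrary):

```python
import numpy as np

def doolittle_lu(A):
    """LU factorisation with a unit diagonal on L (no pivoting)."""
    n = A.shape[0]
    L = np.eye(n)
    U = np.zeros((n, n))
    for i in range(n):
        # u_{ij} for j >= i: subtract the already-known partial sums from a_{ij}
        for j in range(i, n):
            U[i, j] = A[i, j] - L[i, :i] @ U[:i, j]
        # l_{ji} for j > i: divide out the freshly computed pivot u_{ii}
        for j in range(i + 1, n):
            L[j, i] = (A[j, i] - L[j, :i] @ U[:i, i]) / U[i, i]
    return L, U

# Arbitrary example (hypothetical values)
A = np.array([[4.0, 3.0],
              [6.0, 3.0]])
L, U = doolittle_lu(A)
assert np.allclose(L @ U, A)
```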
Algorithm based on the assumption that the diagonal entries of \(U\) are all 1. By performing the matrix multiplication \(A=LU\) under the assumptions \(u_{ii} = 1 \land (j \lt i \implies u_{ij} = 0)\) one can break down and see that:
\(a_{ij} = \begin{cases} l_{ij} + \sum^{j-1}_{k=1} l_{ik}u_{kj} & i \in [j,n] \\ \sum^{i}_{k=1} l_{ik}u_{kj} & i \in [1,j) \end{cases}\) and hence:
Note that the algorithm does not work for all matrixes, however permuting rows with some \(P\) mitigates this. This makes the equation \(A\textbf{x}=\textbf{b} \to PA\textbf{x}=P\textbf{b}\)
When no row swapping is required, the decomposition can also be completed as \(A=LDU\) where \(L,U\) are unitriangular and \(D\) is a diagonal matrix.
For symmetric matrices \(A=A^{T}\), one can see that \(LU=(LU)^{T}=U^T L^T\); but since a lower triangular matrix transposed is an upper triangular matrix, one can set \(A=U^T U\)
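For a symmetric matrix that is also positive definite this is the Cholesky factorisation; a minimal sketch using numpy (np.linalg.cholesky returns the lower factor \(L\), so \(U = L^{T}\); the matrix is an arbitrary example):

```python
import numpy as np

# Arbitrary symmetric positive definite example (hypothetical values)
A = np.array([[4.0, 2.0],
              [2.0, 3.0]])

L = np.linalg.cholesky(A)   # lower triangular factor, A = L L^T
U = L.T                     # so A = U^T U as described above

assert np.allclose(U.T @ U, A)
```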
Linear vector function \(T : V \to W \) mapping elements from one linear space \(V\) (domain) to another linear space \(W\) (codomain)
\( T \text{ is a linear transformation } \iff T(k\textbf{u}+p\textbf{v})=kT(\textbf{u})+pT(\textbf{v})\)
\( T \text{ is a linear transformation } \iff \)
The mapped codomain element for some domain element, so \(T(\textbf{x}) = \textbf{y}\) implies \(\textbf{y}\) is the image of \(\textbf{x}\)
\(A\textbf{x}\) can easily be proven to be a linear transform with a range of \(\text{span} \{ \text{Columns of } A\}\) since if a column of \(A\) is denoted as \(a_i\), \(A\textbf{x} = \sum^{n}_{i=1} a_i x_i\)
Linear transform between topological linear spaces such that there exists a scalar for which the norm of every image is at most that scalar times the norm of the original vector
\( \text{A} \text{ is a bounded linear operator } \iff \exists k \in \mathbb{R} [ \forall \mathbf{v} \in V, \| \text{A} \mathbf{v} \| \leq k \| \mathbf{v} \| ] \)
Smallest scalar that bounds a bounded linear operator
\(\| \text{A} \|_{\mathrm{op}} = \sup_{\mathbf{v} \in V \setminus \{\mathbf{0}\}} (\frac{ \| \mathrm{A} \mathbf{v} \|}{ \| \mathbf{v} \| } ) \)
\( (H, \langle \cdot \rangle ) \land M \text{ is a closed convex set in } H \implies \forall x \in H ( \exists ! y_0 \in M : \| x-y_0 \| = \inf_{y \in M} \| x -y \| ) \)
\( (W, \langle \cdot \rangle ) \leq (V, \langle \cdot \rangle ) \implies \forall \mathbf{v} \in V [ \exists ! \mathbf{w}_0 \in W [ \forall \mathbf{w} \in W [ \| \mathbf{v} - \mathbf{w}_0 \| \leq \| \mathbf{v} - \mathbf{w} \| ]]] \)
\( (W, \langle \cdot \rangle ) \land M \subset W \text{ is a closed and convex set } \implies \forall \mathbf{v} \in V [ \exists ! \mathbf{m}_0 \in M [ \forall \mathbf{m} \in M [ \| \mathbf{v} - \mathbf{m}_0 \| \leq \| \mathbf{v} - \mathbf{m} \| ]]] \)
\( (W, \langle \cdot \rangle ) \leq (V, \langle \cdot \rangle ) \implies \forall \mathbf{v} \in V [ \exists ! \mathbf{w}_0 \in W [ \exists ! \mathbf{w} \in W^{\perp} [ \mathbf{v} = \mathbf{w}_0 + \mathbf{w} ]]] \)
Set of all images
\( \text{im}(T) = \{ \textbf{v} : \exists \textbf{u} : T(\textbf{u}) = \textbf{v}\} \)
For the standard basis \(\mathcal{E}\) with basis matrix \(I\), one can see that \(T(\textbf{x}) = \sum x_i T(I_i)\). Following from this logic, let the standard matrix be the matrix whose column \(i\) is the transformation of column \(i\) of the basis matrix
Transform that 'stretches' components proportionally, formally following this definition:
\( T \text{ is a pure scaling transformation} \iff \exists k : \forall \textbf{u}, T(\textbf{u})=k \cdot \textbf{u}\)
\( D_k = \begin{bmatrix} k & 0 \\ 0 & k \end{bmatrix}\)
Rotating a vector by a specific angle or about an axis while maintaining the vector magnitude
\( R_\theta = \begin{bmatrix} \cos (\theta) & -\sin (\theta) \\ \sin (\theta) & \cos (\theta) \end{bmatrix}\)
Let \(\textbf{v} = \begin{bmatrix} r \cos (k) \\ r\sin (k) \end{bmatrix}\), now to rotate by \(\theta\) implies \( T( \textbf{v}) = \begin{bmatrix} r \cos (k + \theta) \\ r \sin (k + \theta) \end{bmatrix} = \begin{bmatrix} r ( \cos (k) \cos ( \theta ) - \sin(k) \sin ( \theta )) \\ r( \sin (k) \cos ( \theta) + \sin ( \theta ) \cos ( k )) \end{bmatrix} = r \cos (k) \begin{bmatrix} \cos(\theta) \\ \sin(\theta) \end{bmatrix} + r \sin (k) \begin{bmatrix} - \sin (\theta) \\ \cos(\theta) \end{bmatrix}= \begin{bmatrix} \cos ( \theta ) & - \sin ( \theta ) \\ \sin( \theta ) & \cos ( \theta ) \end{bmatrix} \textbf{v}\)
To rotate \( \begin{bmatrix} 1 \\ 0 \end{bmatrix} \) by \( \theta \), geometrically viewing this on a plane and reasoning determines the form \( \begin{bmatrix} \cos (\theta) \\ \sin (\theta) \end{bmatrix} \). Similarly for \( \begin{bmatrix} 0 \\ 1 \end{bmatrix} \), it transforms to \( \begin{bmatrix} -\sin(\theta) \\ \cos(\theta) \end{bmatrix} \)
\(R_\theta = \begin{bmatrix} T(1,0) & T(0,1)\end{bmatrix} = \begin{bmatrix} \cos (\theta) & -\sin (\theta) \\ \sin (\theta) & \cos (\theta) \end{bmatrix}\)
Reflecting a vector on an axis while maintaining the vector magnitude
Shifting each vector along one axis by an amount proportional to its other component (a shear), leaving that other component unchanged
\( S_k = \begin{bmatrix} 1 & k \\ 0 & 1 \end{bmatrix}\)
A transformation is injective iff distinct domain elements always have distinct images, or alternatively (for a linear transformation):
\(T \text{ is injective } \iff \nexists \textbf{v} \in V \setminus \{\textbf{0}\} : T(\textbf{v}) = \textbf{0}\)
Notice that this is the same condition as linear independence of the columns
By WOC allow \( T(\textbf{u}) = T(\textbf{v})=\textbf{w} \land \textbf{u} \neq \textbf{v}\), therefore \( T( \textbf{u} ) -T( \textbf{v}) = T(\textbf{u}-\textbf{v}) = \textbf{0}\), hence \(\text{ker}(T)\) contains a non-zero element, therefore injectivity implies no non-trivial solutions. The converse is proved similarly.
For a linear transform \(T(\textbf{x})=A\textbf{x}\) and an eigenvalue \(\lambda\), \(E_{\lambda} = \{ \textbf{x} : (A - \lambda I) \textbf{x} = \textbf{0} \} = \text{nul}(A-\lambda I) \) is the eigenspace of \(T\) for \(\lambda\)
Binary function returning a scalar related to the orthogonality of the two linear elements
Unary function returning the magnitude of an element in a linear space
\( \| \textbf{v} \| \geq 0\)
\( \| \textbf{v} \| = 0 \iff \textbf{v} =\textbf{0} \)
\( \| \textbf{v} + \textbf{u} \| \leq \| \textbf{v} \| + \| \textbf{u} \| \)
\( \| c\textbf{v} \| = |c| \| \textbf{v} \|\)
Finite dimension inner product space on \(\mathbb{R}\)
\( (\mathbb{R}^n, \cdot ) \text{ is a Euclidean vector space } \iff n \lt \infty \land \cdot \text{ is an inner product on } \mathbb{R}^n \)
Binary operator on two linear elements
Let \( \cdot \) be the scalar product operator, \({\textbf{x}, \textbf{y}, \textbf{z}} \in V\) be linear elements, and \(c \in \mathbb{R}\), then:
\( \textbf{x} \cdot \textbf{y} = \sum_{i=1}^{n} x_i y_i = \textbf{x}^{T} \textbf{y} \)
Property generalizing perpendicularity to two elements in a linear space
\( \textbf{v} \perp \textbf{u} \iff \langle \textbf{v} , \textbf{u} \rangle = 0 \)
\( d(\textbf{x},\textbf{y}) = \| \textbf{x} - \textbf{y} \| \)
\( \|\textbf{x} + \textbf{y} \|^2 = \| \textbf{x} \|^2 + \| \textbf{y} \|^2 \iff \textbf{x} \perp \textbf{y} \)
If \(W\) is a subspace, define the orthogonal complement as the following set
\(W^{\perp} = \{ \textbf{z} : \forall \textbf{w} \in W, \textbf{z} \cdot \textbf{w}= 0 \} \)
and it is said that \(\forall \textbf{z} \in W^{\perp}, \textbf{z} \perp W\), or \(\textbf{z}\) is orthogonal to \(W\). Consider \(A\textbf{z} = \textbf{0}\): obviously \( \textbf{z} \) is in \( \text{nul} (A) \) and, due to the nature of matrix multiplication, each row of \(A\) (denoted \(\textbf{a}_i\)) satisfies \(\textbf{z} \cdot \textbf{a}_i = 0\), proving the theorem.
The second theorem can be proved by substituting \(\text{dim} (W) = \text{dim} ( \text{col} (U)) \) and applying rank theorem and the first theorem listed
Set of vectors \(S\) such that all elements are orthogonal to each other
\(S \text{ is an orthogonal set} \iff \forall \textbf{x},\textbf{y} \in S, \textbf{x} \neq \textbf{y} \implies \textbf{x}\cdot\textbf{y} = 0\)
\( S \text{ is an orthogonal set of non-zero vectors} \implies S \text{ is linearly independent}\)
A basis \(\mathcal{U}\) that is also an orthogonal set. Finding coordinates in an orthogonal basis can be done by finding the scalar projection of the vector onto each basis element
The component of a vector \(\textbf{v} \in \mathbb{R}^n\) cast onto the subspace \(W\)
\( \mathcal{U} \text{ is an orthogonal basis for } W \implies \text{proj}_{W} (\textbf{v}) = \sum_{\textbf{u} \in \mathcal{U}} \frac{ \textbf{v} \cdot \textbf{u}}{ \textbf{u} \cdot \textbf{u}} \textbf{u} \)
\( \text{proj}_{W} (\textbf{v}) = \textbf{v} \iff \textbf{v} \in W\)
All elements of \(\mathbb{R}^{n}\) can be decomposed into the sum of an element of a subspace and an element of that subspace's orthogonal complement.
\(\forall \textbf{y} \in \mathbb{R}^{n}, \exists \textbf{w} \in W \leq \mathbb{R}^n, \textbf{z} \in W^{\perp} : \textbf{y} = \textbf{w} + \textbf{z}\)
Note that for some orthogonal basis \(\mathcal{U}\) for \(W\), \( \textbf{w} = \sum^{\text{dim}(W)}_{i=1} \text{proj}_{\text{span} \{ \textbf{u}_i\} } (\textbf{y}) \)
The distance from \(\textbf{y} \in \mathbb{R}^n\) to the closest element in subspace \(W\)
For some element \(\textbf{y}\) outside subspace \(W\), the element in \(W\) with the smallest distance is \(\text{proj}_{W} (\textbf{y})\)
\( \forall \textbf{v} \in W\setminus \{\text{proj}_{W} (\textbf{y})\}, \| \textbf{y} - \text{proj}_{W} (\textbf{y}) \| \lt \| \textbf{y} - \textbf{v} \| \)
Orthogonal set where all vectors are unit vectors
Let \(U\) be a matrix whose columns represent an orthonormal basis for \(W\). Then:
Note that if \(\textbf{y} \in W\), then the projection just returns the \(\textbf{y}\) back.
When the columns of \(U\) form an orthonormal basis for \(W\), \(\text{proj}_{W} (\textbf{y}) = \sum^{\text{dim}(W)}_{i=1} \frac{\textbf{y} \cdot \textbf{u}_i}{\textbf{u}_i \cdot \textbf{u}_i} \textbf{u}_i \) by definition, but since \(\forall i, \textbf{u}_i \cdot \textbf{u}_i = 1\) (a property of an orthonormal set), the equation simplifies to \(\text{proj}_{W} (\textbf{y}) = \sum^{\text{dim}(W)}_{i=1} (\textbf{y} \cdot \textbf{u}_i) \textbf{u}_i \)
Matrix multiplication shows that \( (U^{T}\textbf{y})_i = (\textbf{y} \cdot \textbf{u}_i) \) and further matrix multiplication evidently proves the theorem.
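A minimal sketch comparing \(UU^{T}\textbf{y}\) with the sum formula, using an arbitrary orthonormal pair in \(\mathbb{R}^3\):

```python
import numpy as np

# Two orthonormal vectors in R^3 as the columns of U (hypothetical example)
U = np.column_stack([np.array([1.0, 1.0, 0.0]) / np.sqrt(2),
                     np.array([0.0, 0.0, 1.0])])
y = np.array([2.0, 3.0, 5.0])

proj = U @ (U.T @ y)                       # proj_W(y) = U U^T y
proj_sum = sum((y @ U[:, i]) * U[:, i] for i in range(U.shape[1]))

assert np.allclose(proj, proj_sum)         # matches the sum of (y . u_i) u_i
```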
Also called an orthonormal matrix, a matrix such that the columns form an orthonormal set, and hence:
\(U \text{ is orthogonal} \iff U^{-1} = U^{T}\)
The change of coordinates matrix between two orthonormal bases is also an orthogonal matrix. This is because \(P_{\mathcal{U} \to \mathcal{Q}}^{T} P_{\mathcal{U} \to \mathcal{Q}} = (P_{\mathcal{Q}}^{T} P_{\mathcal{U}})^{T} P_{\mathcal{Q}}^{T} P_{\mathcal{U}} = P_{\mathcal{U}}^{T} P_{\mathcal{Q}} P_{\mathcal{Q}}^{T} P_{\mathcal{U}} = I\)
\(U \text{ is orthogonal } \implies |\text{det}(U)| = 1\)
Note that if the columns were merely orthogonal, \(U^{T}U\) would equal some diagonal square matrix; an orthogonal matrix strictly requires unit columns so that \(U^{T}U=I\).
Algorithm determining an orthogonal basis \(\mathcal{U}\) for some linear space given a regular basis \(\mathcal{B}\)
\(\textbf{u}_i = \textbf{b}_i - \sum^{i-1}_{j=1} \frac{\textbf{b}_i \cdot \textbf{u}_j }{\textbf{u}_j \cdot \textbf{u}_j} \textbf{u}_j \)
Let \(\mathcal{U} = \{ \textbf{b}_1 \} \) where \(\textbf{b}_1 \in \mathcal{B}\). Inductively, an orthogonal decomposition of an element of \(\mathcal{B}\) with respect to the current \(\mathcal{U}\) can be constructed as \(\textbf{b} = \textbf{w} +\textbf{z}\) (where \(\textbf{w} \in \text{span} (\mathcal{U}), \textbf{z} \in \text{span} (\mathcal{U})^{\perp} \)).
Since \(\textbf{z}\) is the necessary component needed to reach the chosen basis element of \(\mathcal{B}\) and is orthogonal to the current \(\mathcal{U}\), adding this to the orthogonal basis maintains its orthogonality and spans this new element.
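A minimal sketch transcribing the Gram-Schmidt formula above into numpy (the input basis is an arbitrary example and is assumed linearly independent):

```python
import numpy as np

def gram_schmidt(B):
    """Orthogonalise the columns of B using the projection formula above."""
    U = []
    for b in B.T:                           # each column b_i of the given basis
        u = b - sum((b @ u_j) / (u_j @ u_j) * u_j for u_j in U)
        U.append(u)
    return np.column_stack(U)

# Hypothetical basis for R^3 (basis vectors as columns)
B = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])

U = gram_schmidt(B)
off_diag = U.T @ U - np.diag(np.diag(U.T @ U))
assert np.allclose(off_diag, 0.0)           # columns are mutually orthogonal
```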
A special function \(\delta : \mathbb{N} \times \mathbb{N} \to \{0,1\} \) with the following definition
\(\delta_{ij} = \begin{cases} 1 & i=j \\ 0 & i \neq j \end{cases}\)
\(A \text{ is symmetric} \iff A=A^{T}\)
Symmetric matrices have some nice properties and applications, from quadratic forms and orthogonal diagonalisation (as will be seen below) to undirected graph adjacency matrices
A square matrix \(A : n \times n\) may have the decomposition \( A=PDP^{-1}\), where \(D\) is a diagonal matrix and \(P\) is invertible
Let \(A : n \times n\) be a square matrix, \(A \text{ is diagonalizable } \iff A \text{ has } n \text{ linearly independent eigenvectors} \)
Note that \(A=PDP^{-1} \iff AP=PD\) when \(P\) is invertible. Since \(D\) is diagonal (applying element \(d_{ii}\) as a scalar to each column \(\textbf{p}_{i}\)), this reduces to the eigenproblem \(A\textbf{p}_{i} = d_{ii}\textbf{p}_{i}\). Since \(P\) must be invertible, all columns of \(P\) must be linearly independent, proving this theorem
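A minimal sketch checking \(A=PDP^{-1}\) and the eigenproblem form \(AP=PD\) with numpy, using an arbitrary diagonalizable example:

```python
import numpy as np

# Arbitrary diagonalizable example (hypothetical values)
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigvals, P = np.linalg.eig(A)     # columns of P are eigenvectors
D = np.diag(eigvals)

assert np.allclose(A, P @ D @ np.linalg.inv(P))
assert np.allclose(A @ P, P @ D)  # the eigenproblem form AP = PD
```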
For a symmetric square matrix \(A : n \times n\) one can make the decomposition \( A=PDP^{T}\)
Set of a matrix's eigenvalues
\(S_{A} = \{ \lambda : \exists \textbf{x} \neq \textbf{0} : A\textbf{x} = \lambda \textbf{x} \}\)
Let \(A\) be a square matrix, \(A \text{ is orthogonally diagonalizable } \iff A=A^{T}\)
Orthogonal diagonalisability implies symmetry since \(A^T = (PDP^{T})^{T} = PD^{T}P^{T} = PDP^{T} = A\). Symmetry implies orthogonal diagonalisability since, by the spectral theorem, symmetric matrices have orthogonal eigenspaces, which is the requirement for orthogonal diagonalisation
Let \(A\) be a square, symmetric matrix with dimensions \(n \times n\):
If two eigenvectors of a symmetric \(A\) are from different eigenspaces, then they are orthogonal to each other since \(\lambda_1 \textbf{x}_1 \cdot \textbf{x}_2 = (\lambda_1 \textbf{x}_1)^{T} \textbf{x}_2 = (A \textbf{x}_1)^{T} \textbf{x}_2 = (\textbf{x}_1)^{T} A^{T} \textbf{x}_2 = ( \textbf{x}_1)^{T} A \textbf{x}_2 = \lambda_2 \textbf{x}_1 \cdot \textbf{x}_2\). Since the eigenvalues are distinct, \(\textbf{x}_1 \cdot \textbf{x}_2 = 0\) and the vectors are therefore orthogonal
If \(A\) is orthogonally diagonalizable (symmetric), and we let \(P = \begin{bmatrix} \textbf{u}_1 & \textbf{u}_2 & ... & \textbf{u}_n \end{bmatrix}\), then there is the decomposition \(A = \sum^{n}_{i=1} \lambda_i \textbf{u}_i \textbf{u}_i^{T}\)
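A minimal sketch of this spectral decomposition using np.linalg.eigh, which returns orthonormal eigenvectors for a symmetric input (the matrix is an arbitrary example):

```python
import numpy as np

# Arbitrary symmetric example (hypothetical values)
A = np.array([[3.0, 1.0],
              [1.0, 3.0]])

eigvals, P = np.linalg.eigh(A)    # orthonormal eigenvectors as columns of P

# Rebuild A as the sum of rank-one terms lambda_i * u_i u_i^T
A_rebuilt = sum(lam * np.outer(u, u) for lam, u in zip(eigvals, P.T))

assert np.allclose(A, A_rebuilt)
assert np.allclose(P.T @ P, np.eye(2))    # P is an orthogonal matrix
```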
For all matrices of size \(m \times n\), \(A^{T}A\) is symmetric; this allows for a general decomposition for any matrix. Let \(\lambda_1,\lambda_2,...,\lambda_n\) be the eigenvalues of \(A^{T}A\) and \(U\) be the matrix of normalised eigenvectors of \(A^{T} A\), then:
\(A= W \Sigma U^{T} \)
Form \(\Sigma\) by assigning the square root of each eigenvalue to its diagonal in descending order, starting at the top left index. This matrix should have the same dimensions as \(A\)
Form \(W\) by assigning its columns as \( \textbf{w}_{i} = \frac{A \textbf{u}_i}{\sigma_{i}}, 1 \leq i \leq \text{rank} (A)\) and using Gram-Schmidt to form the rest of the columns until \(m\) columns are formed
Recall that \(\|A\textbf{u}_i\| = \sqrt{\lambda_i} = \sigma_i\), therefore \(A \textbf{u}_i = \sigma_i \textbf{w}_i \). \(U\) is an orthonormal basis and \(W\), extended to dimension \(m\), also forms an orthonormal basis. Note that by matrix multiplication and the eigenequation, \(AU=W\Sigma \), so trivially \(A=W\Sigma U^{T}\)
\(A = \sum_{i=1}^{\text{rank}(A)} \sigma_i \textbf{w}_i \textbf{u}_{i}^{T}\)
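A minimal sketch comparing the construction above with np.linalg.svd, which names the factors \(U \Sigma V^{T}\) rather than \(W \Sigma U^{T}\) (arbitrary example matrix):

```python
import numpy as np

# Arbitrary 3x2 example (hypothetical values)
A = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [1.0, 1.0]])

W, sigma, Ut = np.linalg.svd(A, full_matrices=True)   # A = W diag(sigma) U^T

Sigma = np.zeros(A.shape)
np.fill_diagonal(Sigma, sigma)            # singular values on the diagonal, descending

assert np.allclose(A, W @ Sigma @ Ut)

# Rank-one expansion: A = sum_i sigma_i * w_i u_i^T
A_rebuilt = sum(s * np.outer(W[:, i], Ut[i, :]) for i, s in enumerate(sigma))
assert np.allclose(A, A_rebuilt)
```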
A function \(Q : \mathbb{R}^n \to \mathbb{R} \) with the following structure
\( Q( \textbf{x} ) = \textbf{x}^{T} A \textbf{x} : A=A^{T}\)
By letting \( Q_{A}( \textbf{x} ) = \textbf{x}^{T} A \textbf{x}\) for a general square \(A\), one can see that \(Q_{A} = Q_{A^{T}}\) (a scalar equals its own transpose) and hence \(Q_{A} = Q_{(A + A^{T})/2}\), which justifies the symmetric constraint in the quadratic form's definition.
Note that \( Q_{A}( \textbf{x} ) = \sum_{i,j}^{n} k_{ij} x_{i}x_{j} \implies A = \begin{bmatrix} k_{11} & \frac{k_{12}}{2} & \ldots & \frac{k_{1n}}{2} \\ \frac{k_{21}}{2} & k_{22} & \ldots & \frac{k_{2n}}{2} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{k_{n1}}{2} & \frac{k_{n2}}{2} & \ldots & k_{nn} \end{bmatrix} \)
For any \(Q\), there exists some change of coordinates \(\textbf{x} = P \textbf{y}\) such that the \(A\) in the quadratic form can be expressed as a diagonal matrix \(D\)
Recalling that the symmetric \(A\) is orthogonally diagonalisable as \(A = PDP^{T}\) with \(P^{T} = P^{-1}\), \(\textbf{x}^{T} A \textbf{x} = (P\textbf{y})^{T} A (P\textbf{y}) = \textbf{y}^{T}P^{T} (PDP^{T}) P \textbf{y} = \textbf{y}^{T} D \textbf{y}\)
\(\textbf{y} = P^{T}\textbf{x}, Q(\textbf{y})=\textbf{y}^{T} D \textbf{y}\)
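A minimal sketch of this change of variables (the principal axes form) using numpy, with an arbitrary symmetric example:

```python
import numpy as np

# Arbitrary symmetric matrix defining Q(x) = x^T A x (hypothetical values)
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigvals, P = np.linalg.eigh(A)        # orthogonal P with A = P D P^T
D = np.diag(eigvals)

x = np.array([1.0, 3.0])
y = P.T @ x                           # the new coordinates y = P^T x

assert np.isclose(x @ A @ x, y @ D @ y)   # Q has no cross terms in y-coordinates
```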
For a system \(A\textbf{x}=\textbf{b}\) with no solution, the least-squares solution \(\textbf{y}\) is the substitute for a solution that has the least error
\( \textbf{y} \text{ is a least-squares solution of } A\textbf{x}=\textbf{b} \iff \forall \textbf{x} \in \mathbb{R}^n , \|\textbf{b} - A\textbf{y}\| \leq \|\textbf{b} - A\textbf{x}\| \)
This can be solved by applying the best approximation theorem to calculate \( \textbf{y} : A \textbf{y} = \text{proj}_{\text{col}(A)} ( \textbf{b} ) \)
Since \( \textbf{b} - \text{proj}_{\text{col}(A)} ( \textbf{b} ) \perp \text{col}(A) \), their dot product is zero, so \(A^{T} [ \textbf{b} - \text{proj}_{\text{col}(A)} (\textbf{b})] = 0 \). Matrix algebra shows that this equals:
\(A^{T}A\textbf{y} = A^{T}\textbf{b}\)
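A minimal sketch solving the normal equations and comparing with np.linalg.lstsq, using an arbitrary inconsistent system:

```python
import numpy as np

# Arbitrary overdetermined system with no exact solution (hypothetical values)
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])

# Normal equations: A^T A y = A^T b
y_normal = np.linalg.solve(A.T @ A, A.T @ b)
y_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

assert np.allclose(y_normal, y_lstsq)
```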
The error of a least-squares solution, measured as the norm \(\|\textbf{b} - A\textbf{y}\|\)
A matrix with linearly independent columns can be factorised into a matrix \(\textbf{Q}\) whose orthonormal columns span \(\text{col}(\textbf{A})\) and an upper triangular invertible matrix \(\textbf{R}\) with a positive diagonal.
\(\textbf{A}=\textbf{Q}\textbf{R}\)
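A minimal sketch using np.linalg.qr; numpy may return \(R\) with negative diagonal entries, so the signs are flipped to match the positive-diagonal convention above (arbitrary example matrix):

```python
import numpy as np

# Arbitrary matrix with linearly independent columns (hypothetical values)
A = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])

Q, R = np.linalg.qr(A)                    # 'reduced' QR: Q is 3x2, R is 2x2

# Flip signs so that R has a positive diagonal, as in the definition above
signs = np.sign(np.diag(R))
Q, R = Q * signs, (R.T * signs).T

assert np.allclose(Q @ R, A)
assert np.allclose(Q.T @ Q, np.eye(2))    # orthonormal columns
assert np.all(np.diag(R) > 0)
```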
\(\textbf{A} \text{ is positive definite } \iff \forall \textbf{u} \in \mathbb{R}^n \setminus \{\textbf{0}\} ( \textbf{u} \cdot \textbf{A}\textbf{u} \gt 0 ) \)
\(\textbf{A} \text{ is positive semidefinite } \iff \forall \textbf{u} \in \mathbb{R}^n ( \textbf{u} \cdot \textbf{A}\textbf{u} \geq 0 ) \)