37233 - Linear Algebra


Linear spaces Spazi lineari 線形空間

Linear space Spazio lineare

A 3-tuple \( (V,+,\cdot ) \) over a field \(F\), closed under its operations and satisfying the following properties

\((V,+,\cdot ) \)

\((V,+, \cdot) \text{ is a linear space over } F \iff \textbf{v},\textbf{u} \in V \land c,d \in F \implies\)

In the following axioms and corollaries, general linear elements will use vector notation, and a linear space will be represented merely by its set for conciseness

Corollaries

Subspace

A set \(H\) is a subspace of \(V\), written \(H \leq V\), iff it is a subset \(H \subseteq V\) whose elements also form a linear space. Intuitively, \(H\) must contain the same identity element as \(V\) and must be closed under addition and scalar multiplication

Subspace operators

Intersection

The subspace made by the intersection of two subspace sets, \(U \cap W \leq V\). This is a closed operation: every element of \(U \cap W\) belongs to both \(U\) and \(W\), and since both subspaces are closed under addition and scalar multiplication, so is \(U \cap W\)

Sum

The subspace made by the addition of any two subspace elements, \(U \oplus W \leq V\). This is a closed operation since it is closed under addition by definition, and \(U,W\) are already closed under scalar multiplication

Union

The set made by the union of two subspace sets, \(U \cup W\). Note that this operation is not closed: the sum of an element of \(U\) and an element of \(W\) is not necessarily in the union, so the union is generally not a subspace.

Kernel Spazio Nullo 零空間

Also called the null space, the set of all vectors that when passed through \(T\) return \(\textbf{0}\)

\( \text{ker}(T) = \{ \textbf{v} \in V : T(\textbf{v})=\textbf{0} \} = T^{-1}(\textbf{0}) \)

Or, when considering just a matrix \(A\), the set of all vectors that when multiplied by \(A\) return \(\textbf{0}\)

\( \text{nul}(\textbf{A}) = \{ \textbf{v} \in V : \textbf{A}\textbf{v} = \textbf{0} \} \)

\( \text{nul} (\textbf{A}) \leq \mathbb{R}^n \)

Subspace proof

Subspace properties

Column space Spazio delle colonne 列空間

Set of all vectors that can be formed by a linear combination of a matrix's columns.

\( \text{col} (\textbf{A}) = \text{span} \{\textbf{a}_1,\textbf{a}_2,...,\textbf{a}_n\} \)

\( \text{col} (\textbf{A}) \leq \mathbb{R}^m \)

Subspace proof

Subspace properties

Row space Spazio delle righe 行空間

Set of all vectors that can be formed by a linear combination of a matrix's rows.

\( \text{row} (\textbf{A}) = \text{col}(\textbf{A}^{T})\)

Note that the non-zero rows of \(\textbf{A}\) reduced to REF also form a basis for its row space, since row operations in essence just manipulate linear combinations of the rows

Subspace proof

Subspace properties

If \(\textbf{B}\) is \(\textbf{A}\) in EF, then the non-zero rows of \(\textbf{B}\) form a basis for the row space of both \(\textbf{A}\) and \(\textbf{B}\)

Rank Rango 階数

The number of linearly independent columns of a matrix (reduce to EF and count the columns with pivots), or formally, the dimension of the column space of \(\textbf{A}\)

\( \text{rank} (\textbf{A}) = \text{dim}(\text{col}(\textbf{A}))\)

Rank-nullity theorem

For a matrix \(\textbf{A}\) with dimensions \(m \times n\):

\( \text{rank}(\textbf{A}) = \text{rank}(\textbf{A}^{T}) \)

\( \text{rank}(\textbf{A}) + \text{dim}(\text{nul}(\textbf{A})) = n \)

Proof

For the first statement, take some matrix \(\textbf{A}\) in EF with \(p\) pivots; we know that \(\text{rank} (\textbf{A}) = \text{dim}(\text{col}(\textbf{A})) = p\). Consider \(\textbf{A}^{T}\): the rows are now translated into columns and the pivot positions are preserved, so \(\text{rank}(\textbf{A}^{T}) = p\), proving the statement

For the second, \( \text{rank}(\textbf{A}) \) can be interpreted as the number of columns (or rows) with pivots, and \(\text{dim}(\text{nul}(\textbf{A}))\) as the number of free variables in a solution (the number of columns without pivots). Since a column either has a pivot or does not, the sum of these two counts gives the number of columns of \(\textbf{A}\).

Invertible matrix theorem

Fundamentals Fondamentali 基本

Linear combination Combinazione lineare 線形結合

A linear object composed of a sum of elements from a subset of the linear objects in \(V\), each of which may be multiplied by a scalar

\( \textbf{y} = \sum_{i=1}^n c_i \textbf{v}_i\)

Logical equivalence

\( \textbf{A}\textbf{x}=\textbf{b} \iff \sum_{i=1}^{n} \textbf{x}_i \textbf{a}_i = \textbf{b}\)

\(\textbf{A}\) is a matrix where column \(i\) is \(\textbf{a}_i\). This \(\textbf{A}\) will be used throughout this entire page

The proof follows by comparing the process of matrix multiplication with the definition of a linear combination
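A minimal numerical sketch of this equivalence, using NumPy; the matrix and vector below are arbitrary illustrative values.

```python
import numpy as np

# Arbitrary illustrative matrix A (columns a_1, a_2, a_3) and vector x
A = np.array([[1.0, 2.0, 0.0],
              [3.0, 1.0, 4.0]])
x = np.array([2.0, -1.0, 0.5])

# Matrix-vector product A x
b_product = A @ x

# The same vector built as a linear combination of the columns of A
b_combination = sum(x[i] * A[:, i] for i in range(A.shape[1]))

print(b_product)      # [0. 7.]
print(b_combination)  # [0. 7.]
```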

Linear span

Set of all linear combinations formed by elements from a subset of a linear space. Linear spans are always a subspace

\(\text{span} (U) = \{ \textbf{y} : \exists c_i \in \mathbb{C}, \textbf{y} = \sum_{i=1}^{n} c_i \cdot \textbf{u}_i \} \)

\(\text{span} (\textbf{A}) = \{ \textbf{y} : \exists c_i \in \mathbb{C}, \textbf{y} = \sum_{i=1}^{n} c_i \cdot \textbf{a}_i \} \)

Subspace proof

Since \(0 \cdot \textbf{v}=\textbf{0}\), the zero element can be created. Spans are closed under addition because the sum of two linear combinations of the same vectors is again a linear combination (termwise, \(c_i \cdot \textbf{v}_i + d_i \cdot \textbf{v}_i = (c_i+d_i) \cdot \textbf{v}_i\)), and closed under scalar multiplication as \( c(d\textbf{v}) = (cd)\textbf{v} \).

Theorems

\( \forall \textbf{b}, \exists \textbf{x} : A\textbf{x}=\textbf{b} \iff \text{span} \{ \text{Columns of } A \} = \mathbb{R}^m \)

As a shortcut, this condition is satisfied for \(A\) when there are pivot points in all rows (if some row has no pivot, it reduces to a row of all zeroes, so for some \(\textbf{b}\) the system is inconsistent).

Geometric interpretation of spans

For some geometric intuition, the span of a single non-zero vector \(\textbf{v} \in \mathbb{R}^3\), \(\text{span} \{\textbf{v}\} = \{ c \cdot \textbf{v} \}\), is a line through the origin. \(\text{span} \{\textbf{v},\textbf{u}\}\) for \(\textbf{v},\textbf{u} \in \mathbb{R}^3\) with \(\nexists c : c \cdot \textbf{v} =\textbf{u} \) is a plane through the origin containing both vectors

Linear independence

A subset \(U \subset V\) is linearly independent iff its linear combinations equal the zero element only in the trivial case (all scalars set to zero). The lack of linear independence is linear dependence.

\(U \text{ is linearly independent } \iff \forall c_i \in \mathbb{C} \left[ \sum_{i=1}^{n} c_i\textbf{u}_i = \textbf{0} \implies \textbf{c} = \textbf{0} \right] \)

As a shortcut, this condition is satisfied for \(A\) when there are pivot points in all columns (a column without a pivot corresponds to a free variable, which yields a non-trivial solution of \(A\textbf{c}=\textbf{0}\)).

Theorems

Complex matrix

The equation \(x^2=-1\) has no real solutions, yet has the complex solutions \(x = \pm i\). Interestingly, \(X^2=-I\) in the \(2 \times 2\) case has the solution \(X = \pm i I \) but also solutions using only real entries; \(X : x_{11} = - x_{22} \land x_{11}^2 +x_{12}x_{21} = -1 \)

Basis

A set \(\mathcal{B}\) of vectors that are linearly independent and have a span equal to the linear space (therefore all vectors can be represented through them)

\(\mathcal{B} \text { is a basis for } V \iff \mathcal{B} \text{ is linearly independent} \land \text{span}(\mathcal{B})=V\)

It can be thought of as a minimal spanning set

Basis matrix

Matrix \(P_{\mathcal{B}}\) with each basis element as a column

Dimension Dimensione 次元

Cardinality of a linear space's basis

\( \text{dim}(V) = |\mathcal{B}|\)

Standard basis

A standard basis \( \mathcal{E} \) is one such that all components equal 0 except for one component, which is set as 1.

For instance, the standard basis of \(\mathbb{R}^2\) is \( \mathcal{E} = \{ \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix} \}\)

Canonical basis

Basis for a linear space that is most intuitive

Alternate basis

Suppose \(\textbf{b}_i\) are the column vectors of the basis matrix \(B\) and \(c_i\) are the basis coefficients collected into a column vector \(\textbf{c}\); then for any \(\textbf{v} \in V\):

\( \textbf{v} = \sum_{i} c_i \textbf{b}_i = B\textbf{c} \)

Spanning set theorem

For some subset \(S\) of the linear space set \(V\) and the span \(H=\text{span}(S)\): if an element of \(S\) is a linear combination of the other elements, removing that element \(\textbf{v}_i\) gives the same span, \(\text{span}(S \setminus \{\textbf{v}_i\}) = \text{span}(S) \)

Basis theorem

For a linear space \(V : \text{dim}(V)=n\), any linearly independent subset of \(n\) elements forms a basis; equivalently, any \(n\) elements that span the whole space form a basis

Linear isomorphism Isomorfismo lineare 線形同型写像

Structure-preserving, bijective map between two linear spaces \(V \leftrightarrow W\) such that:

\( V \leftrightarrow W \iff ( \textbf{v}_i \leftrightarrow \textbf{w}_i \iff c \textbf{v}_1 + d \textbf{v}_2 \leftrightarrow c \textbf{w}_1 + d \textbf{w}_2 ) \)

\(V \leftrightarrow W \iff \text{dim}(V) = \text{dim}(W) \)

Coordinates Coordinate 座標

For some basis \(\mathcal{B}\), a coordinate \( [\textbf{v}]_{\mathcal{B}} \) is the vector of scalars that are used with the basis to form vector \(\textbf{v} \in V\)

\( \textbf{v} = \sum_{i=1}^{n} c_i \textbf{b}_i = P_{\mathcal{B}} \textbf{c} \iff [\textbf{v}]_{\mathcal{B}} = \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix} \)

Coordinate transform (standard coordinates)

Where \(\textbf{x}\) is a vector and \(P_{\mathcal{B}}\) is a matrix with each basis vector as each column:

\(P_{\mathcal{B}} : \textbf{x} = P_{\mathcal{B}} [\textbf{x}]_{\mathcal{B}}\)

Each column of \(P_{\mathcal{B}}\) is a basis element. The reasoning is the same line of reasoning as the "linear combination equivalence" discussed above: viewing the process of matrix multiplication, standard coordinate \(i\) should be the sum of component \(i\) of each basis element multiplied by the corresponding \(\mathcal{B}\)-coordinate. Making each basis element a column is exactly how this is achieved

Standard coordinates

Coordinates defined by the standard basis \(\mathcal{E}\)

\(P_{\mathcal{E}} = \begin{bmatrix} 1 & 0 & \ldots & 0 \\ 0 & 1 & \ldots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \ldots & 1 \end{bmatrix} = I\)

\([\textbf{x}]_{\mathcal{E}} = \textbf{x}\)

Coordinate transform (alternate coordinates)

For two bases \(\mathcal{B},\mathcal{C}\) of a linear space \(V\), there exist two transformation matrixes such that:

\( [\textbf{x}]_{\mathcal{C}} = P_{\mathcal{B} \to \mathcal{C}} [\textbf{x}]_{\mathcal{B}} \land [\textbf{x}]_{\mathcal{B}} = P_{\mathcal{C} \to \mathcal{B}} [\textbf{x}]_{\mathcal{C}} \)

Note that a bit of matrix algebra shows \( P_{\mathcal{B} \to \mathcal{C}} = ( P_{\mathcal{C} \to \mathcal{B}} )^{-1} \)

Finding coordinate transform matrix

Since \(P_{\mathcal{B} \to \mathcal{C}} =P_{\mathcal{C}}^{-1} P_{\mathcal{B}}\) (first transforming \(\mathcal{B} \to \mathcal{E} \), then \(\mathcal{E} \to \mathcal{C}\)), one can augment and row reduce as \( P_{\mathcal{C}}| P_{\mathcal{B}} \to I|P_{\mathcal{B} \to \mathcal{C}} \)
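A sketch of this computation with NumPy, using two arbitrary bases of \(\mathbb{R}^2\); solving \(P_{\mathcal{C}} X = P_{\mathcal{B}}\) numerically plays the role of the augment-and-reduce step.

```python
import numpy as np

# Columns are the basis vectors, written in standard coordinates (arbitrary example bases)
P_B = np.array([[1.0, 1.0],
                [0.0, 2.0]])
P_C = np.array([[2.0, 0.0],
                [1.0, 1.0]])

# P_{B->C} = P_C^{-1} P_B, computed by solving P_C X = P_B
P_B_to_C = np.linalg.solve(P_C, P_B)

# Check: converting B-coordinates to C-coordinates describes the same vector
x_B = np.array([3.0, -1.0])
x_std = P_B @ x_B      # standard coordinates of the vector
x_C = P_B_to_C @ x_B   # its C-coordinates
print(np.allclose(P_C @ x_C, x_std))  # True
```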

Factorisation and Transformations

Elementary matrixes

An elementary row operation can be represented by a matrix \(E\) that, when multiplied with a matrix \(A\) from the left (\(EA\)), performs some row operation (or a sequence of row operations) on \(A\)

Inverse elementary matrixes

LU Decomposition Decomposizione LU LU分解

If a square matrix \(A\) can be reduced to echelon form without swapping rows, then it can be rewritten as the product of a lower triangular matrix \(L\) and an upper triangular matrix \(U\)

\(A=LU\)

This provides an alternate algorithm to solving \(A\textbf{x}=\textbf{b}\) as \(A\textbf{x}=\textbf{b} \implies L(U\textbf{x})=\textbf{b} \implies \begin{cases} L\textbf{y} = \textbf{b} \\ U\textbf{x}=\textbf{y} \end{cases} \)

Algorithm analysis

Once \(L,U\) are known, the algorithm is as follows:

  1. Calculate \(\textbf{y} : L\textbf{y}=\textbf{b}\) by forward substitution (runs in \(O(n^2)\) thanks to the triangular structure of \(L\))
  2. Calculate \(\textbf{x} : U\textbf{x} = \textbf{y}\) by backward substitution (runs in \(O(n^2)\) thanks to the triangular structure of \(U\))

Hence each solve runs in \(O(n^2)\), whereas Gaussian elimination from scratch runs in \( O( n^3 ) \); once the factorisation is known, new right-hand sides \(\textbf{b}\) can be handled much more cheaply, demonstrating the power of LU factorisation.
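A minimal sketch of the two-step solve, assuming the factors \(L\) and \(U\) are already known; the factors and right-hand side below are arbitrary illustrative values.

```python
import numpy as np

def forward_sub(L, b):
    """Solve L y = b for lower triangular L (O(n^2) operations)."""
    n = len(b)
    y = np.zeros(n)
    for i in range(n):
        y[i] = (b[i] - L[i, :i] @ y[:i]) / L[i, i]
    return y

def backward_sub(U, y):
    """Solve U x = y for upper triangular U (O(n^2) operations)."""
    n = len(y)
    x = np.zeros(n)
    for i in reversed(range(n)):
        x[i] = (y[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

# Illustrative factors of A = L U and a right-hand side b
L = np.array([[1.0, 0.0],
              [2.0, 1.0]])
U = np.array([[2.0, 3.0],
              [0.0, 1.0]])
b = np.array([10.0, 12.0])

y = forward_sub(L, b)   # step 1: L y = b
x = backward_sub(U, y)  # step 2: U x = y
print(np.allclose(L @ U @ x, b))  # True
```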

Calculating \(LU\)

For square matrix \(A\), use forward phase to get \(A\) into echelon form (which is also an upper triangular matrix) \(EA=U\) and record each operation used through the elementary matrix \(E\). Note that \(E\) is a lower triangular matrix, hence \(E=L^{-1}\) and \(L^{-1}A=U\)

Criterion

\(A\) is LU factorisable iff all square submatrixes from the top left corner are invertible. In practice, this is impractical to check by hand.

\(A=LU \iff \forall k \in \{1,\ldots,n\}, \text{det}(\Delta_k) \neq 0 \text{ where } \Delta_k = \begin{bmatrix} a_{11} & \ldots & a_{1k} \\ \vdots & \ddots & \vdots \\ a_{k1} & \ldots & a_{kk} \end{bmatrix} \)

Doolittle's algorithm Algoritmo di Doolittle ドゥーリトル法

Algorithm based on the assumption that the diagonal entries of \(L\) are all 1. By performing the matrix multiplication \(A=LU\) under the assumptions \(l_{ii} = 1 \land (i \lt j \implies l_{ij} = 0)\), one can break it down and see that:

\(a_{ij} = \begin{cases} u_{ij} + \sum^{i-1}_{k=1} l_{ik}u_{kj} & i \leq j \\ l_{ij}u_{jj} + \sum^{j-1}_{k=1} l_{ik}u_{kj} & i \gt j \end{cases}\) and hence:

\( u_{ij} = a_{ij} - \sum^{i-1}_{k=1} l_{ik}u_{kj} \quad (i \leq j), \qquad l_{ij} = \frac{1}{u_{jj}} \left( a_{ij} - \sum^{j-1}_{k=1} l_{ik}u_{kj} \right) \quad (i \gt j) \)

Note that the algorithm does not work for all matrixes; however, permuting rows with some permutation matrix \(P\) mitigates this, turning the equation into \(A\textbf{x}=\textbf{b} \to PA\textbf{x}=P\textbf{b}\)
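A compact sketch of these recurrences under the stated assumptions (unit diagonal on \(L\), no pivoting required); it is illustrative rather than a numerically hardened implementation.

```python
import numpy as np

def doolittle(A):
    """LU factorisation with l_ii = 1, assuming no row swaps are needed."""
    n = A.shape[0]
    L = np.eye(n)
    U = np.zeros((n, n))
    for i in range(n):
        # Row i of U: u_ij = a_ij - sum_{k<i} l_ik u_kj   (for j >= i)
        for j in range(i, n):
            U[i, j] = A[i, j] - L[i, :i] @ U[:i, j]
        # Column i of L: l_ji = (a_ji - sum_{k<i} l_jk u_ki) / u_ii   (for j > i)
        for j in range(i + 1, n):
            L[j, i] = (A[j, i] - L[j, :i] @ U[:i, i]) / U[i, i]
    return L, U

A = np.array([[2.0, 3.0],
              [4.0, 7.0]])
L, U = doolittle(A)
print(np.allclose(L @ U, A))  # True
```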

Intuition

To get a strong feel for the algorithm, we must analyse why the equation for \(a_{ij}\) holds. Note that standard matrix multiplication is defined as \(a_{ij} = \sum^{n}_{k=1} l_{ik}u_{kj}\); this formula works for all matrixes, however the assumptions of an upper triangular \(U\) and a lower unitriangular \(L\) simplify the formula and, importantly, leave exactly the right degree of freedom to determine the values of the unknown entries

First consider the case of upper

Crout's algorithm Algoritmo di Crout クラウト法

Algorithm based on the assumption that the diagonal entries of \(U\) are all 1. By performing the matrix multiplication \(A=LU\) under the assumptions \(u_{ii} = 1 \land (j \lt i \implies u_{ij} = 0)\), one can break it down and see that:

\(a_{ij} = \begin{cases} l_{ij} + \sum^{j-1}_{k=1} l_{ik}u_{kj} & i \geq j \\ l_{ii}u_{ij} + \sum^{i-1}_{k=1} l_{ik}u_{kj} & i \lt j \end{cases}\) and hence:

\( l_{ij} = a_{ij} - \sum^{j-1}_{k=1} l_{ik}u_{kj} \quad (i \geq j), \qquad u_{ij} = \frac{1}{l_{ii}} \left( a_{ij} - \sum^{i-1}_{k=1} l_{ik}u_{kj} \right) \quad (i \lt j) \)

Note that the algorithm does not work for all matrixes; however, permuting rows with some permutation matrix \(P\) mitigates this, turning the equation into \(A\textbf{x}=\textbf{b} \to PA\textbf{x}=P\textbf{b}\)

LDU factorisation

When no row swapping is required, the decomposition can also be completed as \(A=LDU\) where \(L,U\) are unitriangular and \(D\) is a diagonal matrix.

Choleski's algorithm

For symmetric matrixes \(A=A^{T}\), one can see that \(LU=(LU)^{T}=U^T L^T\) but since a lower triangular matrix transposed is an upper triangular matrix, one can set \(A=U^T U\)
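A brief numerical check of this idea; note that numpy.linalg.cholesky requires \(A\) to be symmetric positive definite and returns the lower triangular factor \(L\) with \(A = LL^{T}\), so \(U = L^{T}\) in the notation above. The matrix is an arbitrary example.

```python
import numpy as np

# Arbitrary symmetric positive-definite matrix
A = np.array([[4.0, 2.0],
              [2.0, 3.0]])

L = np.linalg.cholesky(A)  # lower triangular, A = L @ L.T
U = L.T                    # upper triangular factor

print(np.allclose(U.T @ U, A))  # True
```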

Linear Transformation Trasformazione lineare 線形変換

Linear vector function \(T : V \to W \) mapping elements from one linear space \(V\) (domain) to another linear space \(W\) (codomain)

\( T \text{ is a linear transformation } \iff T(k\textbf{u}+p\textbf{v})=kT(\textbf{u})+pT(\textbf{v})\)

\( T \text{ is a linear transformation } \iff T(\textbf{u}+\textbf{v}) = T(\textbf{u})+T(\textbf{v}) \land T(c\textbf{u}) = cT(\textbf{u}) \)

Image

The mapped codomain element for some domain element, so \(T(\textbf{x}) = \textbf{y}\) implies \(\textbf{y}\) is the image of \(\textbf{x}\)

Observations

\(A\textbf{x}\) can easily be proven to be a linear transform with a range of \(\text{span} \{ \text{Columns of } A\}\) since if a column of \(A\) is denoted as \(a_i\), \(A\textbf{x} = \sum^{n}_{i=1} a_i x_i\)

Bounded linear operator

Linear transform between normed linear spaces such that there exists a scalar for which the norm of every image is at most the norm of the original vector scaled by that scalar

\( \text{A} \text{ is a bounded linear operator } \iff \exists k \in \mathbb{R} [ \forall \mathbf{v} \in V \| \text{A} \mathbf{v} \| \leq k \| \mathbf{v} \| ] \)

Operator norm

Smallest scalar that bounds a bounded linear operator

\(\| \text{A} \|_{\mathrm{op}} = \sup_{\mathbf{v} \in V \setminus \{ \mathbf{0} \}} \left( \frac{ \| \mathrm{A} \mathbf{v} \|}{ \| \mathbf{v} \| } \right) \)
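Assuming the Euclidean norm on \(\mathbb{R}^n\), the operator norm of a matrix equals its largest singular value, which NumPy exposes as np.linalg.norm(A, 2); the matrix and the random sampling below are purely illustrative.

```python
import numpy as np

A = np.array([[3.0, 0.0],
              [4.0, 5.0]])

# Operator (spectral) norm: the largest singular value of A
op_norm = np.linalg.norm(A, 2)

# Brute-force check of sup ||Av|| / ||v|| over many random directions
rng = np.random.default_rng(0)
V = rng.normal(size=(2, 10000))
ratios = np.linalg.norm(A @ V, axis=0) / np.linalg.norm(V, axis=0)

print(op_norm, ratios.max())  # the sampled maximum approaches op_norm from below
```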

Projection lemma

\( (H, \langle \cdot \rangle ) \land M \text{ is a closed convex set in } H \implies \forall x \in H ( \exists ! y_0 \in M : \| x-y_0 \| = \inf_{y \in M} \| x -y \| ) \)

Projection theorem

\( (W, \langle \cdot \rangle ) \leq (V, \langle \cdot \rangle ) \implies \forall \mathbf{v} \in V [ \exists ! \mathbf{w}_0 \in W [ \forall \mathbf{w} \in W [ \| \mathbf{v} - \mathbf{w}_0 \| \leq \| \mathbf{v} - \mathbf{w} \| ]]] \)

\( (W, \langle \cdot \rangle ) \land M \subset W \text{ is a closed and convex set } \implies \forall \mathbf{v} \in W [ \exists ! \mathbf{w}_0 \in M [ \forall \mathbf{w} \in M [ \| \mathbf{v} - \mathbf{w}_0 \| \leq \| \mathbf{v} - \mathbf{w} \| ]]] \)

\( (W, \langle \cdot \rangle ) \leq (V, \langle \cdot \rangle ) \implies \forall \mathbf{v} \in V [ \exists ! \mathbf{w}_0 \in W, \mathbf{w} \in W^{\perp} : \mathbf{v} = \mathbf{w}_0 + \mathbf{w} ] \)

Range Gamma 範囲

Set of all images

\( \text{im}(T) = \{ \textbf{v} : \exists \textbf{u} : T(\textbf{u}) = \textbf{v}\} \)

Standard matrix

For the standard basis \(\mathcal{E}\) with basis matrix \(I\), one can see that \(T(\textbf{x}) = \sum x_i T(I_i)\). Following this logic, let the standard matrix be the matrix where each column is the transformation of the corresponding column of the basis matrix

Scaling transformation

Transform that 'stretches' components proportionally, formally following this definition:

\( T \text{ is a pure scaling transformation} \iff \exists k : T(\textbf{u})=k \cdot \textbf{u}\)

Standard scaling matrix

\( D_k = \begin{bmatrix} k & 0 \\ 0 & k \end{bmatrix}\)

Projecting transformation

Rotation transformation

Rotating a vector by a specific angle (or about an axis) while maintaining the vector magnitude

Standard rotation matrix

\( R_\theta = \begin{bmatrix} \cos (\theta) & -\sin (\theta) \\ \sin (\theta) & \cos (\theta) \end{bmatrix}\)

My proof

Let \(\textbf{v} = \begin{bmatrix} r \cos (k) \\ r\sin (k) \end{bmatrix}\), now to rotate by \(\theta\) implies \( T( \textbf{v}) = \begin{bmatrix} r \cos (k + \theta) \\ r \sin (k + \theta) \end{bmatrix} = \begin{bmatrix} r ( \cos (k) \cos ( \theta ) - \sin(k) \sin ( \theta )) \\ r( \sin (k) \cos ( \theta) + \sin ( \theta ) \cos ( k )) \end{bmatrix} = r \cos (k) \begin{bmatrix} \cos(\theta) \\ \sin(\theta) \end{bmatrix} + r \sin (k) \begin{bmatrix} - \sin (\theta) \\ \cos(\theta) \end{bmatrix}= \begin{bmatrix} \cos ( \theta ) & - \sin ( \theta ) \\ \sin( \theta ) & \cos ( \theta ) \end{bmatrix} \textbf{v}\)

Standard matrix proof

To rotate \( \begin{bmatrix} 1 \\ 0 \end{bmatrix} \) by \( \theta \), geometrically viewing this on a plane and reasoning determines the form \( \begin{bmatrix} \cos (\theta) \\ \sin (\theta) \end{bmatrix} \). Similarly for \( \begin{bmatrix} 0 \\ 1 \end{bmatrix} \), it transforms to \( \begin{bmatrix} -\sin(\theta) \\ \cos(\theta) \end{bmatrix} \)

\(R_\theta = \begin{bmatrix} T(1,0) & T(0,1)\end{bmatrix} = \begin{bmatrix} \cos (\theta) & -\sin (\theta) \\ \sin (\theta) & \cos (\theta) \end{bmatrix}\)
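A quick numerical illustration of the standard rotation matrix; the angle and vector are arbitrary choices.

```python
import numpy as np

theta = np.pi / 3  # rotate by 60 degrees

R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

v = np.array([2.0, 1.0])
w = R @ v

# Rotation preserves the magnitude and shifts the angle by theta
print(np.isclose(np.linalg.norm(w), np.linalg.norm(v)))                    # True
print(np.isclose(np.arctan2(w[1], w[0]) - np.arctan2(v[1], v[0]), theta))  # True
```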

Reflection transformation

Reflecting a vector on an axis while maintaining the vector magnitude

Shear transformation

Skewing a vector along one axis by an amount proportional to its other component

Standard shear matrix

\( S_k = \begin{bmatrix} 1 & k \\ 0 & 1 \end{bmatrix}\)

Injection Iniezione 単射

A transformation is injective iff every image is the image of exactly one domain element (distinct inputs map to distinct outputs), or alternatively:

\(T \text{ is injective } \iff \nexists \textbf{v} \in V - \textbf{0} : T(\textbf{v}) = \textbf{0}\)

Notice that this is the same condition as linear independence of the columns

Proof

By way of contradiction, assume \( T(\textbf{u}) = T(\textbf{v})=\textbf{w} \land \textbf{u} \neq \textbf{v}\); therefore \( T( \textbf{u} ) -T( \textbf{v}) = T(\textbf{u}-\textbf{v}) = \textbf{0}\), hence \(\text{ker}(T)\) contains a non-zero element, so a trivial kernel implies injectivity. The converse is proved similarly.

Eigenspace Autospazio 固有空間

For a linear transform \(T(\textbf{x})=A\textbf{x}\) and an eigenvalue \(\lambda\), the set \(E_{\lambda} = \{ \textbf{x} : (A - \lambda I) \textbf{x} = \textbf{0} \} = \text{nul}(A-\lambda I) \) is the eigenspace of \(T\) associated with \(\lambda\)

Orthogonality

Inner product

Binary function returning a scalar related to the orthogonality of the two linear elements

Norm

Unary function returning the magnitude of an element in a linear space

\( \| \textbf{v} \| \geq 0\)

\( \| \textbf{v} \| = 0 \iff \textbf{v} =\textbf{0} \)

\( \| \textbf{v} + \textbf{u} \| \leq \| \textbf{v} \| + \| \textbf{u} \| \)

\( \| c\textbf{v} \| = |c| \| \textbf{v} \|\)

Euclidean vector space

Finite dimension inner product space on \(\mathbb{R}\)

\( (\mathbb{R}^n, \cdot ) \text{ is a Euclidean vector space } \iff n \lt \infty \land \cdot \text{ is an inner product on } \mathbb{R}^n \)

Scalar product Prodotto scalare 内積

Binary operator on two linear elements

Let \( \cdot \) be the scalar product operator, \({\textbf{x}, \textbf{y}, \textbf{z}} \in V\) be linear elements, and \(c \in \mathbb{R}\), then:

\( \textbf{x} \cdot \textbf{y} = \sum_{i=1}^{n} x_i y_i = \textbf{x}^{T} \textbf{y} \)

Orthogonality Ortogonalità 直交

Property generalizing perpendicularity to two elements in a linear space

\( \textbf{v} \perp \textbf{u} \iff \langle \textbf{v} , \textbf{u} \rangle = 0 \)

Distance Distanza 距離

\( d(\textbf{x},\textbf{y}) = \| \textbf{x} - \textbf{y} \| \)

Pythagorean theorem

\( \|\textbf{x} + \textbf{y} \|^2 = \| \textbf{x} \|^2 + \| \textbf{y} \|^2 \iff \textbf{x} \perp \textbf{y} \)

Orthogonal complement

If \(W\) is a subspace, define the orthogonal complement as the following set

\(W^{\perp} = \{ \textbf{z} : \forall \textbf{w} \in W, \textbf{z} \cdot \textbf{w}= 0 \} \)

and it is said that \(\forall \textbf{z} \in W^{\perp}, \textbf{z} \perp W\), or \(\textbf{z}\) is orthogonal to \(W\)

Theorem

\( \text{nul}(\textbf{A}) = (\text{row}(\textbf{A}))^{\perp} \)

\( W \leq \mathbb{R}^n \implies \text{dim}(W) + \text{dim}(W^{\perp}) = n \)

Proof

Consider \(A\textbf{z} = \textbf{0}\), obviously \( \textbf{z} \) is in \( \text{nul} (A) \) and due to the nature of matrix multiplication, each row of \(A\) (denoted \(\textbf{a}_i\)) satisfies \(\textbf{z} \cdot \textbf{a}_i = 0\), proving the theorem.

The second theorem can be proved by substituting \(\text{dim} (W) = \text{dim} ( \text{col} (U)) \) and applying rank theorem and the first theorem listed

Orthogonal set

Set of vectors \(S\) such that all elements are orthogonal to each other

\(S : \forall \textbf{x},\textbf{y} \in S, \textbf{x} \neq \textbf{y} \implies \textbf{x}\cdot\textbf{y} = 0\)

Theorem

\( S \text{ is an orthogonal set of non-zero vectors} \implies S \text{ is linearly independent}\)

Orthogonal basis

A basis \(\mathcal{U}\) that is also an orthogonal set. Finding coordinates in an orthogonal basis can be done by finding the scalar projection of the vector onto each basis element

Orthogonal projection

The component of a vector \(\textbf{v} \in \mathbb{R}^n\) cast onto the subspace \(W\)

\( \mathcal{U} \text{ is an orthogonal basis for } W \implies \text{proj}_{W} (\textbf{v}) = \sum_{\textbf{u} \in \mathcal{U}} \frac{ \textbf{v} \cdot \textbf{u}}{ \textbf{u} \cdot \textbf{u}} \textbf{u} \)

\( \text{proj}_{W} (\textbf{v}) = \textbf{v} \iff \textbf{v} \in W\)
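A minimal sketch of the projection formula for a plane \(W \leq \mathbb{R}^3\); the orthogonal basis vectors and the projected vector below are arbitrary illustrative choices.

```python
import numpy as np

# Orthogonal basis for a plane W in R^3 (u1 . u2 = 0) and a vector v to project
u1 = np.array([1.0, 1.0, 0.0])
u2 = np.array([1.0, -1.0, 0.0])
v = np.array([3.0, 1.0, 2.0])

# proj_W(v) = sum over the basis of ((v.u)/(u.u)) u
proj = sum((v @ u) / (u @ u) * u for u in (u1, u2))

print(proj)                              # [3. 1. 0.]
print((v - proj) @ u1, (v - proj) @ u2)  # both 0: the residual is orthogonal to W
```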

Orthogonal decomposition

All elements of \(\mathbb{R}^{n}\) can be decomposed into the sum of an element of a subspace and an element from that subspace's orthogonal complement.

\(\forall \textbf{y} \in \mathbb{R}^{n}, \exists \textbf{w} \in W \leq \mathbb{R}^n, \textbf{z} \in W^{\perp} : \textbf{y} = \textbf{w} + \textbf{z}\)

Note that for some orthogonal basis \(\mathcal{U}\) for \(W\), \( \textbf{w} = \sum^{\text{dim}(W)}_{i=1} \text{proj}_{\text{span} \{ \textbf{u}_i\} } (\textbf{y}) \)

Subspacial distance

The distance from \(\textbf{y} \in \mathbb{R}^n\) to the closest element in subspace \(W\)

Best approximation theorem

For some element \(\textbf{y}\) outside subspace \(W\), the element in \(W\) with the smallest distance is \(\text{proj}_{W} (\textbf{y})\)

\( \forall \textbf{v} \in W\setminus \{\text{proj}_{W} (\textbf{y})\}, \| \textbf{y} - \text{proj}_{W} (\textbf{y}) \| \lt \| \textbf{y} - \textbf{v} \| \)

Orthonormal set

Orthogonal set where all vectors are unit vectors

Theorem

Let \(U\) be a matrix whose columns represent an orthonormal basis for \(W\). Then:

\( \text{proj}_{W} (\textbf{y}) = U U^{T} \textbf{y} \)

Note that if \(\textbf{y} \in W\), then the projection just returns the \(\textbf{y}\) back.

Proof

When the columns of \(U\) form an orthonormal basis for \(W\), \(\text{proj}_{W} (\textbf{y}) = \sum^{\text{dim}(W)}_{i=1} \frac{\textbf{y} \cdot \textbf{u}_i}{\textbf{u}_i \cdot \textbf{u}_i} \textbf{u}_i \) by definition, but since \(\forall i, \textbf{u}_i \cdot \textbf{u}_i = 1\) (property of an orthonormal set), the equation simplifies to \(\text{proj}_{W} (\textbf{y}) = \sum^{\text{dim}(W)}_{i=1} (\textbf{y} \cdot \textbf{u}_i) \textbf{u}_i \)

Matrix multiplication shows that \( (U^{T}\textbf{y})_i = (\textbf{y} \cdot \textbf{u}_i) \) and further matrix multiplication evidently proves the theorem.

Orthogonal matrix

Also called an orthonormal matrix, a square matrix whose columns form an orthonormal set, and hence:

\(U \text{ is orthogonal} \iff U^{-1} = U^{T}\)

The change-of-coordinates matrix from one orthonormal basis to another is also an orthogonal matrix. This is because \(P_{\mathcal{U} \to \mathcal{Q}}^{T} P_{\mathcal{U} \to \mathcal{Q}} = (P_{\mathcal{Q}}^{T} P_{\mathcal{U}})^{T} P_{\mathcal{Q}}^{T} P_{\mathcal{U}} = P_{\mathcal{U}}^{T} P_{\mathcal{Q}} P_{\mathcal{Q}}^{T} P_{\mathcal{U}} = I\)

\(U \text{ is orthogonal } \implies |\text{det}(U)| = 1\)

Misnomer

Note that if the columns were merely orthogonal (not unit length), \(U^{T}U\) would equal some diagonal square matrix; an orthogonal matrix strictly requires \(U^{T}U=I\).

Gram-Schmidt Process

Algorithm determining an orthogonal basis \(\mathcal{U}\) for some linear space given a regular basis \(\mathcal{B}\)

\(\textbf{u}_i = \textbf{b}_i - \sum^{i-1}_{j=1} \frac{\textbf{b}_i \cdot \textbf{u}_j }{\textbf{u}_j \cdot \textbf{u}_j} \textbf{u}_j \)
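A direct sketch of the process as written above, returning an orthogonal (not yet normalised) basis; the input basis is an arbitrary example.

```python
import numpy as np

def gram_schmidt(B):
    """Return an orthogonal basis spanning the same space as the columns of B."""
    U = []
    for b in B.T:                                      # iterate over the basis vectors (columns)
        u = b.astype(float)
        for prev in U:
            u = u - (b @ prev) / (prev @ prev) * prev  # subtract the projection onto prev
        U.append(u)
    return np.column_stack(U)

B = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])
U = gram_schmidt(B)
print(np.isclose(U[:, 0] @ U[:, 1], 0.0))  # True: the columns are orthogonal
```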

Proof of correctness

Let \(\mathcal{U} = \{ \textbf{b}_1 \} \) where \(\textbf{b}_1 \in \mathcal{B}\). Inductively, an orthogonal decomposition of an element of \(\mathcal{B}\) with respect to the current \(\mathcal{U}\) can be constructed as \(\textbf{b} = \textbf{w} +\textbf{z}\) (where \(\textbf{w} \in \text{span} (\mathcal{U}), \textbf{z} \in \text{span} (\mathcal{U})^{\perp} \)).

Since \(\textbf{z}\) is the necessary component needed to reach the chosen basis element of \(\mathcal{B}\) and is orthogonal to the current \(\mathcal{U}\), adding this to the orthogonal basis maintains its orthogonality and spans this new element.

Diagonalization

Kronecker delta

A special function \(\delta : \mathbb{N} \times \mathbb{N} \to \{0,1\} \) with the following definition

\(\delta_{ij} = \begin{cases} 1 & i=j \\ 0 & i \neq j \end{cases}\)

Symmetry Simmetria 対称

\(A \text{ is symmetric} \iff A=A^{T}\)

Symmetric matrixes have some nice properties and applications, from quadratic forms and orthogonal diagonalisation (as will be seen below) to undirected graph adjacency matrixes

Diagonalization Diagonalizzazione 対角化

A square matrix \(A : n \times n\) may have the decomposition \( A=PDP^{-1}\)

Algorithm

  1. Find eigenvalues \(\lambda_1,\lambda_2,...\lambda_n\) by solving \( \text{det}(A-\lambda I) = 0\)
  2. Find eigenvectors and use them to form a basis \(\mathcal{B}\)
  3. \(P=P_{\mathcal{B}}, D : d_{ij} = \lambda_{i} \delta_{ij}\) (Note the use of Kronecker delta)
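A sketch of these steps using numpy.linalg.eig, which returns the eigenvalues together with a matrix whose columns are corresponding eigenvectors; the matrix below is an arbitrary diagonalizable example.

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

# Steps 1-2: eigenvalues and eigenvectors (the columns of P)
eigvals, P = np.linalg.eig(A)

# Step 3: D carries the eigenvalues on its diagonal, in the same order as the columns of P
D = np.diag(eigvals)

print(np.allclose(A, P @ D @ np.linalg.inv(P)))  # True
```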

Diagonalization theorem

Let \(A : n \times n\) be a square matrix, \(A \text{ is diagonalizable } \iff A \text{ has } n \text{ linearly independent eigenvectors} \)

Proof

Note that \(A=PDP^{-1} \iff AP=PD\) when \(P\) is invertible. Since \(D\) is diagonal, \(PD\) scales each column \(\textbf{p}_{i}\) of \(P\) by \(d_{ii}\), so the equation reduces to the eigenproblem \(A\textbf{p}_{i} = d_{ii}\textbf{p}_{i}\). Since \(P\) must be invertible, all columns of \(P\) must be linearly independent eigenvectors, proving the theorem

Orthogonal Diagonalization Diagonalizzazione ortogonale 直交対角化

For a symmetric square matrix \(A : n \times n\) one can make the decomposition \( A=PDP^{T}\)

Algorithm

  1. Find eigenvalues \(\lambda_1,\lambda_2,...\lambda_n\) by solving \( \text{det}(A-\lambda I) = 0\)
  2. Find eigenvectors and use them to form a basis \(\mathcal{B}\)
  3. Transform \(\mathcal{B}\) to an orthonormal basis \(\mathcal{U}\) using Gram-Schmidt and then dividing each vector by its norm
  4. \(P=P_{\mathcal{U}}, D : d_{ij} = \lambda_{i} \delta_{ij}\) (Note the use of Kronecker delta)

Spectrum

Set of a matrix's eigenvalues

\(S_{A} = \{ \lambda : \exists \textbf{x} : A\textbf{x} = \lambda \textbf{x} \}\)

Spectral theorem

Let \(A\) be a square matrix, \(A \text{ is orthogonally diagonalizable } \iff A=A^{T}\)

Proof

Orthogonal diagonalisability implies symmetry since \(A^{T} = (PDP^{T})^{T} = PDP^{T} = A\). Symmetry implies orthogonal diagonalisability since symmetric matrixes have orthogonal eigenspaces (as proved below), which is the requirement for orthogonal diagonalization

Let \(A\) be a square, symmetric matrix with dimensions \(n \times n\):

Proof of orthogonal eigenspaces of symmetric matrixes

If two eigenvectors of a symmetric \(A\) are from different eigenspaces, then they are orthogonal to each other since \(\lambda_1 \textbf{x}_1 \cdot \textbf{x}_2 = (\lambda_1 \textbf{x}_1)^{T} \textbf{x}_2 = (A \textbf{x}_1)^{T} \textbf{x}_2 = (\textbf{x}_1)^{T} A^{T} \textbf{x}_2 = ( \textbf{x}_1)^{T} A \textbf{x}_2 = \lambda_2 \textbf{x}_1 \cdot \textbf{x}_2\). Since the eigenvalues are distinct, \(\textbf{x}_1 \cdot \textbf{x}_2 = 0\) and the vectors are therefore orthogonal

Spectral decomposition

If \(A\) is orthogonally diagonalizable (symmetric), and we let \(P = \begin{bmatrix} \textbf{u}_1 & \textbf{u}_2 & ... & \textbf{u}_n \end{bmatrix}\), then there is the decomposition \(A = \sum^{n}_{i=1} \lambda_i \textbf{u}_i \textbf{u}_i^{T}\)

Singular value factorization

For all matrixes of size \(m \times n\), \(A^{T}A\) is symmetric; this allows for a general decomposition for any matrix. Let \(\lambda_1,\lambda_2,...\lambda_n\) be the eigenvalues of \(A^{T}A\) and \(U\) represent the set of normalised eigenvectors of \(A^{T} A\), then:

\(A= W \Sigma U^{T} \)

Algorithm

  1. Find eigenvalues of \(A^{T}A\) \(\lambda_1,\lambda_2,...\lambda_n\) by solving \( \text{det}(A^{T}A-\lambda I) = 0\)
  2. Form \(U\) from the \(n\) orthonormal eigenvectors of \(A^{T}A\)
  3. Form \(\Sigma\) by assigning the square root of each eigenvalue to its diagonal in descending order, starting at the top left index. This matrix should have the same dimensions as \(A\)

  4. Form \(W\) by assigning its columns as \( \textbf{w}_{i} = \frac{A \textbf{u}_i}{\sigma_{i}}, 1 \leq i \leq \text{rank} (A)\), then using Gram-Schmidt to form the rest of the columns until \(m\) columns are formed
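The same factorisation is available directly as numpy.linalg.svd; it returns the singular values as a 1-D array and the matrix \(U^{T}\) already transposed, named below to match the \(W \Sigma U^{T}\) notation of this page. The input matrix is an arbitrary example.

```python
import numpy as np

A = np.array([[3.0, 2.0, 2.0],
              [2.0, 3.0, -2.0]])

# NumPy's (U, s, Vh) corresponds to (W, diagonal of Sigma, U^T) in this page's notation
W, sigma, Ut = np.linalg.svd(A)

# Rebuild the m x n Sigma and check the factorisation
Sigma = np.zeros(A.shape)
Sigma[:len(sigma), :len(sigma)] = np.diag(sigma)

print(np.allclose(A, W @ Sigma @ Ut))  # True
```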

Proof

Recall that \(\|A\textbf{u}_i\| = \sqrt{\lambda_i} = \sigma_i\), so \(A \textbf{u}_i = \sigma_i \textbf{w}_i \). The columns of \(U\) form an orthonormal basis, and \(W\), once extended to \(m\) columns, does as well. Note that by matrix multiplication and the eigen-equation, \(AU=W\Sigma \), so trivially \(A=W\Sigma U^{T}\)

Singular value decomposition (SVD)

\(A = \sum_{i=1}^{\text{rank}(A)} \sigma_i \textbf{w}_i \textbf{u}_{i}^{T}\)

Quadratic form

A function \(Q : \mathbb{R}^n \to \mathbb{R} \) with the following structure

\( Q( \textbf{x} ) = \textbf{x}^{T} A \textbf{x} : A=A^{T}\)

By letting \( Q_{A}( \textbf{x} ) = \textbf{x}^{T} A \textbf{x}\) for a general matrix \(A\), one can see that \(Q_{A} = \frac{Q_{A} + Q_{A^{T}}}{2} = Q_{\frac{A+A^{T}}{2}}\) (a scalar equals its own transpose), which is why the symmetric constraint in the quadratic form's definition loses no generality.

Note that \( Q_{A}( \textbf{x} ) = \sum_{i \leq j}^{n} k_{ij} x_{i}x_{j} \implies A = \begin{bmatrix} k_{11} & \frac{k_{12}}{2} & \ldots & \frac{k_{1n}}{2} \\ \frac{k_{12}}{2} & k_{22} & \ldots & \frac{k_{2n}}{2} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{k_{1n}}{2} & \frac{k_{2n}}{2} & \ldots & k_{nn} \end{bmatrix} \)

Variants

Principal axes theorem

For any \(Q\), there exists some change of coordinates \(\textbf{x} = P \textbf{y}\) such that the \(A\) in the quadratic form can be expressed as a diagonal matrix \(D\)

Proof

While recalling the symmetry of \(A\), \(\textbf{x}^{T} A \textbf{x} = (P\textbf{y})^{T} A (P\textbf{y}) = \textbf{y}^{T}P^{T} A P\textbf{y} = \textbf{y}^{T}P^{T} (PDP^{-1}) P \textbf{y} = \textbf{y}^{T} D \textbf{y}\)

Algorithm

  1. Diagonalize \(A=PDP^{T}\)
  2. \(\textbf{y} = P^{T}\textbf{x}, Q(\textbf{y})=\textbf{y}^{T} D \textbf{y}\)

Method of least-squares

For a system \(A\textbf{x}=\textbf{b}\) with no solution, a least-squares solution \(\textbf{y}\) is the substitute for a solution that minimises the error

\( \textbf{y} \text{ is a least-squares solution of } A\textbf{x}=\textbf{b} \iff \forall \textbf{x} \in \mathbb{R}^n , \|\textbf{b} - A\textbf{y}\| \leq \|\textbf{b} - A\textbf{x}\| \)

This can be solved by applying the best approximation theorem to calculate \( \textbf{y} : A \textbf{y} = \text{proj}_{\text{col}(A)} ( \textbf{b} ) \)

Normal system

Since \( \textbf{b} - \text{proj}_{\text{col}(A)} ( \textbf{b} ) \perp \text{col}(A) \), its dot product with every column of \(A\) is zero, so \(A^{T} [ \textbf{b} - \text{proj}_{\text{col}(A)} (\textbf{b})] = \textbf{0} \). Substituting \(A\textbf{y} = \text{proj}_{\text{col}(A)}(\textbf{b})\) and rearranging gives the normal system:

\(A^{T}A\textbf{y} = A^{T}\textbf{b}\)
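A sketch comparing the normal-system solution with NumPy's built-in least-squares routine; the overdetermined system below is an arbitrary example with no exact solution.

```python
import numpy as np

# Overdetermined system A x = b with no exact solution
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])

# Solve the normal system  A^T A y = A^T b
y_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Reference: NumPy's least-squares solver
y_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print(np.allclose(y_normal, y_lstsq))  # True
```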

Least-squares error

The error \(\|\textbf{b} - A\textbf{y}\|\) with respect to the norm

QR Factorization

A matrix with linearly independent columns can be factorised into a matrix \(\textbf{Q}\) whose orthonormal columns span \(\text{col}(\textbf{A})\) and an upper triangular invertible matrix \(\textbf{R}\) with a positive diagonal.

\(\textbf{A}=\textbf{Q}\textbf{R}\)
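numpy.linalg.qr computes such a factorisation (its \(R\) may differ from the convention above by the signs of some rows); a quick check on an arbitrary matrix with linearly independent columns.

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])

Q, R = np.linalg.qr(A)  # Q has orthonormal columns, R is upper triangular

print(np.allclose(Q.T @ Q, np.eye(2)))  # True: the columns of Q are orthonormal
print(np.allclose(Q @ R, A))            # True
```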

Definite matrix

\(\textbf{A} \text{ is positive definite } \iff \forall \textbf{u} \in \mathbb{R}^n \setminus \{\textbf{0}\} ( \textbf{u} \cdot \textbf{A}\textbf{u} \gt 0 ) \)

\(\textbf{A} \text{ is positive semidefinite } \iff \forall \textbf{u} \in \mathbb{R}^n ( \textbf{u} \cdot \textbf{A}\textbf{u} \geq 0 ) \)