Class of theorems asserting that a specific class of ANNs is dense in a specified function space
More informally: for any function in a specified function space, there exists an ANN that approximates it to an arbitrary degree of accuracy
\( \forall f \in C, \exists \{ \phi_n \} : \lim_{n \to \infty} \phi_n = f \), where \(C\) is the space of continuous functions and the convergence is typically uniform on compact sets
Numerical value assigned to a connection between nodes, denoting the strength and direction (excitatory or inhibitory) of that connection
Function that measures the error between a predicted value and the true value, with a weighting scheme that penalizes larger errors more heavily. The output of this function is known as the cost, and it serves as a way of quantifying the quality of a NN's estimate.
\(\lambda (x) = C (t-x)^2 \)
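A minimal sketch of this cost in Python, treating \(C\) as a hypothetical scaling constant and \(t\) as the true value:

```python
def squared_error(predicted, true, C=1.0):
    # Quadratic penalty: doubling the error quadruples the cost
    return C * (true - predicted) ** 2
```

For example, a prediction of 3 against a true value of 5 costs `squared_error(3, 5)` = 4.0.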
Function applied to the weighted sum of a node's inputs to produce its output. This introduces nonlinearity into a NN, making it more expressive than linear regression techniques.
Analogous to a half-wave rectifier in electronics; scalar function that returns its argument if positive and zero otherwise
\(r(x) = \max (0,x)\)
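The definition above translates directly into one line of Python:

```python
def relu(x):
    # Passes positive inputs through unchanged; clamps negatives to zero
    return max(0.0, x)
```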
A function applied to each \(n \times n\) square of an image or, more generally, to each portion of data of size \(n\) (let's call this a pool)
Returning the maximum element in each pool and replacing the pool with that element
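A minimal sketch of max pooling on a 2D image represented as nested lists, assuming non-overlapping windows and dimensions that divide evenly by \(n\):

```python
def max_pool(image, n):
    # Slide a non-overlapping n x n window over the image and
    # replace each pool with its maximum element
    rows, cols = len(image), len(image[0])
    return [
        [max(image[r + i][c + j] for i in range(n) for j in range(n))
         for c in range(0, cols, n)]
        for r in range(0, rows, n)
    ]
```

For example, pooling a 4x4 image with \(n = 2\) yields a 2x2 result of the four window maxima.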
Supervised models used for classifying data by finding a hyperplane that separates the classes with maximum margin
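Once the hyperplane is found, prediction reduces to checking which side a point lies on. A minimal sketch, assuming a trained weight vector `w` and bias `b` (both hypothetical here):

```python
def svm_predict(w, b, x):
    # Classify by the sign of the signed distance to the
    # hyperplane w . x + b = 0
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1
```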
Distinct from the definition in Linear Algebra; denotes a function that is a weighted sum or integral
Weights that differ between training runs
Weights that gradually become frozen (converge to a fixed value) over successive training runs
Vector function that maps each element of a vector to its normalized exponential, yielding a probability distribution
\( \sigma (\vec{z})_{i} = \frac{e^{z_{i}}}{ \sum_{j=1}^{|\vec{z}|} e^{z_{j}} } \)
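A direct Python sketch of this formula, with the standard max-subtraction trick added to avoid overflow in the exponentials (the subtraction cancels in the ratio, so the result is unchanged):

```python
import math

def softmax(z):
    # Subtract the max for numerical stability, then normalize
    # the exponentials so the outputs sum to 1
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]
```

For example, `softmax([0, 0])` returns `[0.5, 0.5]`.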
Training a model on a dataset to the point where it predicts that dataset perfectly but generalizes poorly to other datasets
Matrix \(C\) whose entry \(c_{ij}\) is the number of times an object with true label \(i\) was classified as \(j\) by a machine learning algorithm. A diagonal matrix is therefore the ideal.
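A minimal sketch of building \(C\) from paired label lists, assuming integer class labels `0..num_classes-1`:

```python
def confusion_matrix(true_labels, predicted_labels, num_classes):
    # C[i][j] counts objects with true label i classified as j;
    # a perfect classifier fills only the diagonal
    C = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(true_labels, predicted_labels):
        C[t][p] += 1
    return C
```

For example, with true labels `[0, 1, 1]` and predictions `[0, 1, 0]`, one object of class 1 was misclassified as 0, so `C = [[1, 0], [1, 1]]`.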
Computing the gradient of the cost function with respect to a NN's weights by applying the chain rule layer by layer, from output back to input
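A minimal sketch of the chain rule at work, for a single hypothetical neuron with sigmoid activation and the squared-error cost \((t - a)^2\):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gradient(w, b, x, t):
    # Forward pass: z = w*x + b, activation a = sigmoid(z),
    # cost C = (t - a)^2
    z = w * x + b
    a = sigmoid(z)
    # Backward pass via the chain rule:
    # dC/dw = dC/da * da/dz * dz/dw, and dz/dw = x, dz/db = 1
    dC_da = -2.0 * (t - a)
    da_dz = a * (1.0 - a)
    return dC_da * da_dz * x, dC_da * da_dz  # (dC/dw, dC/db)
```

The analytic gradient can be sanity-checked against a finite-difference estimate of the cost, which is a common way to verify a backpropagation implementation.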