MLP
MLP (multilayer perceptron) is another name for a feedforward neural network.
Advantages
Good accuracy even on data that is far from linearly separable
Can learn complicated functions or concepts
Disadvantages
Danger of overfitting
Slow to train
Some Notations
Consider a simple 2 layer neural network (the input layer is not counted as a layer):
1 output unit, H hidden nodes (+1 dummy bias unit), d input nodes (+1 bias unit).
Fully connected: every node in a layer is connected to every node in the previous layer.
(H+1) + H(d+1) weights to learn.
$w_{hj}$ is the weight on the edge from input node $j$ to hidden node $h$; $v_h$ is the weight on the edge from hidden node $h$ to the output node.
$z_h$ (with $z_0 = 1$) are the activations from the hidden layer, usually sigmoid, i.e. $z_h = \mathrm{sigmoid}(\mathbf{w}_h^T \mathbf{x}) = \dfrac{1}{1 + \exp\left(-\sum_{j=0}^{d} w_{hj} x_j\right)}$
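The notation above can be sketched as a forward pass in NumPy. This is a minimal illustration, not any particular library's API; the variable names (`W`, `v`, `forward`) and the random initialization are assumptions for the example, and the assertion checks the $(H+1) + H(d+1)$ parameter count stated above.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

d, H = 4, 3                      # input dimension and number of hidden units
rng = np.random.default_rng(0)
W = rng.normal(size=(H, d + 1))  # hidden-layer weights; extra column for the bias
v = rng.normal(size=H + 1)       # output weights; extra entry for the bias unit

# total number of parameters matches (H+1) + H(d+1)
assert W.size + v.size == (H + 1) + H * (d + 1)

def forward(x):
    x = np.append(1.0, x)        # prepend the dummy bias input x_0 = 1
    z = sigmoid(W @ x)           # hidden activations z_h = sigmoid(w_h^T x)
    z = np.append(1.0, z)        # dummy bias unit z_0 = 1
    return v @ z                 # output y = v^T z (linear for regression)

y = forward(np.ones(d))
```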
The Error Function for Regression is given by:
$$E(\mathbf{W}, \mathbf{v} \mid \mathcal{X}) = \frac{1}{2} \sum_{t} \left(r^t - y^t\right)^2$$
i.e. the (halved) sum of squared errors, with $\mathbf{W}$ = hidden-layer weights and $\mathbf{v}$ = output-layer weights, and $y^t = \mathbf{v}^T \mathbf{z}^t = \sum_{h=1}^{H} v_h z_h^t + v_0$, i.e. a _linear activation function_ at the output.
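As a small numeric check of the regression error, with illustrative (made-up) targets and outputs:

```python
import numpy as np

# targets r^t and network outputs y^t for N = 3 examples (illustrative values)
r = np.array([1.0, 0.5, -0.2])
y = np.array([0.8, 0.4,  0.1])

# E(W, v | X) = 1/2 * sum_t (r^t - y^t)^2
E = 0.5 * np.sum((r - y) ** 2)   # 0.5 * (0.04 + 0.01 + 0.09) = 0.07
```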
The Error Function for Classification is given by:
$$E(\mathbf{W}, \mathbf{v} \mid \mathcal{X}) = -\sum_{t} \left[ r^t \log y^t + (1 - r^t) \log(1 - y^t) \right]$$
i.e. the cross-entropy loss,
where $y^t = \mathrm{sigmoid}(\mathbf{v}^T \mathbf{z}^t) = \dfrac{1}{1 + \exp(-\mathbf{v}^T \mathbf{z}^t)}$, i.e. the sigmoid function applied at the output.
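The cross-entropy loss can likewise be computed directly. The scores below are hypothetical stand-ins for $\mathbf{v}^T \mathbf{z}^t$:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# binary labels r^t and pre-sigmoid scores (stand-ins for v^T z^t)
r = np.array([1.0, 0.0, 1.0])
scores = np.array([2.0, -1.0, 0.5])
y = sigmoid(scores)              # outputs y^t in (0, 1)

# E = -sum_t [ r^t log y^t + (1 - r^t) log(1 - y^t) ]
E = -np.sum(r * np.log(y) + (1 - r) * np.log(1 - y))
```

Note that each term penalizes confident wrong predictions heavily: as $y^t \to 0$ with $r^t = 1$, the loss grows without bound.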
Batch Gradient Descent
We must find W, v that minimize the error.
We have the batch updates:
$$\Delta v_h = \eta \sum_t \left(r^t - y^t\right) z_h^t$$
$$\Delta w_{hj} = \eta \sum_t \left(r^t - y^t\right) v_h \, z_h^t \left(1 - z_h^t\right) x_j^t$$
(computed using the chain rule, i.e. $\frac{\partial E}{\partial w_{hj}} = \sum_t \frac{\partial E^t}{\partial y^t} \frac{\partial y^t}{\partial z_h^t} \frac{\partial z_h^t}{\partial w_{hj}}$)
This technique is called backpropagation.
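The update rules above can be sketched as a vectorized batch training loop for the regression case. This is a minimal sketch, not a production implementation; the function name `train`, the toy dataset, and hyperparameters (`H`, `eta`, `epochs`) are assumptions for the example.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train(X, r, H=3, eta=0.1, epochs=2000, seed=0):
    """Batch gradient descent for a 2-layer regression MLP (a sketch)."""
    N, d = X.shape
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.5, size=(H, d + 1))    # hidden-layer weights
    v = rng.normal(scale=0.5, size=H + 1)         # output weights
    Xb = np.hstack([np.ones((N, 1)), X])          # prepend bias input x_0 = 1
    for _ in range(epochs):
        Z = sigmoid(Xb @ W.T)                     # hidden activations, N x H
        Zb = np.hstack([np.ones((N, 1)), Z])      # prepend bias unit z_0 = 1
        y = Zb @ v                                # linear outputs y^t
        err = r - y                               # (r^t - y^t)
        # Delta v_h = eta * sum_t (r^t - y^t) z_h^t
        dv = eta * Zb.T @ err
        # Delta w_hj = eta * sum_t (r^t - y^t) v_h z_h^t (1 - z_h^t) x_j^t
        dW = eta * ((err[:, None] * v[1:] * Z * (1 - Z)).T @ Xb)
        v += dv
        W += dW
    return W, v

# fit a simple non-linear 1-d target (toy data)
X = np.array([[0.0], [0.5], [1.0]])
r = np.array([0.0, 1.0, 0.0])
W, v = train(X, r)
```

Note the `v[1:]` in the hidden-layer gradient: the bias unit $z_0$ has no incoming weights, so errors do not propagate through it.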