MLP
MLP (Multi-Layer Perceptron) is another name for a feedforward neural network.
Advantages
Good accuracy even on data that is far from linearly separable
Can learn complicated functions or concepts
Disadvantages
Danger of overfitting
Slow to train
Some Notation
Consider a simple 2-layer neural network (the input layer is not counted as a layer):
1 output unit, $H$ hidden nodes (+1 dummy bias unit), $d$ input nodes (+1 bias unit).
Fully connected: every node in a layer is connected to every node in the previous layer.
$(H+1) + H(d+1)$ weights to learn.
$w_{hj}$ is the weight on the edge from input node $x_j$ to hidden node $h$; $v_h$ is the weight on the edge from hidden node $h$ to the output node.
$Z_0, Z_1, \ldots, Z_H$ (with $Z_0 = 1$) are the activations from the hidden layer, usually sigmoid, i.e. $Z_h = \frac{1}{1 + e^{-\mathbf{w}_h^T \mathbf{x}}}$
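To make the notation concrete, here is a minimal NumPy sketch of the hidden-layer forward pass; the function name `hidden_activations` and the array layout are choices made for this illustration, not part of the original notes:

```python
import numpy as np

def hidden_activations(W, x):
    """Forward pass through the hidden layer.

    W : (H, d+1) array of weights w_hj; column 0 multiplies the bias input x_0 = 1
    x : (d,) raw input vector
    Returns Z : (H+1,) hidden activations, with the dummy bias unit Z_0 = 1 prepended.
    """
    x_aug = np.concatenate(([1.0], x))       # prepend bias input x_0 = 1
    Z = 1.0 / (1.0 + np.exp(-W @ x_aug))     # sigmoid: Z_h = 1 / (1 + e^{-w_h^T x})
    return np.concatenate(([1.0], Z))        # prepend dummy bias unit Z_0 = 1
```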
The Error Function for Regression is given by:
$E(\mathbf{W}, \mathbf{v}) = \frac{1}{2}\sum_t (r^t - y^t)^2$, i.e. the sum-of-squares error, with $\mathbf{W}$ = the weights $w_{hj}$, $\mathbf{v}$ = the weights $v_h$, and $y = \mathbf{v}^T \mathbf{Z}$, i.e. a _linear activation function_: $y = v_0 \cdot 1 + v_1 Z_1 + \ldots + v_H Z_H$
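A corresponding sketch of the regression output and error, reusing `hidden_activations` from the snippet above (the names `X` for an N×d batch of inputs and `r` for the N targets are assumptions of this example):

```python
def regression_error(W, v, X, r):
    """Sum-of-squares error E(W, v) = 1/2 * sum_t (r^t - y^t)^2."""
    Z = np.array([hidden_activations(W, x) for x in X])  # (N, H+1)
    y = Z @ v                                            # linear output y = v^T Z
    return 0.5 * np.sum((r - y) ** 2)
```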
The Error Function for Classification is given by:
$E(\mathbf{W}, \mathbf{v}) = -\sum_t \left[ r^t \log y^t + (1 - r^t) \log(1 - y^t) \right]$, i.e. the cross-entropy loss,
where $y = \frac{1}{1 + e^{-\mathbf{v}^T \mathbf{Z}}}$, i.e. the sigmoid function
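And the classification counterpart, with a sigmoid on the output and the cross-entropy loss (same assumed helper and names as in the sketches above):

```python
def classification_error(W, v, X, r):
    """Cross-entropy E(W, v) = -sum_t [r^t log y^t + (1 - r^t) log(1 - y^t)]."""
    Z = np.array([hidden_activations(W, x) for x in X])  # (N, H+1)
    y = 1.0 / (1.0 + np.exp(-(Z @ v)))                   # sigmoid output
    return -np.sum(r * np.log(y) + (1 - r) * np.log(1 - y))
```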
Batch Gradient Descent
We must find W, v that minimize the error.
We have:
$\frac{\partial E}{\partial v_h} = -\sum_t (r^t - y^t)\, Z_h^t$
$\frac{\partial E}{\partial w_{hj}} = -\sum_t (r^t - y^t)\, v_h\, Z_h^t (1 - Z_h^t)\, x_j^t$
(computed using the chain rule, i.e. $\frac{\partial E}{\partial w_{hj}} = \sum_t \frac{\partial E}{\partial y^t} \cdot \frac{\partial y^t}{\partial Z_h^t} \cdot \frac{\partial Z_h^t}{\partial w_{hj}}$)
This technique is called backpropagation.
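Putting it together, here is a minimal batch-gradient-descent sketch for the regression case that implements exactly the two gradients above; the learning rate `eta`, the initialization scale, and the iteration count are illustrative choices, not prescribed by these notes:

```python
def train_batch_gd(X, r, H, eta=0.01, n_iters=1000, seed=0):
    """Batch gradient descent for the 2-layer regression MLP."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=0.1, size=(H, d + 1))  # hidden weights w_hj
    v = rng.normal(scale=0.1, size=H + 1)       # output weights v_h (v_0 = bias)
    for _ in range(n_iters):
        Z = np.array([hidden_activations(W, x) for x in X])  # (N, H+1)
        y = Z @ v                                            # linear outputs
        err = r - y                                          # (r^t - y^t)
        # dE/dv_h = -sum_t (r^t - y^t) Z_h^t
        grad_v = -err @ Z
        # dE/dw_hj = -sum_t (r^t - y^t) v_h Z_h^t (1 - Z_h^t) x_j^t
        X_aug = np.hstack([np.ones((X.shape[0], 1)), X])     # inputs with bias x_0 = 1
        delta = err[:, None] * v[1:] * Z[:, 1:] * (1 - Z[:, 1:])  # (N, H)
        grad_W = -delta.T @ X_aug
        v -= eta * grad_v
        W -= eta * grad_W
    return W, v
```

Note that every iteration uses the entire training set to compute the gradients, which is what makes this *batch* (rather than stochastic) gradient descent.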