Deep Neural Network

A Deep Neural Network is a neural network that has multiple hidden layers.

Forward Propagation in a Deep Network

Say we have L layers in the deep network. The vectorized implementation of forward propagation is as follows:

for l=1 to L:

$Z^{[l]} = W^{[l]}A^{[l-1]} + B^{[l]}$

$A^{[l]} = g^{[l]}(Z^{[l]})$

where $g^{[l]}$ is the activation function for layer l (different layers can use different activation functions).

The required probability is $A^{[L]}$, i.e. the value computed in the last iteration of the loop. (Note that the input X is denoted as $A^{[0]}$.)
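The loop above can be sketched in NumPy as follows. This is a minimal illustration rather than a reference implementation: the function name `forward_propagation`, the layer sizes, and the choice of ReLU for hidden layers and sigmoid for the output layer are all assumptions made for the example.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_propagation(X, weights, biases, activations):
    """Run the loop for l = 1..L and return A^[L].

    weights[l-1], biases[l-1] and activations[l-1] hold W^[l], b^[l] and g^[l].
    """
    A = X  # A^[0] is the input
    for W, b, g in zip(weights, biases, activations):
        Z = W @ A + b  # b has shape (n_l, 1) and broadcasts across the m columns
        A = g(Z)
    return A

# Hypothetical network: 3 inputs, hidden layers of 4 and 2 units, 1 output unit.
rng = np.random.default_rng(0)
layer_sizes = [3, 4, 2, 1]
weights = [rng.standard_normal((layer_sizes[l], layer_sizes[l - 1])) * 0.01
           for l in range(1, len(layer_sizes))]
biases = [np.zeros((layer_sizes[l], 1)) for l in range(1, len(layer_sizes))]
activations = [relu, relu, sigmoid]

X = rng.standard_normal((3, 5))  # 5 examples, one per column
A_L = forward_propagation(X, weights, biases, activations)
print(A_L.shape)  # (1, 5)
```

Each column of `A_L` is the output for one example; with a sigmoid output layer every entry lies strictly between 0 and 1.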

Getting Your Dimensions Right

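Mismatched dimensions are one of the most common sources of bugs in our code. With $n^{[l]}$ units in layer l and m training examples, it's a good idea to keep the following dimensions in mind:

$W^{[l]}$ and $dW^{[l]}$: $(n^{[l]}, n^{[l-1]})$

$b^{[l]}$ and $db^{[l]}$: $(n^{[l]}, 1)$, broadcast across the m columns to form $B^{[l]}$

$Z^{[l]}$, $A^{[l]}$, $dZ^{[l]}$ and $dA^{[l]}$: $(n^{[l]}, m)$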
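One way to catch dimension bugs early is to compute the expected shapes from the layer sizes and assert against them. The sketch below assumes the usual convention that layer l has n_l units and that the m examples are stored as columns; the helper `expected_shapes` is a hypothetical name introduced for illustration.

```python
import numpy as np

def expected_shapes(layer_sizes, m):
    """Return the expected shapes of W, b, Z and A for every layer l = 1..L."""
    shapes = {}
    for l in range(1, len(layer_sizes)):
        n_l, n_prev = layer_sizes[l], layer_sizes[l - 1]
        shapes[l] = {
            "W": (n_l, n_prev),  # weights map n_prev inputs to n_l units
            "b": (n_l, 1),       # one bias per unit, broadcast over examples
            "Z": (n_l, m),       # one column per example
            "A": (n_l, m),
        }
    return shapes

# Hypothetical network with 3 inputs, layers of 4, 2 and 1 units, and 5 examples.
shapes = expected_shapes([3, 4, 2, 1], m=5)

# Check that layer 2 of the forward pass produces the expected shape.
W2 = np.zeros(shapes[2]["W"])
b2 = np.zeros(shapes[2]["b"])
A1 = np.zeros(shapes[1]["A"])
Z2 = W2 @ A1 + b2
assert Z2.shape == shapes[2]["Z"]
```

The gradients have the same shapes as the quantities they correspond to, so the same table can be reused to check the backward pass.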

Why Deep Representations?

The main reason for using deep neural networks is that shallow networks may not be able to learn complex features. In a deep network, each layer builds on the output of the previous one, so later layers learn more complex features than earlier ones.

Forward and Backward Propagation

Forward and Backward Propagation for deep networks use essentially the same equations as for shallow networks, applied layer by layer.

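Forward Propagation:

$Z^{[l]} = W^{[l]}A^{[l-1]} + B^{[l]}$

$A^{[l]} = g^{[l]}(Z^{[l]})$

Backward Propagation:

$dZ^{[l]} = dA^{[l]} * g^{[l]\prime}(Z^{[l]})$

$dW^{[l]} = \frac{1}{m} dZ^{[l]} A^{[l-1]T}$

$db^{[l]} = \frac{1}{m} \sum_{i=1}^{m} dZ^{[l](i)}$

$dA^{[l-1]} = W^{[l]T} dZ^{[l]}$

where m is the number of training examples, * is element-wise multiplication, and $b^{[l]}$ is the bias vector that is broadcast to form $B^{[l]}$.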
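A single backward step for layer l can be sketched in NumPy as follows. It takes the gradient dA of the cost with respect to the layer's activations and returns the gradients for the layer's weights, bias and inputs. This uses the standard backpropagation equations; the function name `layer_backward` and the ReLU derivative are assumptions chosen for the example.

```python
import numpy as np

def relu_prime(z):
    """Derivative of ReLU, taken as 0 at z == 0."""
    return (z > 0).astype(float)

def layer_backward(dA, Z, A_prev, W, g_prime):
    """Backward step for one layer.

    dA      : gradient w.r.t. A^[l], shape (n_l, m)
    Z       : cached Z^[l] from the forward pass, shape (n_l, m)
    A_prev  : cached A^[l-1], shape (n_prev, m)
    W       : W^[l], shape (n_l, n_prev)
    g_prime : derivative of the layer's activation function
    """
    m = A_prev.shape[1]
    dZ = dA * g_prime(Z)                        # element-wise product
    dW = (dZ @ A_prev.T) / m                    # same shape as W
    db = np.sum(dZ, axis=1, keepdims=True) / m  # same shape as b
    dA_prev = W.T @ dZ                          # passed on to layer l-1
    return dA_prev, dW, db

# Hypothetical layer with 4 inputs, 2 units and 5 examples.
rng = np.random.default_rng(1)
A_prev = rng.standard_normal((4, 5))
W = rng.standard_normal((2, 4))
b = np.zeros((2, 1))
Z = W @ A_prev + b
dA = rng.standard_normal((2, 5))

dA_prev, dW, db = layer_backward(dA, Z, A_prev, W, relu_prime)
```

Running this step for l = L down to 1, feeding each `dA_prev` into the next call, gives the gradients for every layer.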

Parameters vs. Hyperparameters

Parameters of a neural network include weights (W's) and biases (b's).

Other factors such as the learning rate, number of hidden layers, number of hidden units, number of iterations, choice of activation function etc. are hyperparameters.

Hyperparameters are chosen before training and control how the parameters are learned, and therefore the final values of the parameters.
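The distinction can be made concrete with a small sketch: the hyperparameters are fixed up front, while the parameters are created from them and then changed by training. The dictionary keys, function names and the all-ones placeholder gradients below are assumptions made purely for illustration.

```python
import numpy as np

# Hyperparameters: chosen by us before training starts.
hyperparameters = {
    "learning_rate": 0.1,
    "layer_sizes": [3, 4, 1],  # fixes the number of hidden layers and units
    "num_iterations": 100,
}

def initialize_parameters(layer_sizes, seed=0):
    """Parameters: the W's and b's, whose shapes follow from the layer sizes."""
    rng = np.random.default_rng(seed)
    params = {}
    for l in range(1, len(layer_sizes)):
        params[f"W{l}"] = rng.standard_normal((layer_sizes[l], layer_sizes[l - 1])) * 0.01
        params[f"b{l}"] = np.zeros((layer_sizes[l], 1))
    return params

def gradient_descent_step(params, grads, learning_rate):
    """One update; the learning rate scales how far each parameter moves."""
    return {name: value - learning_rate * grads["d" + name]
            for name, value in params.items()}

params = initialize_parameters(hyperparameters["layer_sizes"])
# Placeholder gradients (all ones) stand in for a real backward pass.
grads = {"d" + name: np.ones_like(value) for name, value in params.items()}
updated = gradient_descent_step(params, grads, hyperparameters["learning_rate"])
```

Changing `learning_rate` or `layer_sizes` changes which parameter values training arrives at, even though neither is itself learned.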


