Deep Learning Specialization - Coursera

Shallow Neural Network

As discussed earlier, a neural network has layers of neurons.

Every neural network has an input layer and an output layer, with zero or more hidden layers in between.

Consider a 2-layer neural network: it has an input layer, a hidden layer, and an output layer. Note that the input layer is not counted when numbering layers, which is why it is called a 2-layer neural network even though it technically has 3 layers.

Every node of each layer is connected to every node of the next layer.

Every edge is associated with a weight and every node is associated with a bias $b$. At each node, we compute $w^T x + b$ and then apply an activation function.

The overall computation in a 2-layer neural network is as follows:
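
(Here the hidden layer is assumed to use a tanh activation and the output layer a sigmoid activation, matching the code example later on this page; the superscript in square brackets denotes the layer number.)

$z^{[1]} = W^{[1]} x + b^{[1]}$

$a^{[1]} = \tanh(z^{[1]})$

$z^{[2]} = W^{[2]} a^{[1]} + b^{[2]}$

$a^{[2]} = \sigma(z^{[2]}) = \hat{y}$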

$W^{[l]}$ is the weight matrix for layer $l$; each of its rows holds the weights of one neuron in that layer. $b^{[l]}$ is the column vector of biases for that layer. $a^{[l]}$ is the column vector of activations of the layer, and it becomes the input for the next layer. The final activation $a^{[2]}$ is the required probability that a given input belongs to the positive class.

The above equations are for a single training example. To compute across all training examples at once, replace the lowercase vectors $z$, $a$, $x$ with the matrices $Z$, $A$, $X$, where $X$ is the matrix whose columns are the $x$ vectors of all training examples (and similarly for $Z$ and $A$).
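
With the $m$ training examples stacked as columns of $X$ (and the same tanh/sigmoid choice as above), the vectorized forward propagation becomes:

$Z^{[1]} = W^{[1]} X + b^{[1]}$

$A^{[1]} = \tanh(Z^{[1]})$

$Z^{[2]} = W^{[2]} A^{[1]} + b^{[2]}$

$A^{[2]} = \sigma(Z^{[2]})$

The bias vectors $b^{[1]}$ and $b^{[2]}$ are broadcast across the $m$ columns.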

Activation Functions

We have been using the sigmoid activation function, but there are other activation functions that may be more effective.

We must continue to use the sigmoid activation function for the output layer because we need to output a probability, i.e. a value between 0 and 1. But for the intermediate hidden layers, we can use other activation functions.

Sigmoid Activation Function ($\sigma$)

$\sigma(z) = \frac{1}{1 + e^{-z}}$

This will always result in a value between 0 and 1.

tanh Activation Function

$\tanh(z) = \frac{e^z - e^{-z}}{e^z + e^{-z}}$

This will always result in a value between -1 and 1.

ReLU (Rectified Linear Unit) Activation Function

$ReLU(z) = \max(0, z)$

Using the ReLU activation function makes the neural network learn much faster, which is why it is the most commonly used activation for hidden layers.

Example code (forward propagation for a 2-layer network, with a tanh hidden layer and a sigmoid output):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

Z1 = np.dot(W1, X) + b1   # linear step for the hidden layer
A1 = np.tanh(Z1)          # tanh activation in the hidden layer
Z2 = np.dot(W2, A1) + b2  # linear step for the output layer
A2 = sigmoid(Z2)          # sigmoid activation gives the output probability

Backpropagation in Neural Networks

Backpropagation is how we learn the optimal w and b values: we compute derivatives (gradients) of the loss with respect to the parameters, moving from right to left (from the output layer back toward the input layer) through the network.
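
For the 2-layer network above, a sketch of what these gradient computations look like (assuming the cross-entropy loss and the tanh/sigmoid activations used earlier; the label matrix Y of shape (1, m) and the learning rate alpha are names introduced here for illustration):

m = Y.shape[1]                                # number of training examples

dZ2 = A2 - Y                                  # gradient of the loss w.r.t. Z2
dW2 = np.dot(dZ2, A1.T) / m
db2 = np.sum(dZ2, axis=1, keepdims=True) / m

dZ1 = np.dot(W2.T, dZ2) * (1 - A1 ** 2)       # (1 - A1**2) is the derivative of tanh
dW1 = np.dot(dZ1, X.T) / m
db1 = np.sum(dZ1, axis=1, keepdims=True) / m

# One gradient descent step with learning rate alpha:
W1 = W1 - alpha * dW1
b1 = b1 - alpha * db1
W2 = W2 - alpha * dW2
b2 = b2 - alpha * db2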

Random Initialization

In the case of Logistic Regression, we could initialize w and b to 0.

However, for neural networks, w must be initialized randomly. (b can still be initialized as 0).

If we initialize w to 0, each neuron in the hidden layer will perform the same computation. So even after multiple iterations of gradient descent, each neuron in the layer will be computing the same thing as the other neurons.

W1 = np.random.randn(n_h, n_x) * 0.01   # shape: (units in this layer, units in the previous layer)

We usually multiply by 0.01 (for shallow neural networks) to keep the initial weights small: if the weights are large, the inputs to the tanh or sigmoid activations fall in the flat, saturated parts of the curve where the gradients are very small, and training slows down.
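
Putting the initialization together for the 2-layer network used above (the layer sizes n_x, n_h, and n_y are hypothetical placeholders for the input, hidden, and output layer sizes):

import numpy as np

n_x, n_h, n_y = 3, 4, 1                  # example layer sizes (hypothetical)

W1 = np.random.randn(n_h, n_x) * 0.01    # small random weights for the hidden layer
b1 = np.zeros((n_h, 1))                  # biases can start at zero
W2 = np.random.randn(n_y, n_h) * 0.01    # small random weights for the output layer
b2 = np.zeros((n_y, 1))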
