Deep Learning Specialization - Coursera
  1. Neural Networks and Deep Learning

Logistic Regression as a Neural Network (Neural Network Basics)

Logistic Regression is a classification algorithm. It is commonly used for Binary Classification.

Binary Classification is the categorization of items into one of two categories/classes.

Some Notations

The number of training examples is denoted by m. Each training example is a pair (x, y), where x is the vector of input features and y is the label.

Say we have a p×q RGB image. It has p×q pixels, each with 3 channel values, giving p×q×3 values in total. These values are unrolled into a single column vector x with p×q×3 rows. This is how a training image is represented.

We then create a matrix X that stacks the m training vectors column-wise. So X has p×q×3 rows and m columns.

A row vector Y stores the m training labels, so it has dimensions 1×m.
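As a minimal NumPy sketch of this representation (the array names and the tiny 4×4 image sizes are illustrative, not from the course):

```python
import numpy as np

p, q, m = 4, 4, 2                    # image height, width, number of training examples
images = np.random.rand(m, p, q, 3)  # m RGB images of shape (p, q, 3)

# Unroll each image into a column vector of p*q*3 values and
# stack the m vectors column-wise to form X.
X = images.reshape(m, -1).T          # shape (p*q*3, m) = (48, 2)
Y = np.array([[0, 1]])               # labels stored as a 1 x m row vector

print(X.shape)  # (48, 2)
print(Y.shape)  # (1, 2)
```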

Logistic Regression Equation

We have a vector of features x. Given x, we want to predict $\hat{y}$, the probability that $y = 1$.

We also have parameters w and b.

We denote $\hat{y}$ as follows:

$$\hat{y} = \sigma(w^T x + b)$$

where $\sigma$ is the sigmoid activation function, which restricts the output to values between 0 and 1.
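A minimal sketch of this prediction in NumPy (the function names and example values are illustrative):

```python
import numpy as np

def sigmoid(z):
    # Squashes any real z into the open interval (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def predict(w, b, x):
    # y_hat = sigma(w^T x + b) for a single feature vector x.
    return sigmoid(w.T @ x + b)

w = np.zeros((3, 1))                 # parameters for 3 features
b = 0.0
x = np.array([[1.0], [2.0], [3.0]])
print(predict(w, b, x))              # with zero parameters, sigmoid(0) = 0.5
```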

Logistic Regression Cost Function

To train the parameters w and b, we need a cost function.

Let us consider the Loss Function L, which measures the error in the prediction for a single training example:

$$L(\hat{y}, y) = -\left(y \log \hat{y} + (1-y) \log(1-\hat{y})\right)$$

Now, we define the cost function J as the average loss over all m training examples:

$$J(w, b) = \frac{1}{m}\sum_{i=1}^{m} L(\hat{y}^{(i)}, y^{(i)})$$

where $i = 1$ to $m$ indexes the training examples.
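A vectorized sketch of this cost in NumPy (the helper name `compute_cost` and the example values are illustrative):

```python
import numpy as np

def compute_cost(Y_hat, Y):
    # J(w, b): average cross-entropy loss over the m examples.
    m = Y.shape[1]
    return -np.sum(Y * np.log(Y_hat) + (1 - Y) * np.log(1 - Y_hat)) / m

Y = np.array([[1, 0, 1]])            # true labels
Y_hat = np.array([[0.9, 0.1, 0.8]])  # predicted probabilities
print(compute_cost(Y_hat, Y))        # roughly 0.1446
```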

Our aim now is to find w and b which minimize the cost function J. This is done using Gradient Descent.

Gradient Descent

Gradient Descent iteratively adjusts w and b to reduce J. Because the logistic regression cost function J is convex, this converges to the global minimum, letting us find optimal w and b values.

It is denoted as follows:

Repeat {

$$w := w - \alpha\frac{\partial J(w,b)}{\partial w}$$

$$b := b - \alpha\frac{\partial J(w,b)}{\partial b}$$

}

where $\alpha$ is the learning rate.
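Putting the pieces together, here is a minimal sketch of batch gradient descent for logistic regression in NumPy. The gradient expressions `dw` and `db` follow from differentiating J with respect to w and b; the toy data and hyperparameter values are illustrative, not from the course:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, Y, alpha=0.1, iterations=1000):
    # X: (n_features, m) matrix of training examples, Y: (1, m) labels.
    n, m = X.shape
    w = np.zeros((n, 1))
    b = 0.0
    for _ in range(iterations):
        Y_hat = sigmoid(w.T @ X + b)   # predictions, shape (1, m)
        dw = X @ (Y_hat - Y).T / m     # dJ/dw, shape (n, 1)
        db = np.sum(Y_hat - Y) / m     # dJ/db, scalar
        w -= alpha * dw                # update step with learning rate alpha
        b -= alpha * db
    return w, b

# Toy separable data: the label is 1 exactly when the single feature is positive.
X = np.array([[-2.0, -1.0, 1.0, 2.0]])
Y = np.array([[0, 0, 1, 1]])
w, b = gradient_descent(X, Y)
preds = (sigmoid(w.T @ X + b) > 0.5).astype(int)
print(preds)  # should recover Y on this separable toy set
```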
