Deep Learning Specialization - Coursera

Siamese Network


The Siamese network architecture described here was popularized for face verification by the 2014 "DeepFace" paper (Taigman et al.). It consists of a pair of identical deep neural networks sharing the same parameters, each of which takes an image as input and computes an encoding of it in one of its layers.

Say the two identical networks take images $x^{(i)}$ and $x^{(j)}$ as inputs and compute encodings $f(x^{(i)})$ and $f(x^{(j)})$ in a certain hidden layer. To compare the images, we compute $d(x^{(i)}, x^{(j)}) = \| f(x^{(i)}) - f(x^{(j)}) \|_2^2$ and consider the images to be of the same person if this value is below a certain threshold.
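As a minimal sketch of this verification step, assuming a hypothetical `encode` function that plays the role of the shared network $f(\cdot)$ and returns a NumPy embedding (the threshold value is an arbitrary placeholder to be tuned on a validation set):

```python
import numpy as np

def verify(encode, img_i, img_j, threshold=0.7):
    """Decide whether two images show the same person.

    `encode` is assumed to be the shared network f(.) mapping an image
    to a fixed-length embedding; `threshold` is a placeholder value.
    """
    f_i = encode(img_i)           # f(x^(i))
    f_j = encode(img_j)           # f(x^(j))
    d = np.sum((f_i - f_j) ** 2)  # squared L2 distance ||f(x^(i)) - f(x^(j))||^2
    return d < threshold          # same person if the encodings are close
```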

Triplet Loss

To learn parameters for the Siamese network that produce a good encoding of an input image, we apply gradient descent to the triplet loss function.

For every training image (called the anchor image, A), we consider a positive image P and a negative image N: A and P are pictures of the same person, while N is a picture of a different person.

Our aim is to satisfy the following condition:

$$d(A, P) + \alpha \le d(A, N)$$

i.e. $$d(A, P) - d(A, N) + \alpha \le 0$$

where $\alpha$ is called the margin.

The triplet loss function is given by:

$$L(A, P, N) = \max\big(d(A, P) - d(A, N) + \alpha,\ 0\big)$$

and the cost function over the $m$ training triplets is:

$$J = \sum_{i=1}^{m} L(A^{(i)}, P^{(i)}, N^{(i)})$$

Note that while training, we must not choose A, P, and N at random. Instead, we should choose "hard" triplets, where d(A, P) is close to d(A, N); randomly chosen triplets tend to satisfy the condition already, so gradient descent would learn little from them. Hard triplets force gradient descent to find parameters that push d(A, N) at least a margin $\alpha$ above d(A, P), even for similar-looking A and N.

Also, since A and P are images of the same person, we need multiple images of each person in the training set (as opposed to one-shot learning, where only a single image per person is available at recognition time).
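As a sketch of the loss and cost above, assuming `anchors`, `positives`, and `negatives` are sequences of already-computed encodings $f(A^{(i)})$, $f(P^{(i)})$, $f(N^{(i)})$ (the names and the margin value 0.2 are illustrative):

```python
import numpy as np

def triplet_loss(f_a, f_p, f_n, alpha=0.2):
    """L(A, P, N) = max(d(A, P) - d(A, N) + alpha, 0) for one triplet."""
    d_ap = np.sum((f_a - f_p) ** 2)  # d(A, P): squared distance anchor-positive
    d_an = np.sum((f_a - f_n) ** 2)  # d(A, N): squared distance anchor-negative
    return max(d_ap - d_an + alpha, 0.0)

def cost(anchors, positives, negatives, alpha=0.2):
    """J = sum over the m training triplets of L(A^(i), P^(i), N^(i))."""
    return sum(
        triplet_loss(f_a, f_p, f_n, alpha)
        for f_a, f_p, f_n in zip(anchors, positives, negatives)
    )
```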