Siamese Network
The Siamese network approach to face verification was presented in the 2014 paper "DeepFace" (Taigman et al.). It consists of a pair of identical deep neural networks, each of which takes an image as input and computes an encoding in one of its layers.
Say the two identical networks take images $x^{(1)}$ and $x^{(2)}$ as inputs and compute encodings $f(x^{(1)})$ and $f(x^{(2)})$ in a certain hidden layer. To compare the images, we compute $d(x^{(1)}, x^{(2)}) = \|f(x^{(1)}) - f(x^{(2)})\|_2^2$ and consider the images to be of the same person if this value is below a certain threshold.
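The sketch below illustrates this idea in PyTorch: a small convolutional encoder (not the DeepFace architecture) is applied to both images with shared weights, and the squared L2 distance between the encodings decides "same person or not". The layer sizes, the 96x96 input, and the 0.7 threshold are illustrative assumptions, not values from the paper.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Toy encoder standing in for one branch of the Siamese network."""
    def __init__(self, embedding_dim=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, embedding_dim)

    def forward(self, x):
        h = self.features(x).flatten(1)   # (batch, 64)
        return self.fc(h)                 # encoding f(x)

def distance(encoder, x1, x2):
    """d(x1, x2) = ||f(x1) - f(x2)||^2, using the same encoder for both images."""
    return ((encoder(x1) - encoder(x2)) ** 2).sum(dim=1)

encoder = Encoder()
x1, x2 = torch.randn(1, 3, 96, 96), torch.randn(1, 3, 96, 96)
same_person = distance(encoder, x1, x2) < 0.7   # threshold chosen for illustration
```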
To learn parameters for the Siamese network that produce a good encoding of an input image, we apply gradient descent to the triplet loss function.
For every image (called the anchor image A), we consider a positive image P and a negative image N: P is similar to A, while N is not (i.e. A and P are pictures of the same person, while N is a picture of a different person).
Our aim is to satisfy the following inequality:

$$d(A, P) + \alpha \le d(A, N)$$

i.e. $\|f(A) - f(P)\|_2^2 - \|f(A) - f(N)\|_2^2 + \alpha \le 0$

where $\alpha$ is called the margin.
The triplet loss function is given by:

$$\mathcal{L}(A, P, N) = \max\left(\|f(A) - f(P)\|_2^2 - \|f(A) - f(N)\|_2^2 + \alpha,\ 0\right)$$

and the cost function will be:

$$J = \sum_{i=1}^{m} \mathcal{L}\left(A^{(i)}, P^{(i)}, N^{(i)}\right)$$
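A minimal sketch of this loss on precomputed encodings is shown below; `margin` plays the role of $\alpha$, and the value 0.2 is only an illustrative choice. PyTorch also ships a built-in `nn.TripletMarginLoss`, but writing the loss out keeps the correspondence with the formula explicit.

```python
import torch

def triplet_loss(f_a, f_p, f_n, margin=0.2):
    """L(A, P, N) = max(||f(A)-f(P)||^2 - ||f(A)-f(N)||^2 + margin, 0)."""
    d_ap = ((f_a - f_p) ** 2).sum(dim=1)   # squared distance anchor-positive
    d_an = ((f_a - f_n) ** 2).sum(dim=1)   # squared distance anchor-negative
    return torch.clamp(d_ap - d_an + margin, min=0.0)

# The cost J is the sum of the per-triplet losses over a batch of triplets.
f_a, f_p, f_n = (torch.randn(8, 128) for _ in range(3))
cost = triplet_loss(f_a, f_p, f_n).sum()
```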
Note that while training, we must not choose A, P, N at random; randomly chosen triplets usually satisfy the inequality easily, so gradient descent learns little from them. Instead, we should choose triplets for which d(A, P) is close to d(A, N) ("hard" triplets). Training on such triplets pushes gradient descent toward parameters that keep d(A, N) at least a margin above d(A, P), even when A and N look similar; a selection sketch follows below.
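One simple way to realize this (an assumption here, not the only mining strategy) is to pick, for each anchor-positive pair, the candidate negative whose distance to the anchor is closest to d(A, P):

```python
import torch

def pick_hard_negatives(f_a, f_p, f_candidates):
    """f_a, f_p: (m, dim) anchor/positive encodings; f_candidates: (k, dim) candidate negatives."""
    d_ap = ((f_a - f_p) ** 2).sum(dim=1)                                    # (m,)
    d_an = ((f_a[:, None, :] - f_candidates[None, :, :]) ** 2).sum(dim=2)   # (m, k)
    # index of the negative whose d(A, N) is closest to d(A, P)
    hard_idx = (d_an - d_ap[:, None]).abs().argmin(dim=1)                   # (m,)
    return f_candidates[hard_idx]
```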
Also, since A and P are images of the same person, we will need multiple images of each person in the training set (as opposed to one-shot learning).