Siamese Network
Last updated
The Siamese Network was popularized for face verification by the 2014 paper "DeepFace". It consists of a pair of identical deep neural networks (sharing the same parameters), each of which takes an image as input and computes an encoding in one of its layers.
Say the two identical networks take images $x^{(1)}$ and $x^{(2)}$ as inputs and compute encodings $f(x^{(1)})$ and $f(x^{(2)})$ in a certain hidden layer. To compute the difference between the images, we compute

$$d(x^{(1)}, x^{(2)}) = \|f(x^{(1)}) - f(x^{(2)})\|_2^2$$

and consider the images to be of the same person if this value is below a certain threshold.
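The distance-and-threshold comparison can be sketched as follows, assuming the two encodings have already been computed by the shared network (the threshold value here is purely illustrative; in practice it is tuned on a validation set):

```python
import numpy as np

def distance(enc1, enc2):
    # squared L2 distance between two encoding vectors:
    # d(x1, x2) = ||f(x1) - f(x2)||^2
    return np.sum((enc1 - enc2) ** 2)

def same_person(enc1, enc2, threshold=0.7):
    # the images are judged to show the same person when the
    # encoding distance falls below the (tuned) threshold
    return distance(enc1, enc2) < threshold
```
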
To learn the parameters for a siamese network so as to generate a good encoding of an input image, we apply gradient descent on the triplet loss function.
For every image (called anchor image A), we consider a positive image P and a negative image N. P will be similar to A and N will not be similar to A (i.e. A and P are pictures of the same person while N is a picture of a different person).
Our aim is to satisfy the following inequality:

$$\|f(A) - f(P)\|_2^2 + \alpha \le \|f(A) - f(N)\|_2^2$$

i.e. $\|f(A) - f(P)\|_2^2 - \|f(A) - f(N)\|_2^2 + \alpha \le 0$, where $\alpha$ is called the margin.
The triplet loss function is given by:

$$\mathcal{L}(A, P, N) = \max\left(\|f(A) - f(P)\|_2^2 - \|f(A) - f(N)\|_2^2 + \alpha,\ 0\right)$$
and the cost function over $m$ training triplets will be:

$$J = \sum_{i=1}^{m} \mathcal{L}\left(A^{(i)}, P^{(i)}, N^{(i)}\right)$$
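A minimal sketch of the triplet loss and cost, assuming the anchor, positive, and negative encodings are given as NumPy vectors (the margin value 0.2 is an assumption, not from the source):

```python
import numpy as np

def triplet_loss(f_a, f_p, f_n, alpha=0.2):
    # L(A, P, N) = max(||f(A)-f(P)||^2 - ||f(A)-f(N)||^2 + alpha, 0)
    d_ap = np.sum((f_a - f_p) ** 2)
    d_an = np.sum((f_a - f_n) ** 2)
    return max(d_ap - d_an + alpha, 0.0)

def cost(anchors, positives, negatives, alpha=0.2):
    # J = sum of the triplet losses over all training triplets
    return sum(triplet_loss(a, p, n, alpha)
               for a, p, n in zip(anchors, positives, negatives))
```

When the negative is already far from the anchor (the inequality is satisfied by more than the margin), the `max` clamps the loss to zero and the triplet contributes no gradient.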
Note that while training, we must not choose A, P, N randomly: a randomly chosen N is usually already far from A, so the constraint is trivially satisfied and the loss is zero. Instead, we must choose A, P, N such that d(A, P) is very close to d(A, N). Such "hard" triplets force gradient descent to learn parameters that push d(A, P) and d(A, N) apart even for similar-looking A and N.
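One way to realize this selection rule is semi-hard negative mining, as in the FaceNet line of work. The sketch below is an assumption about how such mining might be implemented, not something given in the source: among candidate negatives, it prefers one with d(A, N) just above d(A, P) (so the loss is small but nonzero), falling back to the closest negative overall.

```python
import numpy as np

def pick_negative(f_a, f_p, candidate_negs, alpha=0.2):
    # d(A, P) for this anchor-positive pair
    d_ap = np.sum((f_a - f_p) ** 2)
    # d(A, N) for every candidate negative
    d_ans = np.array([np.sum((f_a - n) ** 2) for n in candidate_negs])
    # semi-hard negatives satisfy d(A,P) < d(A,N) < d(A,P) + alpha,
    # i.e. d(A,N) is very close to d(A,P) but the loss is still nonzero
    semi_hard = np.where((d_ans > d_ap) & (d_ans < d_ap + alpha))[0]
    if len(semi_hard):
        idx = semi_hard[np.argmin(d_ans[semi_hard])]
    else:
        # fall back to the hardest negative (closest to the anchor)
        idx = np.argmin(d_ans)
    return candidate_negs[idx]
```
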
Also, since A and P are images of the same person, we will need multiple images of each person while training (as opposed to one-shot learning, which works from a single image per person).