Triplet loss

Triplet loss is a loss function for machine learning algorithms in which a baseline (anchor) input is compared to a positive (matching) input and a negative (non-matching) input. Training minimizes the distance from the anchor to the positive input and maximizes the distance from the anchor to the negative input.[1][2]

It is often used in similarity learning to learn embeddings, such as word embeddings and thought vectors, and in metric learning.[3]

Consider the task of training a neural network to recognize faces (e.g. for admission to a high-security zone). A classifier trained to assign each image to a known identity would have to be retrained every time a new person is added to the face database. This can be avoided by posing the problem as a similarity learning problem instead of a classification problem. Here the network is trained (using a contrastive loss) to output a distance that is small if the image belongs to a known person and large if it belongs to an unknown person. However, if we want to output the images closest to a given image, we need to learn a ranking and not just a similarity. A triplet loss is used in this case.

The loss function can be described using the Euclidean distance:

$$\mathcal{L}(A, P, N) = \max\left( \|f(A) - f(P)\|^2 - \|f(A) - f(N)\|^2 + \alpha,\; 0 \right)$$

where $A$ is an anchor input, $P$ is a positive input of the same class as $A$, $N$ is a negative input of a different class from $A$, $\alpha$ is a margin between positive and negative pairs, and $f$ is an embedding.
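As an illustration, the loss for a single triplet can be sketched in plain Python. This is a minimal sketch, not a reference implementation: the function names, the toy vectors, and the default margin of 0.2 are assumptions chosen for the example, and the inputs are taken to be already-embedded vectors (the outputs of the embedding, not raw inputs).

```python
from typing import Sequence

Vector = Sequence[float]


def squared_euclidean(u: Vector, v: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(f_a: Vector, f_p: Vector, f_n: Vector, margin: float = 0.2) -> float:
    """Triplet loss on embedded anchor, positive, and negative vectors.

    The margin default of 0.2 is an assumption made for this example.
    """
    d_pos = squared_euclidean(f_a, f_p)  # anchor-to-positive distance
    d_neg = squared_euclidean(f_a, f_n)  # anchor-to-negative distance
    return max(d_pos - d_neg + margin, 0.0)


# Toy 2-D embeddings (made-up values, for illustration only).
anchor = [0.0, 0.0]
positive = [0.1, 0.0]
negative = [1.0, 0.0]

# The negative is already much farther away than the positive, so the loss is zero.
easy = triplet_loss(anchor, positive, negative)

# Swapping the roles violates the margin and yields a positive loss.
hard = triplet_loss(anchor, negative, positive)
```

The loss is zero whenever the negative is farther from the anchor than the positive by at least the margin, so only triplets that violate the margin contribute to training.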

Summing the loss over all triplets gives a cost function, which is then minimized in the posed optimization problem:

$$\mathcal{J} = \sum_{i=1}^{M} \mathcal{L}\left(A^{(i)}, P^{(i)}, N^{(i)}\right)$$

The index $i$ runs over the $M$ triplets of input vectors $(A^{(i)}, P^{(i)}, N^{(i)})$: anchor, positive, and negative. Each triplet is formed by drawing an anchor input, a positive input that describes the same entity as the anchor, and a negative input that does not describe the same entity as the anchor. These inputs are then run through the network, and the outputs are used in the loss function.
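The triplet-forming procedure and the summed cost can be sketched as follows. This is a hedged illustration under stated assumptions: `sample_triplet`, `total_cost`, the toy labelled data, the uniform random sampling strategy, and the identity function standing in for the network are all hypothetical choices made for the example.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
Triplet = Tuple[Vector, Vector, Vector]


def squared_euclidean(u: Vector, v: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(f_a: Vector, f_p: Vector, f_n: Vector, margin: float) -> float:
    """Triplet loss on embedded anchor, positive, and negative vectors."""
    return max(squared_euclidean(f_a, f_p) - squared_euclidean(f_a, f_n) + margin, 0.0)


def sample_triplet(data_by_label: Dict[str, List[Vector]], rng: random.Random) -> Triplet:
    """Draw an anchor, a positive with the same label, and a negative with another label."""
    # The anchor's label needs at least two items to supply a distinct positive.
    eligible = [label for label, items in data_by_label.items() if len(items) >= 2]
    anchor_label = rng.choice(eligible)
    anchor, positive = rng.sample(data_by_label[anchor_label], 2)
    # The negative comes from any other label.
    other_labels = [label for label in data_by_label if label != anchor_label]
    negative = rng.choice(data_by_label[rng.choice(other_labels)])
    return anchor, positive, negative


def total_cost(triplets: List[Triplet], embed: Callable[[Vector], Vector], margin: float = 0.2) -> float:
    """Sum of the per-triplet losses after passing each input through `embed`."""
    return sum(triplet_loss(embed(a), embed(p), embed(n), margin) for a, p, n in triplets)


# Toy data: two entities with two inputs each (made-up values).
data = {
    "alice": [[0.0, 0.0], [0.1, 0.0]],
    "bob": [[1.0, 1.0], [1.1, 1.0]],
}
rng = random.Random(0)
triplets = [sample_triplet(data, rng) for _ in range(4)]

# The identity function stands in for a trained network here.
cost = total_cost(triplets, embed=lambda x: x)
```

In this toy data the two entities are already well separated, so every sampled triplet satisfies the margin and the summed cost is zero. In training, `embed` would be the network being optimized, and the cost would be minimized with respect to its parameters.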

In computer vision, a prevailing belief has been that the triplet loss is inferior to using surrogate losses followed by separate metric learning steps. In 2017, Alexander Hermans, Lucas Beyer, and Bastian Leibe showed that, for models trained from scratch as well as for pretrained models, a special version of triplet loss doing end-to-end deep metric learning outperformed most other methods published at the time.[4]

References

  1. Chechik, G.; Sharma, V.; Shalit, U.; Bengio, S. (2010). "Large Scale Online Learning of Image Similarity Through Ranking". Journal of Machine Learning Research. 11: 1109–1135.
  2. Schroff, F.; Kalenichenko, D.; Philbin, J. (June 2015). "FaceNet: A unified embedding for face recognition and clustering". 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pp. 815–823. arXiv:1503.03832. doi:10.1109/CVPR.2015.7298682. ISBN 978-1-4673-6964-0.
  3. Ailon, Nir; Hoffer, Elad (2014-12-20). "Deep metric learning using Triplet network". arXiv:1412.6622. Bibcode:2014arXiv1412.6622H.
  4. Hermans, Alexander; Beyer, Lucas; Leibe, Bastian (2017-03-22). "In Defense of the Triplet Loss for Person Re-Identification". arXiv:1703.07737 [cs.CV].


This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.