Comprehensive comparison between vision transformers and convolutional neural networks for face recognition tasks

Rodrigo Talavera, Marcos de (ORCID: https://orcid.org/0000-0002-1808-4738), Cuevas Rodríguez, Carlos (ORCID: https://orcid.org/0000-0001-9873-8502) and García Santos, Narciso (ORCID: https://orcid.org/0000-0002-0397-894X) (2024). Comprehensive comparison between vision transformers and convolutional neural networks for face recognition tasks. "Scientific Reports", v. 14; ISSN 2045-2322. https://doi.org/10.1038/s41598-024-72254-w.

Description

Title: Comprehensive comparison between vision transformers and convolutional neural networks for face recognition tasks
Author(s): Rodrigo Talavera, Marcos de; Cuevas Rodríguez, Carlos; García Santos, Narciso
Document type: Article
Journal/Publication title: Scientific Reports
Date: 2024
ISSN: 2045-2322
Volume: 14
Subjects:
School: E.T.S.I. Telecomunicación (UPM)
Department: Señales, Sistemas y Radiocomunicaciones
Creative Commons license: Attribution - NonCommercial - ShareAlike

Full text

PDF: 7-Comparison_ViT_CNN.pdf (2 MB)

Abstract

This paper presents a comprehensive comparison between Vision Transformers and Convolutional Neural Networks for face recognition related tasks, including extensive experiments on the tasks of face identification and verification. Our study focuses on six state-of-the-art models: EfficientNet, Inception, MobileNet, ResNet, VGG, and Vision Transformers. Our evaluation of these models is based on five diverse datasets: Labeled Faces in the Wild, Real World Occluded Faces, Surveillance Cameras Face, UPM-GTI-Face, and VGG Face 2. These datasets present unique challenges regarding people diversity, distance from the camera, and face occlusions such as those produced by masks and glasses. Our contribution to the field includes a deep analysis of the experimental results, including a thorough examination of the training and evaluation process, as well as the software and hardware configurations used. Our results show that Vision Transformers outperform Convolutional Neural Networks in terms of accuracy and robustness against distance and occlusions for face recognition related tasks, while also presenting a smaller memory footprint and an impressive inference speed, rivaling even the fastest Convolutional Neural Networks. In conclusion, our study provides valuable insights into the performance of Vision Transformers for face recognition related tasks and highlights the potential of these models as a more efficient solution than Convolutional Neural Networks.

Associated projects

Type: Unspecified
Code: PID2020-115132RB
Acronym: SARAOS
Person responsible: Unspecified
Title: Unspecified

More information

Record ID: 85303
DC identifier: https://oa.upm.es/85303/
OAI identifier: oai:oa.upm.es:85303
Portal Científico URL: https://portalcientifico.upm.es/es/ipublic/item/10250855
DOI: 10.1038/s41598-024-72254-w
Official URL: https://www.nature.com/articles/s41598-024-72254-w
Deposited by: Dr. Carlos Cuevas Rodríguez
Deposited on: 12 Dec 2024 09:43
Last modified: 12 Dec 2024 09:43