Exploring Dimensionality Reduction Techniques in Multilingual Transformers

Huertas Tato, Javier (ORCID: https://orcid.org/0000-0003-4127-5505); Huertas García, Álvaro (ORCID: https://orcid.org/0000-0003-2165-0144); Martín García, Alejandro (ORCID: https://orcid.org/0000-0002-0800-7632); Camacho Fernández, David (ORCID: https://orcid.org/0000-0002-5051-3475) (2022). Exploring Dimensionality Reduction Techniques in Multilingual Transformers. "Cognitive Computation", v. 15, pp. 590-612. ISSN 1866-9956. https://doi.org/10.1007/s12559-022-10066-8.

Description

Title: Exploring Dimensionality Reduction Techniques in Multilingual Transformers
Author(s): Huertas Tato, Javier; Huertas García, Álvaro; Martín García, Alejandro; Camacho Fernández, David
Document Type: Article
Journal/Publication Title: Cognitive Computation
Date: 29 October 2022
ISSN: 1866-9956
Volume: 15
Subjects:
SDGs:
Uncontrolled Keywords: Dimensionality Reduction, Natural Language Processing, Semantic Textual Similarity, Multilingual Transformers, Language models
School: E.T.S.I. de Sistemas Informáticos (UPM)
Department: Sistemas Informáticos
Creative Commons License: Attribution

Full Text

PDF (9974149.pdf) - Download (916 kB)

Abstract

In the scientific literature and in industry, semantic and context-aware Natural Language Processing solutions have been gaining importance in recent years. The possibilities and performance shown by these models when dealing with complex Human Language Understanding tasks are unquestionable, from conversational agents to the fight against disinformation in social networks. In addition, considerable attention is being paid to developing multilingual models to tackle the language bottleneck. The growing need for models implementing all these features has been accompanied by an increase in their size, with little restraint in the number of dimensions required. This paper provides a comprehensive account of the impact of a wide variety of dimensionality reduction techniques on the performance of different state-of-the-art multilingual Siamese transformers, including unsupervised techniques such as linear and nonlinear feature extraction, feature selection, and manifold learning. To evaluate the effects of these techniques, we considered the multilingual extended version of the Semantic Textual Similarity Benchmark (mSTSb) and two baseline approaches, one using the embeddings from the pre-trained version of five models and another using their fine-tuned STS versions. The results show that it is possible to achieve an average reduction of 91.58% ± 2.59% in the number of dimensions of embeddings from pre-trained models, requiring a fitting time 96.68% ± 0.68% faster than the fine-tuning process. In addition, we achieve a 54.65% ± 32.20% dimensionality reduction in embeddings from fine-tuned models. The results of this study contribute to the understanding of how different tuning approaches affect performance on semantic-aware tasks, how dimensionality reduction techniques handle the high-dimensional embeddings computed for the STS task, and their potential for other highly demanding NLP tasks.
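
To make the evaluated pipeline concrete, the following minimal Python sketch reproduces its shape: multilingual sentence embeddings are projected to fewer dimensions with an unsupervised reducer (here PCA, one of the linear feature-extraction techniques the paper considers), and STS performance is scored as the Spearman correlation between the cosine similarities of sentence pairs and their gold labels. The model checkpoint, toy sentence pairs, and component count are illustrative assumptions, not the paper's exact five-model mSTSb setup.

    import numpy as np
    from scipy.stats import spearmanr
    from sklearn.decomposition import PCA
    from sentence_transformers import SentenceTransformer

    # Any multilingual Siamese sentence transformer can stand in here; this
    # public checkpoint is an assumption, not necessarily one of the paper's five.
    model = SentenceTransformer("distiluse-base-multilingual-cased-v2")

    # Hypothetical cross-lingual STS pairs with gold similarity scores in [0, 5].
    pairs = [
        ("A man is playing a guitar.", "Un hombre toca la guitarra.", 5.0),
        ("A woman is cooking dinner.", "Un perro corre por el parque.", 0.5),
        ("Children are playing soccer.", "Unos niños juegan al fútbol.", 4.8),
        ("The stock market fell today.", "Hoy hace un día soleado.", 0.2),
    ]
    sents_a, sents_b, gold = zip(*pairs)
    emb_a = model.encode(list(sents_a))  # shape (n, 512) for this checkpoint
    emb_b = model.encode(list(sents_b))

    # Fit the unsupervised reducer on the pooled embeddings, then project both
    # sides of each pair. n_components is capped by the number of fitting
    # samples; with a real mSTSb training split one would keep far more.
    reducer = PCA(n_components=4).fit(np.vstack([emb_a, emb_b]))
    red_a, red_b = reducer.transform(emb_a), reducer.transform(emb_b)

    def cosine_rows(u, v):
        # Row-wise cosine similarity between paired embeddings.
        return np.sum(u * v, axis=1) / (
            np.linalg.norm(u, axis=1) * np.linalg.norm(v, axis=1)
        )

    # STS score = Spearman correlation of predicted vs. gold similarities,
    # compared before and after dimensionality reduction.
    for name, (a, b) in {"full 512-d": (emb_a, emb_b), "PCA 4-d": (red_a, red_b)}.items():
        rho, _ = spearmanr(cosine_rows(a, b), gold)
        print(f"{name}: Spearman rho = {rho:.3f}")

Other reducer families studied in the paper (e.g. kernel PCA, ICA, feature selection, manifold learners) expose the same fit/transform interface in scikit-learn, so they can be swapped into this sketch unchanged.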

Associated Projects

Type                 Code                  Acronym            Responsible     Title
Gobierno de España   PID2020-117263GB-100  FightDIS           Not specified   Not specified
Horizonte Europa     PLEC2021-007681       XAI-Disinfodemics  Not specified   Not specified
Comunidad de Madrid  S2018/TCS-4566        CYNAMON            Not specified   Not specified
Horizonte Europa     2020-EU-IA-0252       IBERIFIER          Not specified   Not specified

More Information

Record ID: 88877
DC Identifier: https://oa.upm.es/88877/
OAI Identifier: oai:oa.upm.es:88877
Research Portal URL: https://portalcientifico.upm.es/es/ipublic/item/9974149
DOI: 10.1007/s12559-022-10066-8
Official URL: https://link.springer.com/article/10.1007/s12559-0...
Deposited by: iMarina Portal Científico
Deposited on: 05 May 2025 14:59
Last Modified: 05 May 2025 15:29