BERTuit: Understanding Spanish language in Twitter with transformers

Huertas Tato, Javier ORCID: https://orcid.org/0000-0003-4127-5505, Martín García, Alejandro ORCID: https://orcid.org/0000-0002-0800-7632 and Camacho Fernández, David ORCID: https://orcid.org/0000-0002-5051-3475 (2023). BERTuit: Understanding Spanish language in Twitter with transformers. "Expert Systems", v. 40 (n. 9); ISSN 1468-0394. https://doi.org/10.1111/exsy.13404.

Description

Title: BERTuit: Understanding Spanish language in Twitter with transformers
Author(s): Huertas Tato, Javier; Martín García, Alejandro; Camacho Fernández, David
Document Type: Article
Journal/Publication Title: Expert Systems
Date: November 2023
ISSN: 1468-0394
Volume: 40
Issue: 9
Subjects:
SDGs:
Informal Keywords: misinformation, online social networks, transformers, Twitter
School: E.T.S.I. de Sistemas Informáticos (UPM)
Department: Sistemas Informáticos
Creative Commons Licence: Attribution - NoDerivatives - NonCommercial

Full Text

PDF (Expert Systems - 2023 - Huertas-Tato - BERTuit: Understanding Spanish language in Twitter with transformers.pdf) - A PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader is required
Download (2MB)

Abstract

The appearance of complex attention-based language models such as BERT, RoBERTa or GPT-3 has made it possible to address highly complex tasks in a plethora of scenarios. However, when applied to specific domains, these models encounter considerable difficulties. This is the case for social networks such as Twitter, an ever-changing stream of information written in informal and complex language, where each message requires careful evaluation to be understood even by humans, given the important role that context plays. Addressing tasks in this domain through Natural Language Processing involves severe challenges. When powerful state-of-the-art multilingual language models are applied to this scenario, language-specific nuances get lost in translation. To face these challenges we present BERTuit, the largest transformer proposed so far for the Spanish language, pre-trained on a massive dataset of 230M Spanish tweets using RoBERTa optimization. Our motivation is to provide a powerful resource for better understanding Spanish Twitter, to be used in applications focused on this social network, with special emphasis on solutions devoted to tackling the spread of misinformation on this platform. BERTuit is evaluated on several tasks and compared against M-BERT, XLM-RoBERTa and XLM-T, very competitive multilingual transformers. The utility of our approach is shown with two applications: an unsupervised methodology to visualize groups of hoaxes, and supervised profiling of authors spreading disinformation.

Associated Projects

Type: Government of Spain
Code: PID2020-117263GB-100
Acronym: FightDIS
Principal Investigator: Not specified
Title: Fighting against Information DISorders in Online Social Networks

Type: Comunidad de Madrid
Code: S2018/TCS-4566
Acronym: CYNAMON-CM
Principal Investigator: Not specified
Title: Cybersecurity, Network Analysis and Monitoring for the Next Generation Internet

Type: Horizon Europe
Code: 2020-EU-IA-0252:29374659
Acronym: IBERIFIER
Principal Investigator: Not specified
Title: Iberian Digital Media Research and Fact-Checking Hub

Type: Government of Spain
Code: PLEC2021-007681
Acronym: XAI-Disinfodemics
Principal Investigator: Not specified
Title: eXplainable AI for disinformation and conspiracy detection during infodemics

More Information

Record ID: 88862
DC Identifier: https://oa.upm.es/88862/
OAI Identifier: oai:oa.upm.es:88862
Scientific Portal URL: https://portalcientifico.upm.es/es/ipublic/item/10090880
DOI: 10.1111/exsy.13404
Official URL: https://onlinelibrary.wiley.com/doi/10.1111/exsy.1...
Deposited by: iMarina Portal Científico
Deposited on: 30 Apr 2025 17:40
Last Modified: 30 Apr 2025 17:40