LLM-driven multimodal video-text fusion for isolated sign language recognition

Esteban Romero, Sergio; Luna Jiménez, Cristina; Gil Martín, Manuel; Fernández Martínez, Fernando and Andre, Elizabeth (2025). LLM-driven multimodal video-text fusion for isolated sign language recognition. In: "25th ACM International Conference on Intelligent Virtual Agents", 16/09/2025-19/09/2025, Berlin, Germany. ISBN 979-8-4007-1996-7. p. 9. https://doi.org/10.1145/3742886.3756724.

Description

Title:	LLM-driven multimodal video-text fusion for isolated sign language recognition
Author(s):	Esteban Romero, Sergio (https://orcid.org/0009-0008-6336-7877); Luna Jiménez, Cristina (https://orcid.org/0000-0001-5369-856X); Gil Martín, Manuel (https://orcid.org/0000-0002-4285-6224); Fernández Martínez, Fernando (https://orcid.org/0000-0003-3877-0089); Andre, Elizabeth
Document type:	Conference or Workshop Paper (Article)
Event title:	25th ACM International Conference on Intelligent Virtual Agents
Event dates:	16/09/2025-19/09/2025
Event location:	Berlin, Germany
Book title:	IVA Adjunct '25: Adjunct Proceedings of the 25th ACM International Conference on Intelligent Virtual Agents
Date:	30 September 2025
ISBN:	979-8-4007-1996-7
Subjects:	Computer Science; Telecommunications
SDGs:	04. Quality Education; 10. Reduced Inequalities
Informal keywords:	Computing methodologies; artificial intelligence; human-centered computing; accessibility
School:	E.T.S.I. Telecomunicación (UPM)
Department:	Ingeniería Electrónica
Creative Commons license:	Attribution - NoDerivatives - NonCommercial

Abstract

Sign languages are the primary means of communication for deaf communities, yet developing effective automatic recognition systems remains a significant challenge. In this work, we address Isolated Sign Language Recognition (ISLR) with a multimodal approach built on a Large Language Model (LLM) architecture. We fuse several modalities, including visual features, into the linguistic representation space of the LLM, and perform ablation studies to evaluate the contribution of each visual modality to recognition performance. In experiments on the AVASAG100 dataset, our method achieves a weighted F1-score (W-F1) of 70.36±3.00 and a macro F1-score (MF1) of 62.34±3.18 when landmarks extracted from the pose are projected into the LLM's embedding space. These results underscore the value of multimodal integration for ISLR and offer guidance for future research.
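The abstract reports two metrics that weight classes differently. As a minimal sketch (not the authors' evaluation code), weighted F1 averages per-class F1 scores weighted by each class's support (its number of true samples), while macro F1 is the plain mean over classes, so the two diverge when frequent and rare signs are recognized with different accuracy. The function name `f1_scores` and the toy labels are hypothetical; the sketch assumes single-label classification and follows the usual definitions (per-class F1 = 2TP / (2TP + FP + FN), classes taken from the union of true and predicted labels).

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Hypothetical helper: return (weighted_f1, macro_f1) as percentages."""
    # Classes are the union of labels seen in the ground truth and predictions.
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)  # number of true samples per class
    per_class = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        per_class[c] = 2 * tp / denom if denom else 0.0
    # Macro F1: every class counts equally, regardless of how often it occurs.
    macro = sum(per_class.values()) / len(labels)
    # Weighted F1: each class contributes in proportion to its support.
    n = len(y_true)
    weighted = sum(per_class[c] * support[c] / n for c in labels)
    return 100 * weighted, 100 * macro


if __name__ == "__main__":
    # Toy example: the frequent class "A" is always correct, the rare "B"/"C" are swapped.
    w, m = f1_scores(["A", "A", "A", "A", "B", "C"], ["A", "A", "A", "A", "C", "B"])
    print(f"W-F1={w:.2f} MF1={m:.2f}")  # → W-F1=66.67 MF1=33.33
```

In the toy example the weighted score stays high because the well-recognized class dominates the support, while the macro score exposes the failures on rare classes.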

Associated projects

Type	Code	Acronym	Lead	Title
Horizon Europe	101071191	ASTOUND	Not specified	Improving social competences of virtual agents through artificial consciousness based on the Attention Schema Theory
Government of Spain	PID2021-126061OB-C43	BEWORD	Not specified	Discovering meaning and intention beyond the spoken word: towards an intelligent environment for handling multimedia documents
Government of Spain	PID2020-118112RB-C22	GOMINOLA	Not specified	User-aware, adaptive and socio-affective conversational agents based on microservices
Government of Spain	PID2023-150584OB-C21	TRUSTBOOST	Not specified	Harmonizing Flexibility and Compliance in Conversational Artificial Intelligence Systems

More information

Record ID:	91247
DC identifier:	https://oa.upm.es/91247/
OAI identifier:	oai:oa.upm.es:91247
Scientific Portal URL:	https://portalcientifico.upm.es/es/ipublic/item/10389350
DOI:	10.1145/3742886.3756724
Official URL:	https://dl.acm.org/doi/10.1145/3742886.3756724
Deposited by:	iMarina Portal Científico
Deposited on:	06 Oct 2025 09:02
Last modified:	06 Oct 2025 09:02
