Full text

PDF (Portable Document Format)
- A PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader is required
Download (2MB)
ORCID: https://orcid.org/0009-0008-6336-7877; Luna Jiménez, Cristina ORCID: https://orcid.org/0000-0001-5369-856X; Gil Martín, Manuel ORCID: https://orcid.org/0000-0002-4285-6224; Fernández Martínez, Fernando ORCID: https://orcid.org/0000-0003-3877-0089; and Andre, Elizabeth (2025). LLM-driven multimodal video-text fusion for isolated sign language recognition. In: "25th ACM International Conference on Intelligent Virtual Agents", 16/09/2025-19/09/2025, Berlin, Germany. ISBN 979-8-4007-1996-7. p. 9. https://doi.org/10.1145/3742886.3756724.
| Title: | LLM-driven multimodal video-text fusion for isolated sign language recognition |
|---|---|
| Author(s): | |
| Document Type: | Conference or Workshop Presentation (Paper) |
| Event Title: | 25th ACM International Conference on Intelligent Virtual Agents |
| Event Dates: | 16/09/2025-19/09/2025 |
| Event Location: | Berlin, Germany |
| Book Title: | IVA Adjunct '25: Adjunct Proceedings of the 25th ACM International Conference on Intelligent Virtual Agents |
| Date: | 30 September 2025 |
| ISBN: | 979-8-4007-1996-7 |
| Subjects: | |
| SDGs: | |
| Informal Keywords: | Computing methodologies; artificial intelligence; human-centered computing; accessibility |
| School: | E.T.S.I. Telecomunicación (UPM) |
| Department: | Ingeniería Electrónica |
| Creative Commons License: | Attribution - NoDerivatives - NonCommercial |
Sign languages are the primary means of communication for deaf communities, but developing effective automatic recognition systems remains a significant challenge. In this work, we focus on the task of Isolated Sign Language Recognition (ISLR) using a multimodal approach grounded in a Large Language Model (LLM) architecture. We fuse modalities by merging visual features into the linguistic representation space of the LLM, and perform ablation studies to evaluate the individual contribution of each visual modality to recognition performance. Experiments are conducted on the AVASAG100 dataset, where our method achieves a weighted F1-score (W-F1) of 70.36±3.00 and a macro F1-score (MF1) of 62.34±3.18 when projecting pose-extracted landmarks into the LLM's embedding space. These results underscore the value of multimodal integration in ISLR and provide guidelines for future research directions.
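The projection step described in the abstract can be sketched as a linear map from per-frame pose-landmark vectors into the LLM's token-embedding dimension, so that projected frames can be consumed alongside text-token embeddings. This is a minimal illustrative sketch, not the authors' implementation: the landmark count, embedding size, and random initialisation are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 33 pose landmarks with (x, y, z) coordinates,
# flattened to a 99-dim vector per frame; assumed LLM embedding size 4096.
N_FRAMES = 16
D_POSE = 33 * 3
D_LLM = 4096

# Per-frame pose-landmark features for one isolated-sign clip.
pose_features = rng.standard_normal((N_FRAMES, D_POSE))

# Learnable linear projection (randomly initialised here) mapping each
# frame's landmark vector into the LLM's embedding space.
W = rng.standard_normal((D_POSE, D_LLM)) / np.sqrt(D_POSE)
b = np.zeros(D_LLM)

projected = pose_features @ W + b  # shape: (N_FRAMES, D_LLM)
print(projected.shape)
```

In a trained system the projection weights would be learned jointly with (or adapted to) the frozen LLM, and the projected frame embeddings interleaved with text-token embeddings before decoding the sign label.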
| Record ID: | 91247 |
|---|---|
| DC Identifier: | https://oa.upm.es/91247/ |
| OAI Identifier: | oai:oa.upm.es:91247 |
| Portal Científico URL: | https://portalcientifico.upm.es/es/ipublic/item/10389350 |
| DOI: | 10.1145/3742886.3756724 |
| Official URL: | https://dl.acm.org/doi/10.1145/3742886.3756724 |
| Deposited by: | iMarina Portal Científico |
| Deposited on: | 06 Oct 2025 09:02 |
| Last Modified: | 06 Oct 2025 09:02 |