Full text: PDF (Portable Document Format), 3 MB
- Requires a PDF viewer such as GSview, Xpdf, or Adobe Acrobat Reader
ORCID: https://orcid.org/0009-0008-6336-7877, Martín Fernández, Iván ORCID: https://orcid.org/0009-0004-2769-9752, Gil Martín, Manuel ORCID: https://orcid.org/0000-0002-4285-6224 and Fernández Martínez, Fernando ORCID: https://orcid.org/0000-0003-3877-0089 (2025). Synthesizing olfactory understanding: multimodal language models for image-text smell matching. "Symmetry", v. 17 (n. 8); p. 1349. ISSN 2073-8994. https://doi.org/10.3390/sym17081349.
| Title: | Synthesizing olfactory understanding: multimodal language models for image-text smell matching |
|---|---|
| Author(s): | |
| Document Type: | Article |
| Journal/Publication Title: | Symmetry |
| Date: | 18 August 2025 |
| ISSN: | 2073-8994 |
| Volume: | 17 |
| Issue: | 8 |
| Subjects: | |
| SDGs: | |
| Informal Keywords: | Olfactory understanding; multimodal perception; Contrastive Language–Image Pretraining (CLIP); Multimodal Large Language Models (MM-LLMs) |
| School: | E.T.S.I. Telecomunicación (UPM) |
| Department: | Ingeniería Electrónica |
| Creative Commons License: | Attribution - No Derivative Works - Non-commercial |
Olfactory information, crucial for human perception, is often underrepresented compared to visual and textual data. This work explores methods for understanding smell descriptions within a multimodal context, where scent information is conveyed indirectly through text and images. We address the challenges of the Multimodal Understanding of Smells in Texts and Images (MUSTI) task by proposing novel approaches that leverage language-specific models and state-of-the-art multimodal large language models (MM-LLMs). Our core contribution is a multimodal framework using language-specific encoders for text and image data. This allows for a joint embedding space that explores the semantic symmetry between smells, texts, and images to identify olfactory-related connections shared across the modalities. While ensemble learning with language-specific models achieved good performance, MM-LLMs demonstrated exceptional potential. Fine-tuning a quantized version of the Qwen-VL-Chat model achieved a state-of-the-art macro F1-score of 0.7618 on the MUSTI task. This highlights the effectiveness of MM-LLMs in capturing task requirements and adapting to specific formats.
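The macro F1-score reported in the abstract is the unweighted mean of the per-class F1 scores, so a rare class counts as much as a frequent one. The following is a minimal sketch of that metric in plain Python, not the authors' evaluation code. The function name `macro_f1`, the toy labels, and the binary "shares a smell source / does not" labelling are assumptions made for illustration only.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Return the unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        raise ValueError("macro_f1 needs at least one label")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class was wrong
            fn[truth] += 1   # true class was missed

    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); the denominator is never zero here
        # because every label occurs in y_true or y_pred.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels: 1 = image and text share a smell source, 0 = they do not.
y_true = [1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 0]
score = macro_f1(y_true, y_pred)
print(f"macro F1 = {score:.4f}")
```

In this toy case the majority class scores 0.75 and the minority class 0.5, which gives a macro average of 0.625. Plain accuracy on the same predictions would be 4/6, which hides the weaker minority-class result.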
| Record ID: | 90955 |
|---|---|
| DC Identifier: | https://oa.upm.es/90955/ |
| OAI Identifier: | oai:oa.upm.es:90955 |
| Portal Científico URL: | https://portalcientifico.upm.es/es/ipublic/item/10384303 |
| DOI: | 10.3390/sym17081349 |
| Official URL: | https://www.mdpi.com/2073-8994/17/8/1349 |
| Deposited by: | iMarina Portal Científico |
| Deposited on: | 02 Oct 2025 09:23 |
| Last Modified: | 09 Apr 2026 14:42 |