Synthesizing olfactory understanding: multimodal language models for image-text smell matching

Esteban Romero, Sergio, Martín Fernández, Iván, Gil Martín, Manuel and Fernández Martínez, Fernando (2025). Synthesizing olfactory understanding: multimodal language models for image-text smell matching. "Symmetry", v. 17 (n. 8); p. 1349. ISSN 2073-8994. https://doi.org/10.3390/sym17081349.

Description

Title:	Synthesizing olfactory understanding: multimodal language models for image-text smell matching
Author(s):	Esteban Romero, Sergio https://orcid.org/0009-0008-6336-7877; Martín Fernández, Iván https://orcid.org/0009-0004-2769-9752; Gil Martín, Manuel https://orcid.org/0000-0002-4285-6224; Fernández Martínez, Fernando https://orcid.org/0000-0003-3877-0089
Document type:	Article
Journal/Publication title:	Symmetry
Date:	18 August 2025
ISSN:	2073-8994
Volume:	17
Issue:	8
Subjects:	Computer Science; Psychology
SDG:	09. Industry, innovation and infrastructure
Informal keywords:	Olfactory understanding; multimodal perception; Contrastive Language–Image Pretraining (CLIP); Multimodal Large Language Models (MM-LLMs)
School:	E.T.S.I. Telecomunicación (UPM)
Department:	Electronic Engineering
Creative Commons license:	Attribution - No derivative works - Non-commercial

Abstract

Olfactory information, crucial for human perception, is often underrepresented compared to visual and textual data. This work explores methods for understanding smell descriptions within a multimodal context, where scent information is conveyed indirectly through text and images. We address the challenges of the Multimodal Understanding of Smells in Texts and Images (MUSTI) task by proposing novel approaches that leverage language-specific models and state-of-the-art multimodal large language models (MM-LLMs). Our core contribution is a multimodal framework using language-specific encoders for text and image data. This allows for a joint embedding space that explores the semantic symmetry between smells, texts, and images to identify olfactory-related connections shared across the modalities. While ensemble learning with language-specific models achieved good performance, MM-LLMs demonstrated exceptional potential. Fine-tuning a quantized version of the Qwen-VL-Chat model achieved a state-of-the-art macro F1-score of 0.7618 on the MUSTI task. This highlights the effectiveness of MM-LLMs in capturing task requirements and adapting to specific formats.
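For readers unfamiliar with the reported metric, the following is a minimal sketch of how a macro F1-score is computed: the F1 of each class is calculated separately and the results are averaged without weighting by class size. The function name `macro_f1` and the toy labels are illustrative assumptions, not the authors' evaluation code or MUSTI data, and the binary labels are only an example of the general definition.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (generic definition)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels (1 = match, 0 = no match), for illustration only
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 0]
score = macro_f1(y_true, y_pred)
print(round(score, 4))
```

Because every class contributes equally to the average, a model that ignores the minority class is penalized more heavily than it would be under plain accuracy.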

Associated projects

Type	Code	Acronym	Lead	Title
Horizon Europe	101071191	ASTOUND	Not specified	Improving social competences of virtual agents through artificial consciousness based on the Attention Schema Theory
Government of Spain	PID2023-150584OB-C21	TRUSTBOOST	Not specified	Armonizando Flexibilidad y Conformidad en Sistemas de Inteligencia Artificial Conversacional (Harmonizing flexibility and compliance in conversational artificial intelligence systems)
Government of Spain	PID2020-118112RB-C22	GOMINOLA	Not specified	Agentes conversacionales sensibles a usuario, adaptativos y socio-afectivos basados en microservicios (User-aware, adaptive and socio-affective conversational agents based on microservices)
Government of Spain	PID2021-126061OB-C43	BEWORD	Not specified	Descubriendo el significado y la intención más allá de la palabra hablada: hacia un entorno inteligente para abordar los documentos multimedia (Discovering meaning and intent beyond the spoken word: towards an intelligent environment for handling multimedia documents)

More information

Record ID:	90955
DC identifier:	https://oa.upm.es/90955/
OAI identifier:	oai:oa.upm.es:90955
Scientific Portal URL:	https://portalcientifico.upm.es/es/ipublic/item/10384303
DOI:	10.3390/sym17081349
Official URL:	https://www.mdpi.com/2073-8994/17/8/1349
Deposited by:	iMarina Portal Científico
Deposited on:	02 Oct 2025 09:23
Last modified:	09 Apr 2026 14:42
