Parameter-efficient adaptation of large vision—language models for video memorability prediction

Martín Fernández, Iván, Esteban Romero, Sergio, Fernández Martínez, Fernando and Gil Martín, Manuel (2025). Parameter-efficient adaptation of large vision—language models for video memorability prediction. "Sensors", v. 25 (n. 6); p. 1661. ISSN 1424-8220. https://doi.org/10.3390/s25061661.

Description

Title:	Parameter-efficient adaptation of large vision—language models for video memorability prediction
Author(s):	Martín Fernández, Iván (https://orcid.org/0009-0004-2769-9752); Esteban Romero, Sergio (https://orcid.org/0009-0008-6336-7877); Fernández Martínez, Fernando (https://orcid.org/0000-0003-3877-0089); Gil Martín, Manuel (https://orcid.org/0000-0002-4285-6224)
Document Type:	Article
Journal/Publication Title:	Sensors
Date:	7 March 2025
ISSN:	1424-8220
Volume:	25
Issue:	6
Subjects:	Telecommunications
SDG:	09. Industry, innovation and infrastructure
Informal Keywords:	Large visual language models, video memorability, multimedia perception, efficient adaptation
School:	E.T.S.I. Telecomunicación (UPM)
Department:	Electronic Engineering
Creative Commons Licenses:	Attribution

Full text

PDF (Portable Document Format), 3 MB. A PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader is required.

Abstract

The accurate modelling of video memorability, that is, the intrinsic properties that make a piece of audiovisual content more likely to be remembered, will facilitate the development of automatic systems that are more efficient in retrieving, classifying and generating impactful media. Recent studies have indicated a strong correlation between the visual semantics of a video and its memorability, which underscores the importance of advanced visual comprehension abilities for model performance. Large Vision–Language Models (LVLMs) have demonstrated exceptional proficiency in generalist, high-level semantic comprehension of images and video, owing to their large-scale multimodal pre-training. This work makes use of the generalist knowledge of LVLMs and explores efficient adaptation techniques with a view to using them as memorability predictors. In particular, the Quantized Low-Rank Adaptation (QLoRA) technique is employed to fine-tune the Qwen-VL model with memorability-related data extracted from the Memento10k dataset. Building on existing research, we propose a methodology that transforms Qwen-VL from a language model into a memorability score regressor. Furthermore, we consider the influence of selecting appropriate LoRA hyperparameters, a design aspect that has been insufficiently studied. We validate the LoRA rank and alpha hyperparameters using 5-fold cross-validation and evaluate our best configuration on the official test portion of the Memento10k dataset, obtaining a state-of-the-art Spearman Rank Correlation Coefficient (SRCC) of 0.744. Consequently, this work represents a significant advancement in modelling video memorability through high-level semantic understanding.
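The validation protocol summarised in the abstract (choosing the LoRA rank and alpha by 5-fold cross-validation, scored with SRCC) can be illustrated with a minimal, standard-library Python sketch. This is not the authors' code: the function names, the interleaved fold assignment, and the `train_and_predict` callback (a stand-in for a QLoRA fine-tuning run that returns predicted scores for the validation items) are assumptions made for illustration only.

```python
from itertools import product
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the rank vectors.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def k_fold_indices(n_items, k=5):
    """Yield (train_idx, val_idx); fold f validates on items f, f + k, ...

    The interleaved assignment is an assumption of this sketch.
    """
    for fold in range(k):
        val = [i for i in range(n_items) if i % k == fold]
        train = [i for i in range(n_items) if i % k != fold]
        yield train, val


def select_lora_config(scores, ranks, alphas, train_and_predict, k=5):
    """Return ((rank, alpha), mean SRCC) for the best pair across k folds.

    train_and_predict(rank, alpha, train_idx, val_idx) is a hypothetical
    callback that fine-tunes on train_idx and returns predictions for val_idx.
    """
    results = {}
    for rank, alpha in product(ranks, alphas):
        fold_srcc = []
        for train_idx, val_idx in k_fold_indices(len(scores), k):
            preds = train_and_predict(rank, alpha, train_idx, val_idx)
            truth = [scores[i] for i in val_idx]
            fold_srcc.append(spearman(preds, truth))
        results[(rank, alpha)] = mean(fold_srcc)
    best = max(results, key=results.get)
    return best, results[best]
```

In practice, `scores` would hold the ground-truth memorability scores of the training videos, and the candidate `ranks` and `alphas` would be whatever grid is under study; the sketch does not reproduce the values explored in the paper.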

Associated projects

Type	Code	Acronym	Leader	Title
Unspecified	101071191	ASTOUND	Unspecified	Improving social competences of virtual agents through artificial consciousness based on the Attention Schema Theory
Unspecified	PID2020-118112RB-C22	GOMINOLA	Unspecified	Agentes conversacionales sensibles a usuario, adaptativos y socio-afectivos basados en microservicios
Unspecified	PID2023-150584OB-C21	TRUSTBOOST	Unspecified	Armonizando Flexibilidad y Conformidad en Sistemas de Inteligencia Artificial Conversacional
Unspecified	PID2021-126061OB-C43	BEWORD	Unspecified	Descubriendo el significado y la intención más allá de la palabra hablada: hacia un entorno inteligente para abordar los documentos multimedia

More information

Record ID:	88245
DC Identifier:	https://oa.upm.es/88245/
OAI Identifier:	oai:oa.upm.es:88245
Scientific Portal URL:	https://portalcientifico.upm.es/es/ipublic/item/10329522
DOI:	10.3390/s25061661
Official URL:	https://www.mdpi.com/1424-8220/25/6/1661
Deposited by:	iMarina Portal Científico
Deposited on:	11 Mar 2025 10:09
Last Modified:	11 Mar 2025 10:09
