Parameter-efficient adaptation of large vision—language models for video memorability prediction

Martín Fernández, Iván ORCID: https://orcid.org/0009-0004-2769-9752, Esteban Romero, Sergio ORCID: https://orcid.org/0009-0008-6336-7877, Fernández Martínez, Fernando ORCID: https://orcid.org/0000-0003-3877-0089 and Gil Martín, Manuel ORCID: https://orcid.org/0000-0002-4285-6224 (2025). Parameter-efficient adaptation of large vision—language models for video memorability prediction. "Sensors", v. 25 (n. 6); p. 1661. ISSN 1424-8220. https://doi.org/10.3390/s25061661.

Description

Title: Parameter-efficient adaptation of large vision—language models for video memorability prediction
Author(s):
Document Type: Article
Journal/Publication Title: Sensors
Date: 7 March 2025
ISSN: 1424-8220
Volume: 25
Issue: 6
Subjects:
SDGs:
Informal Keywords: Large visual language models, video memorability, multimedia perception, efficient adaptation
School: E.T.S.I. Telecomunicación (UPM)
Department: Ingeniería Electrónica
Creative Commons License: Attribution

Abstract

Video memorability is the set of intrinsic properties that make a piece of audiovisual content more likely to be remembered. Modelling it accurately will facilitate the development of automatic systems that retrieve, classify and generate impactful media more efficiently. Recent studies have indicated a strong correlation between the visual semantics of a video and its memorability, which underscores the importance of advanced visual comprehension for model performance. Large Vision–Language Models (LVLMs) have demonstrated exceptional proficiency in generalist, high-level semantic comprehension of images and video, owing to their large-scale multimodal pre-training. This work draws on that generalist knowledge and explores efficient adaptation techniques for using LVLMs as memorability predictors. In particular, Quantized Low-Rank Adaptation (QLoRA) is employed to fine-tune the Qwen-VL model on memorability data from the Memento10k dataset. Building on existing research, we propose a methodology that transforms Qwen-VL from a language model into a memorability score regressor. Furthermore, we consider the influence of the LoRA hyperparameters, a design aspect that has been insufficiently studied. We validate the LoRA rank and alpha hyperparameters using 5-fold cross-validation and evaluate our best configuration on the official test portion of the Memento10k dataset, obtaining a state-of-the-art Spearman Rank Correlation Coefficient (SRCC) of 0.744. This work therefore represents a significant advancement in modelling video memorability through high-level semantic understanding.
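
The evaluation protocol named in the abstract (5-fold cross-validation scored with SRCC) can be illustrated with a minimal, standard-library Python sketch. It is not the authors' code. The names `average_ranks`, `spearman`, `k_fold_indices` and `cross_validate` are chosen here for illustration, and `train_and_predict` is a hypothetical callback standing in for the QLoRA fine-tuning and inference step, which the record does not describe in detail.

```python
import random
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions in the run
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the rank vectors.

    Raises ZeroDivisionError if either input is constant (all ranks tied).
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def k_fold_indices(n, k=5, seed=0):
    """Shuffle indices 0..n-1 and split them into k near-equal validation folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(scores, train_and_predict, k=5):
    """Mean validation SRCC over k folds for one hyperparameter setting.

    train_and_predict(train_idx, val_idx) is a hypothetical callback that
    fits a model on train_idx and returns one prediction per index in val_idx.
    """
    results = []
    for val in k_fold_indices(len(scores), k):
        held_out = set(val)
        train = [i for i in range(len(scores)) if i not in held_out]
        preds = train_and_predict(train, val)
        results.append(spearman(preds, [scores[i] for i in val]))
    return mean(results)


# Toy usage: a stand-in "model" that simply echoes the true scores.
toy_scores = [i / 20 for i in range(20)]
echo_model = lambda train, val: [toy_scores[i] for i in val]
print(cross_validate(toy_scores, echo_model))
```

Under these assumptions, selecting the LoRA rank and alpha would amount to calling `cross_validate` once per candidate pair and keeping the pair with the highest mean SRCC before a single evaluation on the held-out test set.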

Associated projects

Type: Unspecified
Code: 101071191
Acronym: ASTOUND
Principal investigator: Unspecified
Title: Improving social competences of virtual agents through artificial consciousness based on the Attention Schema Theory

Type: Unspecified
Code: PID2020-118112RB-C22
Acronym: GOMINOLA
Principal investigator: Unspecified
Title: Agentes conversacionales sensibles a usuario, adaptativos y socio-afectivos basados en microservicios

Type: Unspecified
Code: PID2023-150584OB-C21
Acronym: TRUSTBOOST
Principal investigator: Unspecified
Title: Armonizando Flexibilidad y Conformidad en Sistemas de Inteligencia Artificial Conversacional

Type: Unspecified
Code: PID2021-126061OB-C43
Acronym: BEWORD
Principal investigator: Unspecified
Title: Descubriendo el significado y la intención más allá de la palabra hablada: hacia un entorno inteligente para abordar los documentos multimedia

More information

Record ID: 88245
DC Identifier: https://oa.upm.es/88245/
OAI Identifier: oai:oa.upm.es:88245
Portal Científico URL: https://portalcientifico.upm.es/es/ipublic/item/10329522
DOI: 10.3390/s25061661
Official URL: https://www.mdpi.com/1424-8220/25/6/1661
Deposited by: iMarina Portal Científico
Deposited on: 11 Mar 2025 10:09
Last modified: 11 Mar 2025 10:09