Topic-oriented text features can match visual deep models of video memorability

Kleinlein, Ricardo ORCID: https://orcid.org/0000-0002-7313-7601, Luna Jiménez, Cristina ORCID: https://orcid.org/0000-0001-5369-856X, Arias Cuadrado, David, Ferreiros López, Javier ORCID: https://orcid.org/0000-0001-8834-3080 and Fernández Martínez, Fernando ORCID: https://orcid.org/0000-0003-3877-0089 (2021). Topic-oriented text features can match visual deep models of video memorability. "Applied Sciences", v. 11 (n. 16); p. 7406. ISSN 2076-3417. https://doi.org/10.3390/app11167406.

Description

Title: Topic-oriented text features can match visual deep models of video memorability
Author(s):
Document Type: Article
Journal/Publication Title: Applied Sciences
Date: 12 August 2021
ISSN: 2076-3417
Volume: 11
Issue: 16
Subjects:
Informal Keywords: BERT; sentence-BERT; transformer; topic detection; video memorability; linear regression; DenseNet-121; PCA
School: E.T.S.I. Telecomunicación (UPM)
Department: Ingeniería Electrónica (Electronic Engineering)
Creative Commons Licenses: Attribution

Abstract

Not every visual media production is equally retained in memory. Recent studies have shown that the elements of an image, as well as their mutual semantic dependencies, provide a strong clue as to whether a video clip will be recalled on a second viewing or not. We believe that short textual descriptions encapsulate most of these relationships among the elements of a video, and thus they represent a rich yet concise source of information to tackle the problem of media memorability prediction. In this paper, we deepen the study of short captions as a means to convey in natural language the visual semantics of a video. We propose to use vector embeddings from a pretrained SBERT topic detection model with no adaptation as input features to a linear regression model, showing that, from such a representation, simpler algorithms can outperform deep visual models. Our results suggest that text descriptions expressed in natural language might be effective in embodying the visual semantics required to model video memorability.
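
The pipeline summarised in the abstract (caption, then fixed sentence embedding, then linear regression onto a memorability score) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: `toy_embed` is a hypothetical hashed bag-of-words stand-in for the pretrained SBERT encoder, the captions and scores are invented for the example, and the regression is fitted with plain gradient descent so that the block needs nothing beyond the standard library.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def toy_embed(caption: str, dim: int = 8) -> Vector:
    """Hypothetical stand-in for a sentence encoder (NOT SBERT).

    Hashes each token into one of `dim` buckets and L2-normalises the counts.
    """
    vec = [0.0] * dim
    for token in caption.lower().split():
        # Deterministic bucket index; the built-in hash() is salted per process.
        vec[sum(ord(ch) for ch in token) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Linear model: intercept plus dot product of weights and features."""
    return b + sum(wi * xi for wi, xi in zip(w, x))


def mse(w: Sequence[float], b: float, X: Sequence[Vector], y: Sequence[float]) -> float:
    """Mean squared error of the linear model on (X, y)."""
    return sum((predict(w, b, x) - t) ** 2 for x, t in zip(X, y)) / len(y)


def fit_linear(X: Sequence[Vector], y: Sequence[float],
               lr: float = 0.1, epochs: int = 2000) -> Tuple[Vector, float]:
    """Least-squares linear regression fitted by batch gradient descent."""
    n, dim = len(X), len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        errors = [predict(w, b, x) - t for x, t in zip(X, y)]
        grad_w = [2.0 / n * sum(e * x[j] for e, x in zip(errors, X))
                  for j in range(dim)]
        grad_b = 2.0 / n * sum(errors)
        # Update all parameters from the same gradient evaluation.
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Invented captions and memorability scores, for illustration only.
captions = [
    "a man plays guitar on a stage",
    "two dogs run across a beach",
    "clouds drift over a mountain",
    "a river flows through a forest",
    "a chef cooks pasta in a kitchen",
    "cars drive along a city street",
]
scores = [0.92, 0.88, 0.71, 0.65, 0.80, 0.75]

X = [toy_embed(c) for c in captions]   # features are used as-is, no fine-tuning
w, b = fit_linear(X, scores)
train_mse = mse(w, b, X, scores)
print(f"training MSE: {train_mse:.4f}")
```

To move closer to the setting the abstract describes, `toy_embed` would be swapped for a real pretrained sentence encoder, for instance the `encode` method of a `SentenceTransformer` model from the sentence-transformers package; which checkpoint the authors used is not stated in this record. The model would also be evaluated on held-out clips rather than on training error.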

Associated projects

Type: Government of Spain
Code: TEC2017-84593-C2-1-R
Acronym: Unspecified
Lead: Juan Manuel Montero Martínez
Title: Inferencia de la respuesta afectiva de los espectadores de un vídeo (Inference of the affective response of video viewers)

Type: Government of Spain
Code: TIN2017-85854-C4-4-R
Acronym: Unspecified
Lead: Javier Ferreiros López
Title: Análisis afectivo de información multimedia con comunicación inclusiva natural (Affective analysis of multimedia information with natural inclusive communication)

Type: Government of Spain
Code: PRE2018-083225
Acronym: Unspecified
Lead: Unspecified
Title: Unspecified

More information

Record ID: 87027
DC Identifier: https://oa.upm.es/87027/
OAI Identifier: oai:oa.upm.es:87027
Scientific Portal URL: https://portalcientifico.upm.es/es/ipublic/item/9343044
DOI Identifier: 10.3390/app11167406
Official URL: https://www.mdpi.com/2076-3417/11/16/7406
Deposited by: iMarina Portal Científico
Deposited on: 29 Jan 2025 13:12
Last Modified: 29 Jan 2025 13:12