Topic-oriented text features can match visual deep models of video memorability

Kleinlein, Ricardo, Luna Jiménez, Cristina, Arias Cuadrado, David, Ferreiros López, Javier and Fernández Martínez, Fernando (2021). Topic-oriented text features can match visual deep models of video memorability. "Applied Sciences", v. 11 (n. 16); p. 7406. ISSN 2076-3417. https://doi.org/10.3390/app11167406.

Description

Title:	Topic-oriented text features can match visual deep models of video memorability
Author(s):	Kleinlein, Ricardo (https://orcid.org/0000-0002-7313-7601); Luna Jiménez, Cristina (https://orcid.org/0000-0001-5369-856X); Arias Cuadrado, David; Ferreiros López, Javier (https://orcid.org/0000-0001-8834-3080); Fernández Martínez, Fernando (https://orcid.org/0000-0003-3877-0089)
Document Type:	Article
Journal/Publication Title:	Applied Sciences
Date:	12 August 2021
ISSN:	2076-3417
Volume:	11
Issue:	16
Subjects:	Telecommunications
Informal Keywords:	BERT; sentence-BERT; transformer; topic detection; video memorability; linear regression; DenseNet-121; PCA
School:	E.T.S.I. Telecomunicación (UPM)
Department:	Electronic Engineering
Creative Commons License:	Attribution

Abstract

Not every visual media production is equally retained in memory. Recent studies have shown that the elements of an image, as well as their mutual semantic dependencies, provide a strong clue as to whether a video clip will be recalled on a second viewing or not. We believe that short textual descriptions encapsulate most of these relationships among the elements of a video, and thus they represent a rich yet concise source of information to tackle the problem of media memorability prediction. In this paper, we deepen the study of short captions as a means to convey in natural language the visual semantics of a video. We propose to use vector embeddings from a pretrained SBERT topic detection model with no adaptation as input features to a linear regression model, showing that, from such a representation, simpler algorithms can outperform deep visual models. Our results suggest that text descriptions expressed in natural language might be effective in embodying the visual semantics required to model video memorability.
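The pipeline described in the abstract (fixed sentence embeddings of short captions fed to a linear regression) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the model name `all-MiniLM-L6-v2` is a placeholder rather than the topic detection checkpoint used in the paper, and the demo at the bottom uses synthetic vectors instead of real caption embeddings and memorability scores.

```python
import numpy as np


def embed_captions(captions, model_name="all-MiniLM-L6-v2"):
    """Encode captions with a frozen, pretrained SBERT model (no fine-tuning).

    Assumption: model_name is a placeholder, not the checkpoint from the paper.
    """
    # Imported lazily so the regression helpers work without the library installed.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    return np.asarray(model.encode(list(captions)))


def fit_linear_regression(features, scores):
    """Ordinary least squares with an intercept; returns (weights, bias)."""
    X = np.asarray(features, dtype=float)
    y = np.asarray(scores, dtype=float)
    # Append a column of ones so the last coefficient acts as the intercept.
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef[:-1], coef[-1]


def predict(features, weights, bias):
    """Predict one score per row of the feature matrix."""
    return np.asarray(features, dtype=float) @ weights + bias


if __name__ == "__main__":
    # Synthetic stand-ins for caption embeddings and memorability scores.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 16))
    y = X @ rng.normal(size=16) + 0.5
    w, b = fit_linear_regression(X, y)
    print("max abs error:", np.max(np.abs(predict(X, w, b) - y)))
```

With real data, one would call `embed_captions` on the training captions, fit on the resulting matrix and the annotated memorability scores, and apply `predict` to the embeddings of held-out captions. The keywords also list PCA, which suggests a dimensionality reduction step might sit between the embedding and the regression; that step is omitted here.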

Associated projects

Type	Code	Acronym	Principal investigator	Title
Government of Spain	TEC2017-84593-C2-1-R	Unspecified	Juan Manuel Montero Martínez	Inferencia de la respuesta afectiva de los espectadores de un vídeo
Government of Spain	TIN2017-85854-C4-4-R	Unspecified	Javier Ferreiros López	Análisis afectivo de información multimedia con comunicación inclusiva natural
Government of Spain	PRE2018-083225	Unspecified

More information

Record ID:	87027
DC Identifier:	https://oa.upm.es/87027/
OAI Identifier:	oai:oa.upm.es:87027
Scientific Portal URL:	https://portalcientifico.upm.es/es/ipublic/item/9343044
DOI:	10.3390/app11167406
Official URL:	https://www.mdpi.com/2076-3417/11/16/7406
Deposited by:	iMarina Portal Científico
Deposited on:	29 Jan 2025 13:12
Last Modified:	29 Jan 2025 13:12
