Full text
|
PDF (Portable Document Format)
- A PDF viewer is required, such as GSview, Xpdf or Adobe Acrobat Reader
Download (8MB) |
ORCID: https://orcid.org/0000-0002-7313-7601, Luna Jiménez, Cristina ORCID: https://orcid.org/0000-0001-5369-856X, Arias Cuadrado, David, Ferreiros López, Javier ORCID: https://orcid.org/0000-0001-8834-3080 and Fernández Martínez, Fernando ORCID: https://orcid.org/0000-0003-3877-0089 (2021). Topic-oriented text features can match visual deep models of video memorability. "Applied Sciences", v. 11 (n. 16); p. 7406. ISSN 2076-3417. https://doi.org/10.3390/app11167406.
| Title: | Topic-oriented text features can match visual deep models of video memorability |
|---|---|
| Author(s): | |
| Document Type: | Article |
| Journal/Publication Title: | Applied Sciences |
| Date: | 12 August 2021 |
| ISSN: | 2076-3417 |
| Volume: | 11 |
| Issue: | 16 |
| Subjects: | |
| Uncontrolled Keywords: | BERT; sentence-BERT; transformer; topic detection; video memorability; linear regression; DenseNet-121; PCA |
| School: | E.T.S.I. Telecomunicación (UPM) |
| Department: | Ingeniería Electrónica |
| Creative Commons Licenses: | Attribution |
Not every visual media production is equally retained in memory. Recent studies have shown that the elements of an image, as well as their mutual semantic dependencies, provide a strong clue as to whether a video clip will be recalled on a second viewing or not. We believe that short textual descriptions encapsulate most of these relationships among the elements of a video, and thus they represent a rich yet concise source of information to tackle the problem of media memorability prediction. In this paper, we deepen the study of short captions as a means to convey in natural language the visual semantics of a video. We propose to use vector embeddings from a pretrained SBERT topic detection model with no adaptation as input features to a linear regression model, showing that, from such a representation, simpler algorithms can outperform deep visual models. Our results suggest that text descriptions expressed in natural language might be effective in embodying the visual semantics required to model video memorability.
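The pipeline outlined in the abstract (caption embeddings fed to a linear regression) can be illustrated with a minimal sketch. It is not the authors' implementation. The `embed` function is a hypothetical stand-in (a hashed bag-of-words) used only so the example runs without downloading a model; in practice one would presumably obtain the vectors from a pretrained SBERT model, for instance through the `encode` method of the `sentence-transformers` library. The captions, the scores, and all function names are made-up illustrative values, and the regression is fitted by plain gradient descent rather than a library solver.

```python
import hashlib


def embed(caption, dim=8):
    """Hypothetical stand-in for a pretrained SBERT encoder (NOT SBERT itself).

    Maps a caption to an L2-normalised hashed bag-of-words vector.
    """
    vec = [0.0] * dim
    for token in caption.lower().split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0
    norm = sum(v * v for v in vec) ** 0.5
    return [v / norm for v in vec] if norm else vec


def predict(w, b, x):
    """Linear model: dot product of weights and features, plus bias."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def fit_linear_regression(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on squared error."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict(w, b, xi) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def mse(w, b, X, y):
    """Mean squared error of the linear model on (X, y)."""
    return sum((predict(w, b, xi) - yi) ** 2 for xi, yi in zip(X, y)) / len(y)


# Toy captions and memorability scores, invented purely for illustration.
captions = [
    "a dog runs on the beach",
    "a man talks to the camera",
    "aerial view of a city at night",
    "a child blows out birthday candles",
]
scores = [0.82, 0.71, 0.65, 0.90]

X = [embed(c) for c in captions]   # frozen features, no fine-tuning
w, b = fit_linear_regression(X, scores)
print("training MSE:", mse(w, b, X, scores))
print("prediction:", predict(w, b, embed("a dog blows out candles")))
```

The point of the design is that the text encoder stays frozen ("with no adaptation" in the abstract's words), so the only trained parameters are the regression weights and the bias.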
| Record ID: | 87027 |
|---|---|
| DC Identifier: | https://oa.upm.es/87027/ |
| OAI Identifier: | oai:oa.upm.es:87027 |
| Scientific Portal URL: | https://portalcientifico.upm.es/es/ipublic/item/9343044 |
| DOI: | 10.3390/app11167406 |
| Official URL: | https://www.mdpi.com/2076-3417/11/16/7406 |
| Deposited by: | iMarina Portal Científico |
| Deposited on: | 29 Jan 2025 13:12 |
| Last Modified: | 29 Jan 2025 13:12 |