Full text
|
PDF
- Requires a PDF viewer, such as GSview, Xpdf or Adobe Acrobat Reader
Download (497kB) | Preview |
Kleinlein, Ricardo and Luna Jiménez, Cristina and Montero Martínez, Juan Manuel and Callejas Carrión, Zoraida and Fernández Martínez, Fernando (2019). Predicting group-level skin attention to short movies from audio-based LSTM-mixture of experts models. In: "INTERSPEECH 2019", 15/09/2019 - 19/09/2019, Graz, Austria. pp. 1-5. https://doi.org/10.21437/Interspeech.2019-2799.
Title: | Predicting group-level skin attention to short movies from audio-based LSTM-mixture of experts models |
---|---|
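Author/s: | Kleinlein, Ricardo; Luna Jiménez, Cristina; Montero Martínez, Juan Manuel; Callejas Carrión, Zoraida; Fernández Martínez, Fernando |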
Item Type: | Presentation at Congress or Conference (Article) |
Event Title: | INTERSPEECH 2019 |
Event Dates: | 15/09/2019 - 19/09/2019 |
Event Location: | Graz, Austria |
Title of Book: | Proceedings of INTERSPEECH 2019 |
Date: | 2019 |
Subjects: | |
Freetext Keywords: | Electrodermal activity; attention prediction; affective video content analysis; recurrent neural network; mixture of experts |
Faculty: | E.T.S.I. Telecomunicación (UPM) |
Department: | Ingeniería Electrónica |
Creative Commons Licenses: | Attribution - No derivative works - Non commercial |
Electrodermal activity (EDA) is a psychophysiological indicator that can be considered a somatic marker of the emotional and attentional reaction of subjects towards stimuli like audiovisual content. EDA measurements are not biased by the cognitive process of giving an opinion or a score to characterize the subjective perception, and group-level EDA recordings integrate the reaction of an audience, thus reducing the signal noise. This paper contributes to the field of audience's attention prediction to video content, extending previous novel work on the use of EDA as ground truth for prediction algorithms. Videos are segmented into shorter clips attending to the audience's increasing or decreasing attention, and we process videos' audio waveform to extract meaningful aural embeddings from a VGGish model pretrained on the Audioset database. While previous similar work on attention level prediction using only audio accomplished 69.83% accuracy, we propose a Mixture of Experts approach to train a binary classifier that outperforms the main existing state-of-the-art approaches predicting increasing and decreasing attention levels with 81.76% accuracy. These results confirm the usefulness of providing acoustic features with a semantic significance, and the convenience of considering experts over partitions of the dataset in order to predict group-level attention from audio.
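The mixture-of-experts idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the architecture, so this example assumes a soft gate (a softmax over linear scores), logistic-regression experts, and mean pooling over time as a simple stand-in for the LSTM. All function names, the number of experts, and the random weights are hypothetical; the only external fact used is that VGGish produces 128-dimensional embeddings per audio frame.

```python
import math
import random


def mean_pool(frames):
    """Average per-frame embeddings into one clip-level vector.

    Assumption: mean pooling replaces the LSTM sequence encoder here.
    """
    dim = len(frames[0])
    return [sum(f[i] for f in frames) / len(frames) for i in range(dim)]


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_predict(frames, expert_weights, gate_weights):
    """Return P(attention increases) and the gate's weight for each expert."""
    x = mean_pool(frames)
    gate = softmax([dot(g, x) for g in gate_weights])
    expert_probs = [sigmoid(dot(w, x)) for w in expert_weights]
    # The final probability is the gate-weighted average of the experts.
    p_increase = sum(g * p for g, p in zip(gate, expert_probs))
    return p_increase, gate


# Hypothetical clip: 10 frames of 128-dimensional embeddings, random values.
rng = random.Random(0)
DIM, N_EXPERTS, N_FRAMES = 128, 3, 10
frames = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(N_FRAMES)]
expert_weights = [[rng.gauss(0, 0.1) for _ in range(DIM)] for _ in range(N_EXPERTS)]
gate_weights = [[rng.gauss(0, 0.1) for _ in range(DIM)] for _ in range(N_EXPERTS)]

p, gate = moe_predict(frames, expert_weights, gate_weights)
label = "increasing" if p >= 0.5 else "decreasing"
print(label, round(p, 3), [round(g, 3) for g in gate])
```

In a trained model the gate would learn to route each clip toward the expert fitted on the most similar partition of the dataset; with the untrained random weights above, the output is only a shape and range check.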
Type | Code | Acronym | Leader | Title |
---|---|---|---|---|
Government of Spain | TEC2017-84593-C2-1-R | CAVIAR | Unspecified | Inferencia de la respuesta afectiva de los espectadores de un video |
Government of Spain | RTC-2016-5305-7 | ESITUR | Unspecified | Escaparate Interactivo Turístico. Accesible desde cualquier móvil (sin aplicación) y con contenidos extraídos de analizar e interpretar fotos públicas subidas a las redes sociales por turistas que visitan esa misma zona |
Government of Spain | TIN2017-85854-C4-4-R | AMIC | Unspecified | Análisis afectivo de información multimedia con comunicación inclusiva natural |
Item ID: | 64467 |
---|---|
DC Identifier: | https://oa.upm.es/64467/ |
OAI Identifier: | oai:oa.upm.es:64467 |
DOI: | 10.21437/Interspeech.2019-2799 |
Official URL: | https://www.isca-speech.org/archive/Interspeech_2019/abstracts/2799.html |
Deposited by: | Memoria Investigacion |
Deposited on: | 22 Mar 2021 16:24 |
Last Modified: | 22 Mar 2021 16:24 |