Analysis and implementation of deep learning algorithms for face to face translation based on audio-visual representations

Alonso de Apellániz, Patricia (2020). Analysis and implementation of deep learning algorithms for face to face translation based on audio-visual representations. Thesis (Master thesis), E.T.S.I. Telecomunicación (UPM).

Description

Title: Analysis and implementation of deep learning algorithms for face to face translation based on audio-visual representations
Author/s:
  • Alonso de Apellániz, Patricia
Contributor/s:
Item Type: Thesis (Master thesis)
Masters title: Ingeniería de Telecomunicación
Date: 2020
Subjects:
Freetext Keywords: Deep Learning, face transfer, image generation, synthesized frames, encoder, Convolutional Neural Networks (CNNs), autoencoder, Generative Adversarial Networks (GANs), Generator, Discriminator, data processing, dataset, Python, qualitative and quantitative evaluations.
Faculty: E.T.S.I. Telecomunicación (UPM)
Department: Señales, Sistemas y Radiocomunicaciones
Creative Commons Licenses: Attribution - NonCommercial - NoDerivatives (CC BY-NC-ND)

Full text

TESIS_MASTER_PATRICIA_ALONSO_DE_APELLANIZ.pdf (PDF, 7MB)

Abstract

Generating synthesized images, and being able to animate or transform them, has
lately experienced a remarkable evolution thanks, in part, to the use of neural
networks. In particular, transferring facial gestures and audio to an existing
image has attracted attention both in research and in society at large, due to
its potential applications.
Throughout this Master's Thesis, a study of the state of the art in the techniques
that exist for this transfer of facial gestures, including lip movement, between
audiovisual media will be carried out. Specifically, it will focus on existing
methods and research that generate talking faces based on features extracted from
the multimedia information used. From this study, the implementation, development,
and evaluation of several systems will proceed as follows.
First, given the importance of training deep neural networks with a large and
well-processed dataset, VoxCeleb2 will be downloaded and will undergo a process of
conditioning and adaptation in which image and audio information is extracted from
the original videos to be used as the input of the networks. These features will be
ones widely used in the state of the art for this kind of task, such as image key
points and audio spectrograms.
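
As an illustration of this preprocessing step, the following is a minimal sketch that extracts facial key points from the video frames and a log-mel spectrogram from the audio track. It assumes the OpenCV, librosa and face_alignment Python packages and hypothetical file paths; the exact pipeline used in the thesis may differ.

```python
# Minimal preprocessing sketch: facial key points per frame and a log-mel
# spectrogram for the audio track. Package choices and file paths are
# illustrative, not necessarily those used in the thesis.
import cv2
import librosa
import face_alignment

# LandmarksType.TWO_D is named LandmarksType._2D in older face_alignment releases.
fa = face_alignment.FaceAlignment(face_alignment.LandmarksType.TWO_D, device='cpu')

def extract_keypoints(video_path):
    """Return one 68x2 landmark array per frame (None where no face is found)."""
    cap = cv2.VideoCapture(video_path)
    keypoints = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        landmarks = fa.get_landmarks(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        keypoints.append(landmarks[0] if landmarks else None)
    cap.release()
    return keypoints

def extract_spectrogram(audio_path, sr=16000, n_mels=80):
    """Return a log-mel spectrogram of the (pre-extracted) audio track."""
    waveform, _ = librosa.load(audio_path, sr=sr)
    mel = librosa.feature.melspectrogram(y=waveform, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel)
```
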
As the second approach of this Thesis, three different convolutional networks, in
particular Generative Adversarial Networks (GANs), will be implemented based on the
implementation of [1], but adding new configurations such as the network that
manages the audio features and loss functions that depend on this new architecture
and the network's behavior. In other words, the first implementation will consist
of the network based on the paper mentioned; to this implementation, an encoder for
audio features will be added; and, finally, the training will be based on this last
architecture but taking into account a loss calculated for the audio learning.
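
The following is an illustrative PyTorch-style sketch of how this third configuration could combine the adversarial objective with a reconstruction term and an audio-driven term; it is not the thesis code. The generator, discriminator and lip_encoder modules, the loss weights, and the specific audio-consistency term are hypothetical placeholders.

```python
# Illustrative sketch of the three ingredients described above: an audio encoder,
# and a generator loss that mixes adversarial, reconstruction and audio terms.
# All module interfaces, sizes and weights are placeholders, not the thesis code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AudioEncoder(nn.Module):
    """Maps a (batch, 1, n_mels, frames) spectrogram window to an embedding."""
    def __init__(self, dim=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.fc = nn.Linear(64, dim)

    def forward(self, spec):
        return self.fc(self.conv(spec))

def generator_step(generator, discriminator, audio_encoder, lip_encoder,
                   src_image, kp_map, spectrogram, target_frame,
                   lambda_rec=10.0, lambda_audio=1.0):
    """One generator update combining adversarial, reconstruction and audio losses."""
    audio_code = audio_encoder(spectrogram)           # audio features -> embedding
    fake = generator(src_image, kp_map, audio_code)   # synthesized frame
    pred_fake = discriminator(fake)                   # realism score for the fake frame

    adv_loss = F.binary_cross_entropy_with_logits(pred_fake,
                                                  torch.ones_like(pred_fake))
    rec_loss = F.l1_loss(fake, target_frame)
    # Hypothetical audio-consistency term: an embedding of the generated frame
    # (e.g. of its mouth region) is pushed towards the driving audio embedding.
    audio_loss = F.l1_loss(lip_encoder(fake), audio_code)

    return adv_loss + lambda_rec * rec_loss + lambda_audio * audio_loss
```
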
Finally, to compare and evaluate each network's results, both quantitative metrics
and qualitative evaluations will be carried out. Since the final output of these
systems will be a clear and realistic video of an arbitrary face onto which the
gestures of another one have been transferred, perceptual visual evaluation is key
to solving this problem.
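
As an example of the quantitative side of such an evaluation, the sketch below compares a generated video with its ground-truth counterpart frame by frame. SSIM and PSNR are used here only as illustrative choices of metric, since the abstract does not name the metrics used; scikit-image and OpenCV are assumed.

```python
# Frame-level quantitative comparison between a generated video and its ground
# truth. SSIM and PSNR are used here as examples of common metrics; the thesis
# may rely on a different set. Assumes OpenCV, scikit-image and 8-bit frames.
import cv2
import numpy as np
from skimage.metrics import structural_similarity, peak_signal_noise_ratio

def score_video(generated_path, reference_path):
    """Return mean SSIM and PSNR over the frames shared by both videos."""
    gen, ref = cv2.VideoCapture(generated_path), cv2.VideoCapture(reference_path)
    ssim_scores, psnr_scores = [], []
    while True:
        ok_g, frame_g = gen.read()
        ok_r, frame_r = ref.read()
        if not (ok_g and ok_r):
            break
        gray_g = cv2.cvtColor(frame_g, cv2.COLOR_BGR2GRAY)
        gray_r = cv2.cvtColor(frame_r, cv2.COLOR_BGR2GRAY)
        ssim_scores.append(structural_similarity(gray_r, gray_g, data_range=255))
        psnr_scores.append(peak_signal_noise_ratio(frame_r, frame_g, data_range=255))
    gen.release()
    ref.release()
    return float(np.mean(ssim_scores)), float(np.mean(psnr_scores))
```
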

More information

Item ID: 62958
DC Identifier: https://oa.upm.es/62958/
OAI Identifier: oai:oa.upm.es:62958
Deposited by: Biblioteca ETSI Telecomunicación
Deposited on: 10 Jul 2020 08:13
Last Modified: 10 Jul 2020 08:13