Analysis and implementation of deep learning algorithms for face to face translation based on audio-visual representations

Alonso de Apellániz, Patricia (2020). Analysis and implementation of deep learning algorithms for face to face translation based on audio-visual representations. Thesis (Master thesis), E.T.S.I. Telecomunicación (UPM).

Description

Title: Analysis and implementation of deep learning algorithms for face to face translation based on audio-visual representations
Author/s:
  • Alonso de Apellániz, Patricia
Contributor/s:
  • Belmonte Hernández, Alberto
Item Type: Thesis (Master thesis)
Masters title: Ingeniería de Telecomunicación
Date: 2020
Freetext Keywords: Deep Learning, face transfer, image generation, synthesized frames, encoder, Convolutional Neural Networks (CNNs), autoencoder, Generative Adversarial Networks (GANs), Generator, Discriminator, data processing, dataset, Python, qualitative and quantitative evaluations.
Faculty: E.T.S.I. Telecomunicación (UPM)
Department: Señales, Sistemas y Radiocomunicaciones
Creative Commons Licenses: Attribution - NonCommercial - NoDerivatives (CC BY-NC-ND)

Abstract

The generation of synthesized images, and the ability to animate or transform them, has advanced rapidly in recent years, thanks in part to the use of neural networks. In particular, transferring facial gestures and audio to an existing image has attracted attention both in research and in society at large because of its potential applications. This Master's Thesis will study the state of the art in techniques for transferring facial gestures, including lip movement, between audiovisual media. Specifically, it will focus on existing methods and research that generate talking faces from several features of the multimedia information used. Building on this study, several systems will be implemented, developed, and evaluated as follows. First, given the importance of training deep neural networks on a large and well-processed dataset, VoxCeleb2 will be downloaded and will undergo conditioning and adaptation: image and audio information will be extracted from the original videos to serve as the input to the networks. The extracted features will be ones widely used in the state of the art for this kind of task, such as image key points and audio spectrograms. Second, three different convolutional networks, specifically Generative Adversarial Networks (GANs), will be implemented based on the implementation in [1], with new configurations added, such as a network that handles the audio features and loss functions that depend on this new architecture and on the network's behavior.
In other words, the first implementation will be the network based on the cited paper; the second will add an encoder for audio features to it; and the third will be trained on this last architecture while also taking into account a loss calculated for the audio learning. Finally, to compare and evaluate the results of each network, both quantitative metrics and qualitative evaluations will be carried out. Since the final output of these systems will be a clear and realistic video of a random face to which the gestures of another face have been transferred, perceptual visual evaluation is key to this problem.
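
As an illustration of the audio features mentioned in the abstract, the following is a minimal sketch of how a log-magnitude spectrogram could be computed from a waveform with NumPy. It is not taken from the thesis: the function name `log_spectrogram`, the 16 kHz sample rate, the 400-sample Hann window, and the 160-sample hop are all assumptions chosen only for the example, and the thesis may use different settings or a dedicated audio library.

```python
import numpy as np


def log_spectrogram(signal, frame_len=400, hop=160, eps=1e-8):
    """Log-magnitude spectrogram via a short-time Fourier transform.

    frame_len, hop and eps are illustrative assumptions, not values
    reported in the thesis.
    """
    signal = np.asarray(signal, dtype=np.float64)
    # Pad very short signals so that at least one full frame exists.
    if signal.size < frame_len:
        signal = np.pad(signal, (0, frame_len - signal.size))
    n_frames = 1 + (signal.size - frame_len) // hop
    window = np.hanning(frame_len)
    # Slice the waveform into overlapping, windowed frames.
    frames = np.stack(
        [signal[i * hop:i * hop + frame_len] * window for i in range(n_frames)]
    )
    # One-sided FFT per frame, then log compression of the magnitudes.
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return np.log(magnitude + eps).T  # shape: (freq_bins, n_frames)


# Synthetic input (assumption): one second of a 440 Hz tone at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440.0 * t)
spec = log_spectrogram(tone)
```

With these assumed settings each frequency bin spans 40 Hz, so the energy of the test tone concentrates in bin 11. A two-dimensional array of this kind is the sort of input a convolutional audio encoder can consume alongside the image key points.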

More information

Item ID: 62958
DC Identifier: http://oa.upm.es/62958/
OAI Identifier: oai:oa.upm.es:62958
Deposited by: Biblioteca ETSI Telecomunicación
Deposited on: 10 Jul 2020 08:13
Last Modified: 10 Jul 2020 08:13