Pix2Pitch: generating music from paintings by using conditionals GANs

Rivas Ruzafa, Elena (2020). Pix2Pitch: generating music from paintings by using conditionals GANs. Thesis (Master thesis), E.T.S. de Ingenieros Informáticos (UPM).

Description

Title: Pix2Pitch: generating music from paintings by using conditionals GANs
Author/s:
  • Rivas Ruzafa, Elena
Contributor/s:
Item Type: Thesis (Master thesis)
Master's title: Inteligencia Artificial (Artificial Intelligence)
Date: July 2020
Subjects:
Faculty: E.T.S. de Ingenieros Informáticos (UPM)
Department: Inteligencia Artificial
Creative Commons Licenses: Attribution - No derivative works - Non commercial

Full text

TFM_ELENA_RIVAS_RUZAFA.pdf (PDF, 9MB)

Abstract

Generative adversarial networks (GANs) (Goodfellow et al., 2014) have been used extensively to transform and create images or sounds within their own domains, but transformation between different modalities has been explored far less. This work proposes a method for generating sound from images, based on the Pix2Pix architecture (Isola et al., 2017), a conditional GAN designed for general-purpose image-to-image translation. A new implementation that allows music to be created from images has been developed. The main goal is to create music that describes specific paintings and to answer the question: how does that image sound? Blind people could find the answer useful in several applications, for example in museums. To this end, the work takes into account several theses that posit an interaction between visual art and music, as well as several works that study synesthetic experimentation. The process involves three steps: first, labeling and pairing images and sounds from different styles and periods; second, extracting common features from the data by exploring multiple methods for music feature extraction; and third, introducing multimodal layers into the GAN. Finally, a method for creating novel pieces of music from the generated sound features has been implemented. As presented in the state-of-the-art section, some advances in crossmodal generation have been achieved, but most of them focus on creating images from sound or from text, and only a few explore image-to-sound transformations.
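
For readers unfamiliar with Pix2Pix, the sketch below illustrates the kind of objective a conditional GAN of that family optimizes: an adversarial term plus a weighted L1 term that keeps the generated output close to its paired target. It is only an illustration under stated assumptions, not the thesis implementation. The function names, the use of plain Python lists as stand-ins for sound-feature vectors, and the default weight of 100 (the value reported by Isola et al., 2017) are all assumptions; the loss actually used in the thesis may differ.

```python
import math


def bce(prob, target):
    """Binary cross-entropy for a single discriminator output in (0, 1)."""
    eps = 1e-12  # guard against log(0)
    prob = min(max(prob, eps), 1.0 - eps)
    return -(target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob))


def l1_distance(generated, target):
    """Mean absolute error between two equal-length feature vectors."""
    if len(generated) != len(target):
        raise ValueError("feature vectors must have the same length")
    return sum(abs(g - t) for g, t in zip(generated, target)) / len(target)


def generator_loss(disc_prob_fake, generated, target, lam=100.0):
    """Pix2Pix-style generator loss (hypothetical helper).

    disc_prob_fake: discriminator's probability that the generated
                    (image, sound-features) pair is real.
    generated, target: generated and ground-truth sound-feature vectors.
    lam: weight of the L1 term (assumed default, from Isola et al., 2017).
    """
    adversarial = bce(disc_prob_fake, 1.0)  # generator wants "real"
    return adversarial + lam * l1_distance(generated, target)


def discriminator_loss(disc_prob_real, disc_prob_fake):
    """Average of the losses on a real pair and on a generated pair."""
    return 0.5 * (bce(disc_prob_real, 1.0) + bce(disc_prob_fake, 0.0))


if __name__ == "__main__":
    # Toy values only; they do not come from the thesis.
    target_features = [0.0, 1.0, 0.5]
    generated_features = [0.1, 0.8, 0.5]
    print(generator_loss(0.3, generated_features, target_features))
    print(discriminator_loss(0.9, 0.3))
```

In an image-to-sound setting, the conditioning input would be the painting and the target would be the features extracted from its paired piece of music; the L1 term is what ties each generated output to its specific pair rather than to any plausible sound.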

More information

Item ID: 63694
DC Identifier: https://oa.upm.es/63694/
OAI Identifier: oai:oa.upm.es:63694
Deposited by: Biblioteca Facultad de Informatica
Deposited on: 09 Sep 2020 13:26
Last Modified: 09 Sep 2020 13:26