GO!GAN: Grasping Objects with Generative Adversarial Networks

Losada de la Rosa, Francisco José (2020). GO!GAN: Grasping Objects with Generative Adversarial Networks. Thesis (Master's thesis), E.T.S. de Ingenieros Informáticos (UPM).


Title: GO!GAN: Grasping Objects with Generative Adversarial Networks
  • Losada de la Rosa, Francisco José
  • Vanschoren, Joaquin
  • Holenderski, Mike
  • Engelenhoven, Nico van
Item Type: Thesis (Master's thesis)
Master's title: Data Science
Date: August 2020
Faculty: E.T.S. de Ingenieros Informáticos (UPM)
Department: Other
Creative Commons License: Attribution - NonCommercial - NoDerivatives

Full text

PDF - Download (4MB)


Image editing encompasses the process of altering images, and has been an active, interdisciplinary research topic for several decades. Images are a core element of many fields, such as marketing, television, the arts and, more recently, computer vision. Graphic software programs such as vector graphics editors and raster graphics editors have been the primary tools with which users manipulate, alter and enhance images. In recent decades, advances in Artificial Intelligence (AI), with improved neural networks and increased computing power, have led to strong improvements in the field of computer vision. Computer vision aims to gain a high-level understanding of digital images and video, seeking to understand and automate tasks that were once exclusive to graphic software programs.

In this context, generative models are attracting the attention of the research community, and the last decade has experienced an important upturn since Generative Adversarial Networks (GANs) appeared in 2014; researchers such as Yann LeCun have called them the most interesting machine learning idea of the decade. GANs were first used for image generation, as a way to endow an AI with an artistic component. Soon, GAN applications expanded to fields such as image inpainting, future state prediction, style transfer and super-resolution. Conditional GANs opened the door to learning specific image transformations, with capabilities strong enough to generate what are known as "Deep Fakes", in which a person in an image or video is edited so that the result looks like real content when it is actually fake.

In the realm of Deep Fakes, most existing applications transform existing components of the image into different shapes or positions: the mouth can be modified so the person seems to be speaking, or the body can be changed into a different posture. Introducing new objects into the image is one of the next challenges to be faced by generative models. It is especially challenging because 2D images lack depth information, making it hard to render new objects in a spatially coherent way. In this study, we focus on rendering new objects that are held by a hand already present in the source image.
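The adversarial game underlying GANs can be illustrated with a toy sketch. This is not the thesis's actual model (which operates on images); it is a minimal, assumption-laden example in which a one-dimensional linear generator learns to match a Gaussian target distribution by playing against a logistic-regression discriminator. All hyperparameters and variable names below are illustrative choices, not taken from the thesis.

```python
import math
import random

random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Generator g(z) = a*z + b maps latent noise z ~ N(0, 1) to a sample.
# Discriminator D(x) = sigmoid(w*x + c) scores "realness" of a sample.
a, b = 1.0, 0.0      # generator parameters
w, c = 0.1, 0.0      # discriminator parameters
lr = 0.02            # learning rate for both players

for step in range(5000):
    x_real = random.gauss(4.0, 1.0)   # "real" data from N(4, 1)
    z = random.gauss(0.0, 1.0)        # latent noise
    x_fake = a * z + b

    # Discriminator ascent step: maximize log D(real) + log(1 - D(fake))
    s_r = sigmoid(w * x_real + c)
    s_f = sigmoid(w * x_fake + c)
    w += lr * ((1 - s_r) * x_real - s_f * x_fake)
    c += lr * ((1 - s_r) - s_f)

    # Generator ascent step (non-saturating loss): maximize log D(fake)
    s_f = sigmoid(w * x_fake + c)
    grad_x = (1 - s_f) * w            # d log D(x_fake) / d x_fake
    a += lr * grad_x * z
    b += lr * grad_x

# The mean of generated samples should drift toward the real mean (4.0).
fake_mean = sum(a * random.gauss(0.0, 1.0) + b for _ in range(1000)) / 1000
print(fake_mean)
```

Each iteration alternates one gradient step for the discriminator and one for the generator, the same minimax scheme that image GANs scale up with deep networks; a conditional GAN, as used in this thesis, additionally feeds conditioning information (here, the source image) to both players.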

More information

Item ID: 65316
DC Identifier: https://oa.upm.es/65316/
OAI Identifier: oai:oa.upm.es:65316
Deposited by: Biblioteca Facultad de Informatica
Deposited on: 10 Nov 2020 08:16
Last Modified: 10 Nov 2020 08:16