Aptafolding: gan that folds aptamers

Matesanz Fernández-Arias, Ana (2020). Aptafolding: gan that folds aptamers. Thesis (Master thesis), E.T.S. de Ingenieros Informáticos (UPM).


Title: Aptafolding: gan that folds aptamers
Author:
  • Matesanz Fernández-Arias, Ana
Item Type: Thesis (Master thesis)
Masters title: Inteligencia Artificial
Date: February 2020
Faculty: E.T.S. de Ingenieros Informáticos (UPM)
Department: Inteligencia Artificial
Creative Commons Licenses: Attribution - No derivative works - Non commercial


The purpose of this project was to develop a neural network that performs the three-dimensional folding of aptamers, in order to predict their structure and to accelerate the SELEX (Systematic Evolution of Ligands by EXponential enrichment) process. The network was part of the 2019 Madrid iGEM (International Genetically Engineered Machine) project, entitled AEGIS (Aptamer Evolution for Global In situ Sense). The AEGIS team developed an aptamer-based cholera sensor and automated its manufacturing technology to make the sensor simple to use, cheap to manufacture, replicable, and adaptable to similar diseases. The implemented network is a GAN (Generative Adversarial Network) applied to the field of molecular biology. First, a database of aptamers and their three-dimensional structures was created with the help of an optimized version of an algorithm from a team in a past iGEM edition. The database entries were nucleotide sequences and their most probable spatial angles between molecules; the database was created to help the network learn the relationship between the motifs within an aptamer sequence and the small angles that these motifs can show. Then the network was built: it is composed of two CNNs (Convolutional Neural Networks) with different structures (the Discriminative and Generative networks of the GAN), and it allows feedback between both networks in order to learn and predict the most probable angles for each given DNA sequence. Next, the trained network returns the calculated three-dimensional structure for a given aptamer. Finally, to validate the new aptamers, Rosetta was used, a well-known software package that scores each 3D DNA structure based on its folding energy.
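The pairing of sequences and angles described above can be illustrated with a minimal sketch. The abstract does not state how the data were encoded, so everything here is an assumption: one-hot rows for the nucleotides (a common input format for convolutional networks) and a (sin, cos) pair per angle so that -180 and 180 degrees map to the same point. The function names are hypothetical.

```python
import math

NUCLEOTIDES = "ACGT"  # DNA alphabet; the column order is an arbitrary choice


def one_hot(sequence):
    """Encode a DNA sequence as a list of 4-element one-hot rows."""
    rows = []
    for base in sequence.upper():
        if base not in NUCLEOTIDES:
            raise ValueError(f"unexpected nucleotide: {base!r}")
        rows.append([1.0 if base == n else 0.0 for n in NUCLEOTIDES])
    return rows


def encode_angles(angles_deg):
    """Map each angle in degrees to a (sin, cos) pair (assumed encoding)."""
    return [
        (math.sin(math.radians(a)), math.cos(math.radians(a)))
        for a in angles_deg
    ]


def decode_angle(sin_value, cos_value):
    """Recover an angle in degrees from its (sin, cos) pair."""
    return math.degrees(math.atan2(sin_value, cos_value))
```

With such an encoding, a generator network would take the one-hot matrix as input and emit angle pairs, while a discriminator would receive both; this division of roles is the standard GAN arrangement, not a detail taken from the thesis.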
A set of several validated aptamers (those with low folding energy), together with their three-dimensional structures obtained from the developed neural network, will form an initial library for the SELEX process currently used in multiple fields of molecular biology. In summary, the proposed neural network returns the most probable three-dimensional structure for a given nucleotide sequence, it is used to create an initial library of aptamers for the SELEX process, and it reduces laboratory materials and time by eliminating multiple SELEX cycles. The software was developed in the following steps (see Figure 1):
- Development of a Generative Adversarial Network (GAN) that works with biological information and is adapted to our challenges. It is divided into two neural networks, Generative and Discriminative; both are Convolutional Neural Networks.
- Creation of a database for training the GAN. It was created with code from a team in a previous iGEM edition: INSA-Lyon (2016). We also proposed optimizations to that code, including multithreading and other improvements such as terminal use, creation of the database (CSV) from the computer console, and support for the Linux and macOS operating systems. We had to create the database because no sufficiently large database was available online that we could use for our purposes.
- A validation mechanism for the folded aptamers created by the network. We decided to use the Rosetta software, which calculates the free energy of a given structure and returns a score. We implemented the validation to evaluate and score the aptamers we create and to judge the performance of our algorithm.
We obtained code that is simple to run, and all the objectives were fulfilled; all the components are understandable and can be reprogrammed easily.
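The database step (many sequences folded in parallel and written to a CSV file) can be sketched as follows. This is not the INSA-Lyon code or the thesis implementation: `placeholder_fold` is a hypothetical stand-in that returns dummy angles, and the two-column CSV layout is an assumption chosen for illustration.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor


def placeholder_fold(sequence):
    # Hypothetical stand-in for the real folding step:
    # returns one dummy angle (0.0) per nucleotide.
    return [0.0] * len(sequence)


def build_database(sequences, fold, out_file, workers=4):
    """Fold sequences on a thread pool and write one CSV row per sequence."""
    sequences = list(sequences)  # allow generators; we iterate twice
    writer = csv.writer(out_file)
    writer.writerow(["sequence", "angles"])
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in input order, so rows stay aligned.
        for seq, angles in zip(sequences, pool.map(fold, sequences)):
            writer.writerow([seq, " ".join(f"{a:.3f}" for a in angles)])


# Usage: write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
build_database(["ACGT", "GGCA"], placeholder_fold, buffer)
```

Threads help mainly when the folding step waits on an external program or on I/O; for pure-Python computation a process pool may be a better fit.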
The software resolves an existing problem and allows the SELEX process to be shortened by eliminating some of its cycles. Our neural network obtains low free energy scores (the mean free energy decreases from 500 to 50) and drastically reduces the time needed to create a folded aptamer compared with other software techniques (from 30-40 minutes with the INSA-Lyon software to 3 seconds with our neural network, once the GAN is trained).
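Building the initial SELEX library from validated aptamers amounts to keeping the candidates with the lowest energy scores. The sketch below assumes the scores have already been computed elsewhere (for example by Rosetta) and are supplied as plain numbers; the function name, the threshold, and the sample values are illustrative and do not come from the thesis.

```python
def select_library(scored, max_score, size=None):
    """Keep (sequence, score) pairs at or below max_score, lowest score first."""
    kept = sorted(
        (item for item in scored if item[1] <= max_score),
        key=lambda item: item[1],
    )
    return kept if size is None else kept[:size]


# Illustrative values only; these are not results from the thesis.
candidates = [("ACGTAC", 480.0), ("GGCATT", 45.0), ("TTAGGC", 60.0)]
library = select_library(candidates, max_score=100.0)
```

Here the two low-scoring candidates are kept and ordered by score, while the high-energy one is discarded.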

More information

Item ID: 58761
DC Identifier: https://oa.upm.es/58761/
OAI Identifier: oai:oa.upm.es:58761
Deposited by: Biblioteca Facultad de Informatica
Deposited on: 06 Mar 2020 09:34
Last Modified: 06 Mar 2020 09:35