Clasificación de cadenas de ADN usando Técnicas multifractales para el análisis de la fluctuación de series temporales

Tejeda Sánchez, José Javier (2022). Clasificación de cadenas de ADN usando Técnicas multifractales para el análisis de la fluctuación de series temporales. Bachelor's Thesis / Final Degree Project, E.T.S. de Ingeniería Agronómica, Alimentaria y de Biosistemas (UPM), Madrid.

Description

Title: Clasificación de cadenas de ADN usando Técnicas multifractales para el análisis de la fluctuación de series temporales
Author(s):
  • Tejeda Sánchez, José Javier
Supervisor(s):
Document Type: Bachelor's Thesis or Final Degree Project
Degree: Bachelor's Degree in Biotechnology
Date: July 2022
Subjects:
SDGs:
School: E.T.S. de Ingeniería Agronómica, Alimentaria y de Biosistemas (UPM)
Department: Applied Mathematics
Creative Commons Licenses: Attribution - NoDerivatives - NonCommercial

Full text

TFG_JOSE_JAVIER_TEJEDA_SANCHEZ_A.pdf (PDF, 1MB)

Abstract

The aim of this research is to study the feasibility of classifying DNA sequences using parameters obtained with mathematical tools for sequence analysis. For this purpose, 200 DNA sequences were collected from different databases, such as NCBI [7] or EMBL [1]. The first step was to convert the DNA sequences into time series using a method described by Peng et al. [9]. The fluctuation analysis described in Peng et al. [9] was then applied to each time series, yielding a parameter called α. The time series were also analysed with MF-DFA [6], which gave h(q) values for q ∈ {−10, −9, ..., 9, 10} ∪ {±0.2} using interpolating polynomials of degree 1, 2 and 3. The parameters α and h(q) were then used in hypothesis tests (Student's t-test, ANOVA, Tukey test), chosen according to the characteristic to be classified. The resulting p-values, together with the means of α and h(q), indicate which values could serve as classifiers. Finally, classification was carried out with two machine learning methods (k-means and neural networks). For each method, one classification used only the h(q) parameters and another used both α and h(q). The results suggest that the DNA sequences cannot be classified using neural networks, because the error rates for all classifications are very high (the smallest is 0.18). There are two possible reasons for this. The first is that the database is not large enough to train the classifier correctly. The second is that there are not enough parameters for the task. However, the hypothesis tests reveal significant differences between the parameters for the selected characteristics.
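
The first stages of the pipeline summarised in the abstract (sequence to time series, then scaling exponents) can be sketched as follows. This is a minimal illustration, not the code used in the thesis. It assumes a purine/pyrimidine step rule for the DNA walk (the abstract does not state which mapping was used) and the standard MF-DFA procedure with polynomial detrending. The function names, the choice of scales and the synthetic test sequence are all hypothetical. Note also that h(2) from this sketch is the ordinary detrended fluctuation exponent, which need not coincide with the α obtained from the fluctuation analysis of Peng et al. [9].

```python
import numpy as np


def dna_to_steps(seq):
    """Map a DNA string to a +1/-1 step series (assumed purine/pyrimidine rule)."""
    # Assumption: pyrimidines (C, T) step up, purines (A, G) step down.
    step = {"A": -1.0, "G": -1.0, "C": 1.0, "T": 1.0}
    # Symbols outside A/C/G/T (e.g. N) are simply skipped in this sketch.
    return np.array([step[b] for b in seq.upper() if b in step])


def fluctuation(steps, scale, q, order):
    """q-th order fluctuation function F_q(scale) with polynomial detrending."""
    # Profile: cumulative sum of the mean-centred step series.
    profile = np.cumsum(steps - steps.mean())
    n_seg = len(profile) // scale
    used = n_seg * scale
    # Segment from both ends so that no tail of the profile is ignored.
    segments = np.vstack([
        profile[:used].reshape(n_seg, scale),
        profile[len(profile) - used:].reshape(n_seg, scale),
    ])
    t = np.arange(scale)
    # Fit one polynomial per segment (each column of segments.T is a segment).
    coeffs = np.polyfit(t, segments.T, order)
    trend = np.vander(t, order + 1) @ coeffs
    variance = np.mean((segments.T - trend) ** 2, axis=0)
    if q == 0:
        # The q -> 0 limit is a logarithmic average.
        return np.exp(0.5 * np.mean(np.log(variance)))
    return np.mean(variance ** (q / 2.0)) ** (1.0 / q)


def generalized_hurst(steps, q, order=1, scales=(16, 32, 64, 128, 256, 512)):
    """Estimate h(q) as the slope of log F_q(s) against log s."""
    log_f = [np.log(fluctuation(steps, s, q, order)) for s in scales]
    slope, _ = np.polyfit(np.log(scales), log_f, 1)
    return slope


if __name__ == "__main__":
    rng = np.random.default_rng(0)  # synthetic sequence, not thesis data
    seq = "".join(rng.choice(list("ACGT"), size=20000))
    steps = dna_to_steps(seq)
    for order in (1, 2, 3):  # polynomial degrees mentioned in the abstract
        print(order, generalized_hurst(steps, q=2, order=order))
```

Evaluating `generalized_hurst` over the q grid given in the abstract, for each polynomial degree, would produce one feature vector per sequence that could then be passed to a hypothesis test or a classifier. For strongly negative q the estimate is dominated by the segments with the smallest residual variance, so short scales may need to be excluded in practice.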

More information

Record ID: 72057
DC Identifier: https://oa.upm.es/72057/
OAI Identifier: oai:oa.upm.es:72057
Deposited by: Biblioteca ETSI Agronómica, Alimentaria y de Biosistemas
Deposited on: 08 Nov 2022 10:16
Last Modified: 08 Jan 2023 23:30