Citation
Núñez Moreno, Gonzalo (2019). Desarrollo de una estrategia para la detección de alteraciones en el número de copias en el genoma de pacientes con enfermedades raras mediante análisis de experimentos de secuenciación masiva. Proyecto Fin de Carrera / Trabajo Fin de Grado, E.T.S. de Ingeniería Agronómica, Alimentaria y de Biosistemas (UPM), Madrid.
Abstract
Although rare diseases are, by definition, pathologies with very low frequency, taken together they affect around 6-8% of the population [1]. About 80% of them have a genetic cause [1]. A standard genetic diagnosis can take between 1 and 10 years [2], so the most important challenge is to shorten this period so that patients can access specific treatments and other family members can enter preventive programs. The development of Next Generation Sequencing (NGS) techniques has completely changed diagnostic protocols, reducing time and cost by allowing hundreds to thousands of genes to be screened in a single experiment. Along with NGS, new bioinformatics tools have been developed to extract relevant information from the huge amount of data produced. From the analysis point of view, two types of DNA variation need to be detected: i) Single Nucleotide Variations (SNVs) and small insertions or deletions (indels), and ii) structural variants or Copy Number Variations (CNVs). CNVs are duplications or deletions of regions between 100 and 3,000,000 base pairs [3]. They play a very important role in evolution and genetic disease. Although there are many CNV detection tools for NGS data, there is no standard protocol that performs acceptably across different types of genomic test or sample. The main objective of this bachelor's thesis (TFG) is to develop a strategy to detect CNVs in clinical samples from patients with genetic diseases. To this end, we selected several algorithms from those described in the literature according to three criteria: 1) their ability to analyze data from gene panels and whole exome sequencing, 2) their fit to the computational resources available locally, and 3) their track record in the scientific community (number of citations, benchmark results and methodological novelty).
We finally chose four programs: ExomeDepth [4], CoNVaDING [5], CODEX2 [6] and panelcn.MOPS [7]. We then ran a benchmark with reference samples to evaluate which algorithm or combination of algorithms performed best in terms of sensitivity and precision. We also tested it on clinical samples from the Fundación Jiménez Díaz Hospital to ensure its suitability. Finally, we implemented a protocol that combines the output of the four algorithms and prioritizes the detected CNVs based on the benchmark results. Using this protocol, we detected 14 of the 15 CNVs in clinical samples that had been validated by array Comparative Genomic Hybridization (aCGH). Our strategy has been implemented within the bioinformatics pipeline of the Genetics Department and will benefit the hundreds of patients whose DNA is analyzed every year.
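The abstract does not spell out how the four outputs are combined or how CNVs are prioritized, so the following is only a minimal sketch of one possible scheme, not the thesis protocol. It assumes calls have already been parsed into a common record, merges overlapping calls of the same type within a sample, and ranks merged regions by how many callers support them. Ranking by caller count is a stand-in for the benchmark-based prioritization. The names `Call`, `merge_calls`, `prioritize` and `sensitivity_precision` are hypothetical.

```python
from collections import namedtuple

# Hypothetical common record for a CNV call from any of the tools.
Call = namedtuple("Call", "caller sample chrom start end kind")


def overlaps(a, b):
    """True if two calls share sample, chromosome and type and at least one base."""
    return (a.sample == b.sample and a.chrom == b.chrom and a.kind == b.kind
            and a.start <= b.end and b.start <= a.end)


def merge_calls(calls):
    """Group overlapping calls and record which callers support each group."""
    merged = []
    ordered = sorted(calls, key=lambda c: (c.sample, c.chrom, c.kind, c.start))
    for call in ordered:
        last = merged[-1] if merged else None
        if last and overlaps(last["region"], call):
            region = last["region"]
            # Extend the merged region to cover the new call.
            last["region"] = region._replace(end=max(region.end, call.end))
            last["callers"].add(call.caller)
        else:
            merged.append({"region": call._replace(caller="merged"),
                           "callers": {call.caller}})
    return merged


def prioritize(merged):
    """Assumed ranking: regions supported by more callers come first."""
    return sorted(merged, key=lambda m: len(m["callers"]), reverse=True)


def sensitivity_precision(tp, fp, fn):
    """Sensitivity = TP / (TP + FN); precision = TP / (TP + FP)."""
    return tp / (tp + fn), tp / (tp + fp)
```

Under these definitions, detecting 14 of the 15 aCGH-validated CNVs corresponds to a sensitivity of 14/15 on that validated set. Precision cannot be derived from the abstract, since it gives no false-positive count.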