A Knowledge Engineering Approach to Recognizing and Extracting Sequences of Nucleic Acids from Scientific Literature

García Remesal, Miguel; Maojo Garcia, Victor Manuel and Crespo del Arco, Jose (2010). A Knowledge Engineering Approach to Recognizing and Extracting Sequences of Nucleic Acids from Scientific Literature. In: "32nd Annual International Conference of the IEEE EMBS", 31/08/2010 - 04/09/2010, Buenos Aires, Argentina. ISBN 978-1-4244-4123-5.

Description

Title: A Knowledge Engineering Approach to Recognizing and Extracting Sequences of Nucleic Acids from Scientific Literature
Author(s):
  • García Remesal, Miguel
  • Maojo Garcia, Victor Manuel
  • Crespo del Arco, Jose
Document Type: Conference or Workshop Presentation (Article)
Event Title: 32nd Annual International Conference of the IEEE EMBS
Event Dates: 31/08/2010 - 04/09/2010
Event Location: Buenos Aires, Argentina
Book Title: Proceedings of the 32nd Annual International Conference of the IEEE EMBS
Date: 2010
ISBN: 978-1-4244-4123-5
Subjects:
School: Facultad de Informática (UPM) [former name]
Department: Artificial Intelligence
Creative Commons Licenses: Attribution - No Derivative Works - Non Commercial

Abstract

In this paper we present a knowledge engineering approach to automatically recognizing and extracting genetic sequences from scientific articles. A preliminary recognizer based on a finite state machine first extracts all candidate DNA/RNA sequences. These candidates are then fed into a knowledge-based system that automatically discards false positives and refines noisy and incorrectly merged sequences. We created the knowledge base by manually analyzing different manuscripts containing genetic sequences. We evaluated our approach on a test set of 211 full-text articles in PDF format containing 3134 genetic sequences, achieving 87.76% precision and 97.70% recall. This method can facilitate research tasks such as text mining, information extraction, and information retrieval over large collections of documents containing genetic sequences.
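
The two-stage idea in the abstract (a finite state machine that proposes candidates, followed by rules that discard false positives) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not describe the states of their recognizer or the contents of their knowledge base. The function names `find_candidates`, `is_plausible`, and `extract_sequences`, the minimum length of 8, and the three filtering rules are all assumptions chosen only for illustration. The refinement of noisy or merged sequences mentioned in the abstract is not covered here.

```python
from typing import List, NamedTuple

# Letters accepted as nucleotides (assumption: plain A/C/G/T/U only,
# no IUPAC ambiguity codes).
NUCLEOTIDES = frozenset("ACGTU")


class Candidate(NamedTuple):
    start: int  # index of the first character of the run
    end: int    # index one past the last character of the run
    seq: str    # the run itself, upper-cased


def find_candidates(text: str) -> List[Candidate]:
    """Stage 1: a two-state machine (outside / inside a run) that
    returns every maximal run of nucleotide letters."""
    candidates = []
    inside = False
    start = 0
    # The trailing space acts as a sentinel that closes a run at end of text.
    for i, ch in enumerate(text + " "):
        is_base = ch.upper() in NUCLEOTIDES
        if not inside and is_base:
            inside, start = True, i
        elif inside and not is_base:
            candidates.append(Candidate(start, i, text[start:i].upper()))
            inside = False
    return candidates


def is_plausible(c: Candidate, text: str, min_len: int = 8) -> bool:
    """Stage 2: hypothetical hand-written rules that reject false positives."""
    # Rule 1 (assumed threshold): very short runs are usually ordinary text.
    if len(c.seq) < min_len:
        return False
    # Rule 2: a run glued to other letters is part of an ordinary word.
    before = text[c.start - 1] if c.start > 0 else " "
    after = text[c.end] if c.end < len(text) else " "
    if before.isalpha() or after.isalpha():
        return False
    # Rule 3: a run mixing T (DNA) and U (RNA) is treated as noise.
    if "T" in c.seq and "U" in c.seq:
        return False
    return True


def extract_sequences(text: str) -> List[str]:
    """Run both stages and return the surviving sequences."""
    return [c.seq for c in find_candidates(text) if is_plausible(c, text)]


if __name__ == "__main__":
    sample = "The forward primer 5'-ACGTTGCAAGCT-3' amplified a cataract-related tag."
    print(extract_sequences(sample))
```

In this sketch the first stage deliberately over-generates (single letters such as the "T" in "The" become candidates), and the second stage is responsible for precision, which mirrors the division of labor described in the abstract.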

More information

Record ID: 9123
DC Identifier: http://oa.upm.es/9123/
OAI Identifier: oai:oa.upm.es:9123
Official URL: http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5627316
Deposited by: Memoria Investigacion
Deposited on: 15 Nov 2011 11:40
Last Modified: 20 Apr 2016 17:40