A Knowledge Engineering Approach to Recognizing and Extracting Sequences of Nucleic Acids from Scientific Literature

García Remesal, Miguel ORCID: https://orcid.org/0000-0002-5948-8691, Maojo Garcia, Victor Manuel ORCID: https://orcid.org/0000-0001-5103-4292 and Crespo del Arco, Jose ORCID: https://orcid.org/0000-0002-0772-5421 (2010). A Knowledge Engineering Approach to Recognizing and Extracting Sequences of Nucleic Acids from Scientific Literature. In: "32nd Annual International Conference of the IEEE EMBS", 31/08/2010 - 04/09/2010, Buenos Aires, Argentina. ISBN 978-1-4244-4123-5.

Description

Title: A Knowledge Engineering Approach to Recognizing and Extracting Sequences of Nucleic Acids from Scientific Literature
Author/s: García Remesal, Miguel; Maojo Garcia, Victor Manuel; Crespo del Arco, Jose
Item Type: Presentation at Congress or Conference (Article)
Event Title: 32nd Annual International Conference of the IEEE EMBS
Event Dates: 31/08/2010 - 04/09/2010
Event Location: Buenos Aires, Argentina
Title of Book: Proceedings of the 32nd Annual International Conference of the IEEE EMBS
Date: 2010
ISBN: 978-1-4244-4123-5
Subjects:
Faculty: Facultad de Informática (UPM)
Department: Inteligencia Artificial
Creative Commons Licenses: Attribution - NonCommercial - NoDerivatives

Full text

PDF (81kB)

Abstract

In this paper we present a knowledge engineering approach to automatically recognizing and extracting genetic sequences from scientific articles. A preliminary recognizer based on a finite state machine first extracts all candidate DNA/RNA sequences. These candidates are then fed into a knowledge-based system that automatically discards false positives and refines noisy and incorrectly merged sequences. We built the knowledge base by manually analyzing a range of manuscripts containing genetic sequences. The approach was evaluated on a test set of 211 full-text articles in PDF format containing 3134 genetic sequences, on which it achieved 87.76% precision and 97.70% recall. The method can support research tasks such as text mining, information extraction, and information retrieval over large collections of documents containing genetic sequences.
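The two-stage pipeline outlined in the abstract (a finite state machine that proposes candidate sequences, followed by a knowledge base that discards false positives) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the alphabet `NUCLEOTIDES`, the separator set, the `MIN_LENGTH` threshold, and the three entries in `RULES` are hypothetical assumptions chosen for illustration, and the refinement of noisy or merged sequences is not modelled. Precision and recall are computed in the usual way, as TP/(TP+FP) and TP/(TP+FN), over sets of sequences.

```python
# Minimal two-stage sketch. Hypothetical throughout: the alphabet, separators,
# threshold and rules are illustrative assumptions, not the paper's knowledge base.

NUCLEOTIDES = set("ACGTUN")  # assumed alphabet: uppercase bases plus N (unknown base)
SEPARATORS = set(" \t\n-")   # assumed: characters tolerated inside a sequence
MIN_LENGTH = 12              # hypothetical minimum length for a real sequence


def fsm_candidates(text: str) -> list[str]:
    """Stage 1: a two-state finite state machine (OUTSIDE / INSIDE a run)."""
    candidates: list[str] = []
    current: list[str] = []
    inside = False
    for ch in text + "\0":  # trailing sentinel flushes the last run
        if ch in NUCLEOTIDES:
            current.append(ch)
            inside = True
        elif inside and ch in SEPARATORS:
            continue  # stay INSIDE across spaces, line breaks and hyphens
        elif inside:
            candidates.append("".join(current))  # leave the run: emit candidate
            current = []
            inside = False
    return candidates


# Stage 2: hypothetical knowledge base of (rule name, predicate flagging a false positive).
RULES = [
    ("too_short", lambda s: len(s) < MIN_LENGTH),
    ("mixes_T_and_U", lambda s: "T" in s and "U" in s),
    ("mostly_unknown", lambda s: s.count("N") > len(s) // 2),
]


def refine(candidates: list[str]) -> list[str]:
    """Keep only the candidates that no rule flags as a false positive."""
    return [c for c in candidates if not any(flag(c) for _, flag in RULES)]


def precision_recall(extracted: list[str], gold: list[str]) -> tuple[float, float]:
    """Set-based precision and recall (duplicate sequences are collapsed)."""
    tp = len(set(extracted) & set(gold))
    precision = tp / len(set(extracted)) if extracted else 0.0
    recall = tp / len(set(gold)) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    text = "The primer ACGTACGT ACGTAC was used. A second ACGUACGUACGUTT probe failed."
    cands = fsm_candidates(text)
    print(cands)          # → ['T', 'ACGTACGTACGTAC', 'A', 'ACGUACGUACGUTT']
    kept = refine(cands)
    print(kept)           # → ['ACGTACGTACGTAC']
    print(precision_recall(kept, ["ACGTACGTACGTAC", "GGGCCCAAATTT"]))  # → (1.0, 0.5)
```

In this example, tolerating spaces lets the machine join `ACGTACGT ACGTAC` into a single candidate. Whether that merge is correct depends on the article, which is the kind of ambiguity the abstract says the knowledge-based stage is meant to resolve; this sketch only discards candidates and does not attempt such repairs.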

More information

Item ID: 9123
DC Identifier: https://oa.upm.es/9123/
OAI Identifier: oai:oa.upm.es:9123
Official URL: http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumb...
Deposited by: Memoria Investigacion
Deposited on: 15 Nov 2011 11:40
Last Modified: 20 Apr 2016 17:40