Hybrid Approach Combining Machine Learning and a Rule-Based Expert System for Text Categorization

Villena Román, Julio; Collada Pérez, Sonia; Lana Serrano, Sara and González Cristóbal, José Carlos (2011). Hybrid Approach Combining Machine Learning and a Rule-Based Expert System for Text Categorization. In: "Twenty-Fourth International Florida Artificial Intelligence Research Society Conference", 18/05/2011 - 20/05/2011, Palm Beach, Florida, USA. pp. 323-328.

Description

Title: Hybrid Approach Combining Machine Learning and a Rule-Based Expert System for Text Categorization
Author(s):
  • Villena Román, Julio
  • Collada Pérez, Sonia
  • Lana Serrano, Sara
  • González Cristóbal, José Carlos
Document Type: Conference or Workshop Paper (Article)
Event Title: Twenty-Fourth International Florida Artificial Intelligence Research Society Conference
Event Dates: 18/05/2011 - 20/05/2011
Event Location: Palm Beach, Florida, USA
Book Title: Proceedings of the Twenty-Fourth International Florida Artificial Intelligence Research Society Conference
Date: 2011
Subjects:
School: E.U.I.T. Telecomunicación (UPM) [former name]
Department: Ingeniería y Arquitecturas Telemáticas [until 2014]
Creative Commons Licenses: Attribution - No Derivative Works - Non-Commercial

Abstract

This paper discusses a novel hybrid approach to text categorization that combines a machine learning algorithm, which provides a base model trained on a labeled corpus, with a rule-based expert system, which improves the results of that classifier by filtering false positives and dealing with false negatives. The main advantage is that the system can be easily fine-tuned by adding specific rules for noisy or conflicting categories that have not been trained successfully. We also describe an implementation based on k-Nearest Neighbor and a simple rule language that expresses lists of positive, negative and relevant (multiword) terms appearing in the input text. The system is evaluated in several scenarios, including the popular Reuters-21578 news corpus for comparison with other approaches, and categorization using IPTC metadata, the EUROVOC thesaurus and others. Results show that this approach achieves a precision comparable to top-ranked methods, with the added value that it does not require a demanding human expert workload to train.
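
The two-stage idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define the rule semantics, so the interpretation used here is an assumption (a negative term rejects a category, a non-empty positive list requires at least one match, and a relevant term adds a category the base model missed). The names `knn_categories`, `apply_rules`, `Rule`, `CORPUS` and `RULES`, the toy data, and the bag-of-words cosine similarity are all hypothetical choices made for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field
import math


def tokenize(text):
    # Naive whitespace tokenizer; punctuation is not handled in this sketch.
    return text.lower().split()


def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def knn_categories(text, corpus, k=3, threshold=0.3):
    # Base model: sum the similarities of the k nearest labeled documents
    # per category and keep categories whose score reaches the threshold.
    query = Counter(tokenize(text))
    neighbours = sorted(
        ((cosine(query, Counter(tokenize(doc))), label) for doc, label in corpus),
        reverse=True,
    )[:k]
    scores = Counter()
    for sim, label in neighbours:
        scores[label] += sim
    return {label for label, score in scores.items() if score >= threshold}


@dataclass
class Rule:
    # Term lists per category; the semantics below are assumed, not from the paper.
    positive: list = field(default_factory=list)
    negative: list = field(default_factory=list)
    relevant: list = field(default_factory=list)


def contains_any(text, terms):
    # Whole-token match that also works for multiword terms.
    padded = " " + " ".join(tokenize(text)) + " "
    return any(" " + term.lower() + " " in padded for term in terms)


def apply_rules(text, predicted, rules):
    # Post-process the base prediction: drop likely false positives,
    # add likely false negatives.
    result = set(predicted)
    for category, rule in rules.items():
        if contains_any(text, rule.negative):
            result.discard(category)
        elif rule.positive and not contains_any(text, rule.positive):
            result.discard(category)
        elif contains_any(text, rule.relevant):
            result.add(category)
    return result


# Toy labeled corpus and rules, invented for this example.
CORPUS = [
    ("wheat harvest grain exports rise", "grain"),
    ("corn and wheat prices fall", "grain"),
    ("bank raises interest rates", "money"),
    ("central bank cuts interest rates", "money"),
]
RULES = {
    "grain": Rule(negative=["stock market"]),
    "money": Rule(relevant=["interest rates"]),
}

text = "wheat futures rally on the stock market"
base = knn_categories(text, CORPUS)
final = apply_rules(text, base, RULES)
```

In this toy run the base model assigns "grain" to the text because of the shared word "wheat", and the hypothetical negative term "stock market" then removes it. Keeping the rules separate from the trained model reflects the advantage the abstract names: a problematic category can be adjusted by editing its term lists without retraining.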

More information

Record ID: 13310
DC Identifier: http://oa.upm.es/13310/
OAI Identifier: oai:oa.upm.es:13310
Official URL: http://aaai.org/ocs/index.php/FLAIRS/FLAIRS11
Deposited by: Memoria Investigacion
Deposited on: 28 Nov 2012 10:01
Last Modified: 21 Apr 2016 12:37