Hybrid Approach Combining Machine Learning and a Rule-Based Expert System for Text Categorization

Villena Román, Julio and Collada Pérez, Sonia and Lana Serrano, Sara and González Cristóbal, José Carlos (2011). Hybrid Approach Combining Machine Learning and a Rule-Based Expert System for Text Categorization. In: "Twenty-Fourth International Florida Artificial Intelligence Research Society Conference", 18/05/2011 - 20/05/2011, Palm Beach, Florida, USA. pp. 323-328.

Description

Title: Hybrid Approach Combining Machine Learning and a Rule-Based Expert System for Text Categorization
Author/s:
  • Villena Román, Julio
  • Collada Pérez, Sonia
  • Lana Serrano, Sara
  • González Cristóbal, José Carlos
Item Type: Presentation at Congress or Conference (Article)
Event Title: Twenty-Fourth International Florida Artificial Intelligence Research Society Conference
Event Dates: 18/05/2011 - 20/05/2011
Event Location: Palm Beach, Florida, USA
Title of Book: Proceedings of the Twenty-Fourth International Florida Artificial Intelligence Research Society Conference
Date: 2011
Subjects:
Faculty: E.U.I.T. Telecomunicación (UPM)
Department: Ingeniería y Arquitecturas Telemáticas [until 2014]
Creative Commons Licenses: Attribution - No Derivative Works - Non-Commercial

Abstract

This paper discusses a novel hybrid approach for text categorization that combines a machine learning algorithm, which provides a base model trained with a labeled corpus, with a rule-based expert system, which is used to improve the results provided by the previous classifier, by filtering false positives and dealing with false negatives. The main advantage is that the system can be easily fine-tuned by adding specific rules for those noisy or conflicting categories that have not been successfully trained. We also describe an implementation based on k-Nearest Neighbor and a simple rule language to express lists of positive, negative and relevant (multiword) terms appearing in the input text. The system is evaluated in several scenarios, including the popular Reuters-21578 news corpus for comparison to other approaches, and categorization using IPTC metadata, EUROVOC thesaurus and others. Results show that this approach achieves a precision that is comparable to top ranked methods, with the added value that it does not require a demanding human expert workload to train.
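
The two-stage idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy corpus, the cosine-weighted voting, the threshold, the rule format, and the exact meaning given to each term list are all assumptions made for illustration. Here a negative term removes a category (filtering a false positive), a positive term adds it (recovering a false negative), and, when relevant terms are listed, a category proposed by k-Nearest Neighbor is kept only if at least one of them appears.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenization (a deliberate simplification)."""
    return text.lower().split()


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def knn_categories(text, corpus, k=3, threshold=0.5):
    """Base model: similarity-weighted vote among the k nearest documents."""
    query = Counter(tokenize(text))
    scored = sorted(
        ((cosine(query, Counter(tokenize(doc))), labels) for doc, labels in corpus),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    total = sum(sim for sim, _ in scored)
    if total == 0:
        return set()
    votes = Counter()
    for sim, labels in scored:
        for label in labels:
            votes[label] += sim
    return {label for label, v in votes.items() if v / total >= threshold}


def apply_rules(text, categories, rules):
    """Expert-system stage: adjust the kNN output with per-category term lists."""
    padded = " " + " ".join(tokenize(text)) + " "

    def has(term):
        # Padding with spaces lets multiword terms match on word boundaries.
        return f" {term.lower()} " in padded

    result = set(categories)
    for category, rule in rules.items():
        if any(has(t) for t in rule.get("negative", [])):
            result.discard(category)  # assumed: negative term vetoes the category
        elif any(has(t) for t in rule.get("positive", [])):
            result.add(category)  # assumed: positive term forces the category
        elif category in result and rule.get("relevant"):
            if not any(has(t) for t in rule["relevant"]):
                result.discard(category)  # assumed: needs one relevant term
    return result


def classify(text, corpus, rules, k=3):
    return apply_rules(text, knn_categories(text, corpus, k=k), rules)


# Hypothetical toy data, loosely in the spirit of Reuters-style topics.
CORPUS = [
    ("wheat harvest grain exports rise", {"grain"}),
    ("grain shipments wheat corn", {"grain"}),
    ("crude oil prices barrel opec", {"crude"}),
    ("opec oil output crude", {"crude"}),
]
RULES = {
    "crude": {"positive": ["crude oil"], "negative": ["olive oil"]},
    "grain": {"relevant": ["wheat", "corn", "grain"]},
}
```

With this toy data, `classify("olive oil prices rise", CORPUS, RULES)` drops the `crude` label that the base model proposes, which shows how a category could be tuned by adding a rule instead of retraining the model.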

More information

Item ID: 13310
DC Identifier: https://oa.upm.es/13310/
OAI Identifier: oai:oa.upm.es:13310
Official URL: http://aaai.org/ocs/index.php/FLAIRS/FLAIRS11
Deposited by: Memoria Investigacion
Deposited on: 28 Nov 2012 10:01
Last Modified: 21 Apr 2016 12:37