Villena Román, Julio and Collada Pérez, Sonia and Lana Serrano, Sara and González Cristóbal, José Carlos (2011). Hybrid Approach Combining Machine Learning and a Rule-Based Expert System for Text Categorization. In: "Twenty-Fourth International Florida Artificial Intelligence Research Society Conference", 18/05/2011 - 20/05/2011, Palm Beach, Florida, USA. pp. 323-328.
Title: | Hybrid Approach Combining Machine Learning and a Rule-Based Expert System for Text Categorization |
---|---|
Author/s: | Villena Román, Julio; Collada Pérez, Sonia; Lana Serrano, Sara; González Cristóbal, José Carlos |
Item Type: | Presentation at Congress or Conference (Article) |
Event Title: | Twenty-Fourth International Florida Artificial Intelligence Research Society Conference |
Event Dates: | 18/05/2011 - 20/05/2011 |
Event Location: | Palm Beach, Florida, USA |
Title of Book: | Proceedings of the Twenty-Fourth International Florida Artificial Intelligence Research Society Conference |
Date: | 2011 |
Subjects: | |
Faculty: | E.U.I.T. Telecomunicación (UPM) |
Department: | Ingeniería y Arquitecturas Telemáticas [until 2014] |
Creative Commons Licenses: | Attribution - No derivative works - Non commercial |
This paper discusses a novel hybrid approach to text categorization that combines a machine learning algorithm, which provides a base model trained on a labeled corpus, with a rule-based expert system, which improves the results of that classifier by filtering false positives and dealing with false negatives. The main advantage is that the system can be easily fine-tuned by adding specific rules for noisy or conflicting categories that have not been trained successfully. We also describe an implementation based on k-Nearest Neighbor and a simple rule language that expresses lists of positive, negative and relevant (multiword) terms appearing in the input text. The system is evaluated in several scenarios, including the popular Reuters-21578 news corpus, for comparison with other approaches, and categorization using IPTC metadata, the EUROVOC thesaurus and others. Results show that this approach achieves a precision comparable to that of top-ranked methods, with the added value that it does not require a demanding human expert workload to train.
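The two-stage idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define the rule semantics, so the interpretation here is an assumption (a negative term rejects a category, a relevant term adds it even if the base model missed it, and a non-empty positive list requires at least one match to keep a category). The names `Rule`, `knn_categories`, `apply_rules`, `categorize`, `CORPUS` and `RULES` are hypothetical, as are the bag-of-words cosine similarity and the toy data.

```python
import math
from collections import Counter
from dataclasses import dataclass


def tokenize(text):
    # Naive whitespace tokenizer; real systems would normalize punctuation.
    return text.lower().split()


def cosine(a, b):
    # Cosine similarity between two term-frequency Counters.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def knn_categories(text, corpus, k=3, min_votes=2):
    """Base model: labels voted by the k most similar labeled documents."""
    query = Counter(tokenize(text))
    scored = [(cosine(query, Counter(tokenize(doc))), labels) for doc, labels in corpus]
    # Documents with zero similarity are ignored rather than used as neighbors.
    neighbors = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)[:k]
    votes = Counter(label for _, labels in neighbors for label in labels)
    return {label for label, n in votes.items() if n >= min_votes}


@dataclass
class Rule:
    # Term lists per category; the semantics below are an assumption.
    positive: tuple = ()
    negative: tuple = ()
    relevant: tuple = ()


def contains_any(text, terms):
    # Whole-word match on the token stream, so multiword terms also work.
    padded = " " + " ".join(tokenize(text)) + " "
    return any(" " + term.lower() + " " in padded for term in terms)


def apply_rules(text, predicted, rules):
    """Post-filter: drop false positives and recover false negatives."""
    result = set(predicted)
    for category, rule in rules.items():
        if contains_any(text, rule.negative):
            result.discard(category)   # assumed: a negative term vetoes the category
        elif contains_any(text, rule.relevant):
            result.add(category)       # assumed: a relevant term forces the category
        elif category in result and rule.positive and not contains_any(text, rule.positive):
            result.discard(category)   # assumed: at least one positive term is required
    return result


def categorize(text, corpus, rules):
    return apply_rules(text, knn_categories(text, corpus), rules)


# Toy labeled corpus and rule set, invented for illustration only.
CORPUS = [
    ("wheat harvest exports rise", {"grain"}),
    ("corn and wheat prices fall", {"grain"}),
    ("central bank raises interest rate", {"money"}),
    ("currency reserves fall at central bank", {"money"}),
]
RULES = {
    "grain": Rule(positive=("wheat", "corn"), negative=("stock market",)),
    "money": Rule(relevant=("interest rate",)),
}

if __name__ == "__main__":
    for sample in ("wheat futures slide as stock market tumbles",
                   "mortgage interest rate climbs"):
        print(sample, "->", categorize(sample, CORPUS, RULES))
```

In this toy setup the first sample is a base-model hit for `grain` that the negative term `stock market` filters out, and the second is a base-model miss that the relevant term `interest rate` recovers as `money`. Keeping the rules in a separate stage means a poorly trained category can be adjusted by editing its term lists without retraining the base model, which is the fine-tuning advantage the abstract describes.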
Item ID: | 13310 |
---|---|
DC Identifier: | https://oa.upm.es/13310/ |
OAI Identifier: | oai:oa.upm.es:13310 |
Official URL: | http://aaai.org/ocs/index.php/FLAIRS/FLAIRS11 |
Deposited by: | Memoria Investigacion |
Deposited on: | 28 Nov 2012 10:01 |
Last Modified: | 21 Apr 2016 12:37 |