Calculating classifier calibration performance with a custom modification of Weka

Zlotnik Enaliev, Alexander and Gallardo Antolín, Ascensión and Montero Martínez, Juan Manuel (2014). Calculating classifier calibration performance with a custom modification of Weka. In: "Proceedings of the 4th International Conference on Integrated Information", 05/09/2014 - 08/09/2014, Madrid, Spain. pp. 128-133.

Description

Title: Calculating classifier calibration performance with a custom modification of Weka
Author/s:
  • Zlotnik Enaliev, Alexander
  • Gallardo Antolín, Ascensión
  • Montero Martínez, Juan Manuel
Item Type: Presentation at Congress or Conference (Article)
Event Title: Proceedings of the 4th International Conference on Integrated Information
Event Dates: 05/09/2014 - 08/09/2014
Event Location: Madrid, Spain
Title of Book: International Conference on Integrated Information (IC-ININFO 2014)
Date: 2014
Volume: 1644
Subjects:
Faculty: E.T.S.I. Telecomunicación (UPM)
Department: Electrónica Física
Creative Commons Licenses: Attribution - NonCommercial - NoDerivatives

Abstract

Calibration is often overlooked in machine-learning problem-solving approaches, even in situations where an accurate estimation of predicted probabilities, and not only a discrimination between classes, is critical for decision-making. One of the reasons is the lack of readily available open-source software packages which can easily calculate calibration metrics. In order to provide one such tool, we have developed a custom modification of the Weka data mining software, which implements the calculation of Hosmer-Lemeshow groups of risk and the Pearson chi-square statistic comparison between estimated and observed frequencies for binary problems. We provide calibration performance estimations with logistic regression (LR), BayesNet, Naïve Bayes, artificial neural network (ANN), support vector machine (SVM), k-nearest neighbors (KNN), decision tree and Repeated Incremental Pruning to Produce Error Reduction (RIPPER) models on six different datasets. Our experiments show that SVMs with RBF kernels exhibit the best results in terms of calibration, while decision trees, RIPPER and KNN are highly unlikely to produce well-calibrated models.
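
To illustrate the kind of calculation the abstract describes, the following is a minimal Python sketch of Hosmer-Lemeshow groups of risk and the Pearson chi-square comparison between estimated and observed frequencies for a binary problem. It is not the authors' Weka modification: the function name `hosmer_lemeshow`, the default of ten groups, the near-equal-size grouping rule and the handling of cells with zero expected count are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def hosmer_lemeshow(
    probs: Sequence[float], labels: Sequence[int], n_groups: int = 10
) -> Tuple[float, List[Tuple[int, float, int]]]:
    """Return (statistic, groups) for binary labels in {0, 1}.

    Each entry of `groups` is (group size, expected positives, observed positives).
    Hypothetical helper written for illustration only.
    """
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    if not 1 <= n_groups <= len(probs):
        raise ValueError("n_groups must be between 1 and the number of instances")

    # Rank instances by predicted probability of the positive class.
    order = sorted(range(len(probs)), key=lambda i: probs[i])
    n = len(order)

    statistic = 0.0
    groups: List[Tuple[int, float, int]] = []
    for g in range(n_groups):
        # Assumption: groups of risk are near-equal-sized slices of the ranking.
        members = order[g * n // n_groups:(g + 1) * n // n_groups]
        size = len(members)
        expected_pos = sum(probs[i] for i in members)   # estimated frequency
        observed_pos = sum(labels[i] for i in members)  # observed frequency
        expected_neg = size - expected_pos
        observed_neg = size - observed_pos
        groups.append((size, expected_pos, observed_pos))

        # Pearson chi-square terms for both classes.
        # Assumption: a cell with zero expected count is skipped.
        if expected_pos > 0:
            statistic += (observed_pos - expected_pos) ** 2 / expected_pos
        if expected_neg > 0:
            statistic += (observed_neg - expected_neg) ** 2 / expected_neg
    return statistic, groups
```

A larger statistic indicates a larger gap between estimated and observed frequencies, that is, poorer calibration. The statistic is conventionally compared against a chi-square distribution with the number of groups minus two degrees of freedom; that step is omitted here to keep the sketch free of external dependencies.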

Funding Projects

Type: Government of Spain
Code: DPI2010-21247-C02-02
Acronym: Unspecified
Leader: Unspecified
Title: Unspecified

More information

Item ID: 37518
DC Identifier: http://oa.upm.es/37518/
OAI Identifier: oai:oa.upm.es:37518
Deposited by: Memoria Investigacion
Deposited on: 22 May 2017 16:55
Last Modified: 22 May 2017 16:55