Fostering interpretability of data mining models through data perturbation

Belkoura, Seddik, Zanin, Massimiliano and Latorre De La Fuente, Antonio ORCID: https://orcid.org/0000-0002-8718-5735 (2019). Fostering interpretability of data mining models through data perturbation. "Expert Systems with Applications", v. 137 ; pp. 191-201. ISSN 0957-4174. https://doi.org/10.1016/j.eswa.2019.07.001.

Description

Title: Fostering interpretability of data mining models through data perturbation
Author/s:
Item Type: Article
Journal/Publication Title: Expert Systems with Applications
Date: 2019
ISSN: 0957-4174
Volume: 137
Subjects:
Freetext Keywords: Interpretability, Data mining, Random forest, Artificial neural networks
Faculty: E.T.S. de Ingenieros Informáticos (UPM)
Department: Arquitectura y Tecnología de Sistemas Informáticos
Creative Commons Licenses: Attribution - No derivative works - Non commercial

Abstract

With the widespread adoption of data mining models to solve real-world problems, the scientific community faces the need to increase their interpretability and comprehensibility. This is especially relevant in the case of black box models, in which inputs and outputs are usually connected by highly complex and nonlinear functions; in applications requiring an interaction between the user and the model; and when the machine’s solution disagrees with human experience. In this contribution we present a new methodology that simplifies the process of understanding the rules behind a classification model, even a black box one. It is based on perturbing the features describing one instance and finding the minimal variation required to change the forecasted class. It thus yields simplified rules describing the circumstances under which the solution would have been different, and allows these to be compared with human expectations. We show that this methodology is well defined, model-agnostic, easy to implement and modular, and demonstrate its usefulness with several synthetic and real-world data sets.
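The core idea of the abstract (perturb the features of one instance and look for the smallest change that alters the predicted class) can be illustrated with a minimal sketch. This is not the authors' algorithm: the function name `minimal_single_feature_flips`, the toy classifier `toy_predict`, and the choice of a one-feature-at-a-time grid search with parameters `max_delta` and `steps` are all assumptions made for illustration only.

```python
from typing import Callable, Dict, List, Optional, Sequence

# Any callable mapping a feature vector to a class label works here,
# which is what makes the approach model-agnostic.
Classifier = Callable[[Sequence[float]], int]


def minimal_single_feature_flips(
    predict: Classifier,
    instance: Sequence[float],
    max_delta: float = 5.0,   # hypothetical search range per feature
    steps: int = 100,         # hypothetical grid resolution
) -> Dict[int, Optional[float]]:
    """For each feature, return the smallest signed change found on a
    grid that alters the predicted class, or None if no change within
    max_delta does so. The model is only queried, never inspected."""
    base_class = predict(instance)
    flips: Dict[int, Optional[float]] = {}
    for i in range(len(instance)):
        flips[i] = None
        for k in range(1, steps + 1):
            delta = max_delta * k / steps
            # Try the positive and the negative perturbation of this size.
            for signed in (delta, -delta):
                candidate: List[float] = list(instance)
                candidate[i] += signed
                if predict(candidate) != base_class:
                    flips[i] = signed
                    break
            if flips[i] is not None:
                break
    return flips


def toy_predict(x: Sequence[float]) -> int:
    """Illustrative stand-in for a black box: class 1 when
    x[0] + 2 * x[1] > 4, else class 0. Feature 2 is ignored."""
    return 1 if x[0] + 2.0 * x[1] > 4.0 else 0


instance = [1.0, 1.0, 7.0]
flips = minimal_single_feature_flips(toy_predict, instance)
for feature, change in flips.items():
    if change is None:
        print(f"feature {feature}: no change within range alters the class")
    else:
        print(f"feature {feature}: class changes if shifted by {change:+.2f}")
```

In this toy setting the output reads as a set of simple rules: the class would have been different had feature 1 been slightly larger or feature 0 somewhat larger, while feature 2 has no influence within the searched range. Such statements are the kind of thing a user could compare against their own expectation.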

Funding Projects

Type: Government of Spain
Code: TIN2017-83132-C2-2-R
Acronym: Unspecified
Leader: Universidad Politécnica de Madrid
Title: Visualización analítica aplicada

Type: Universidad Politécnica de Madrid
Code: PINV-18-XEOGHQ-19-4QTEBP
Acronym: Unspecified
Leader: Unspecified
Title: Unspecified

More information

Item ID: 67064
DC Identifier: https://oa.upm.es/67064/
OAI Identifier: oai:oa.upm.es:67064
DOI: 10.1016/j.eswa.2019.07.001
Official URL: https://www.sciencedirect.com/journal/expert-syste...
Deposited by: Memoria Investigacion
Deposited on: 17 May 2021 09:36
Last Modified: 17 May 2021 09:36