A method for outlier detection based on cluster analysis and visual expert criteria

Lara, Juan A. and Lizcano Casas, David and Rampérez Martín, Víctor and Soriano Camino, Francisco Javier (2019). A method for outlier detection based on cluster analysis and visual expert criteria. "Expert Systems" ; pp. 49-104. ISSN 0266-4720. https://doi.org/10.1111/exsy.12473.

Description

Title: A method for outlier detection based on cluster analysis and visual expert criteria
Author/s:
  • Lara, Juan A.
  • Lizcano Casas, David
  • Rampérez Martín, Víctor
  • Soriano Camino, Francisco Javier
Item Type: Article
Journal/Publication Title: Expert Systems
Date: November 2019
ISSN: 0266-4720
Subjects:
Freetext Keywords: Clustering; Data mining; KDD; Outlier detection; Visual expert criteria
Faculty: E.T.S. de Ingenieros Informáticos (UPM)
Department: Lenguajes y Sistemas Informáticos e Ingeniería del Software
Creative Commons Licenses: Attribution - No derivative works - Non commercial

Abstract

Outlier detection is an important problem that arises in a wide range of areas. Outliers may be the outcome of fraudulent behaviour, mechanical faults, human error, or simply natural deviations. Many data mining applications perform outlier detection, often as a preliminary step to filter out outliers and build more representative models. In this paper, we propose an outlier detection method based on a clustering process. The aim of the proposal is to overcome the specificity of many existing outlier detection techniques, which fail to take into account the inherent dispersion of domain objects. The method is based on four criteria designed to represent how human beings (experts in each domain) visually identify outliers within a set of objects after analysing the clusters. This is an advantage over other clustering‐based outlier detection techniques, which are founded on a purely numerical analysis of clusters. Our proposal has been evaluated, with satisfactory results, on data (particularly time series) from two different domains: stabilometry, a branch of medicine studying balance‐related functions in human beings, and electroencephalography (EEG), a neurological exploration used to diagnose nervous system disorders. To validate the proposed method, we studied its outlier detection performance and its efficiency in terms of runtime. The results of regression analyses confirm that our proposal is useful for detecting outlier data in different domains, with a false positive rate of less than 2% and a reliability greater than 99%.
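
As a rough illustration of the general idea of clustering-based outlier detection, the sketch below groups points by distance and flags members of very small clusters. It is a generic textbook-style heuristic and not the method proposed in the paper: the four visual expert criteria are not described in this record and are not reproduced here. The function names and the `radius` and `min_size` parameters are hypothetical and chosen only for this example.

```python
from math import dist


def threshold_clusters(points, radius):
    """Label points so that two points share a cluster when a chain of
    neighbours, each within `radius` of the next, connects them."""
    labels = [None] * len(points)
    next_label = 0
    for start in range(len(points)):
        if labels[start] is not None:
            continue
        # Start a new cluster and grow it by visiting unlabelled neighbours.
        labels[start] = next_label
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(len(points)):
                if labels[j] is None and dist(points[i], points[j]) <= radius:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels


def small_cluster_outliers(points, radius, min_size):
    """Return indices of points whose cluster has fewer than `min_size` members.

    Assumption for this sketch only: a point is an outlier when it ends up
    in an unusually small cluster. This is NOT the paper's set of criteria.
    """
    labels = threshold_clusters(points, radius)
    sizes = {}
    for label in labels:
        sizes[label] = sizes.get(label, 0) + 1
    return [i for i, label in enumerate(labels) if sizes[label] < min_size]


# Two tight groups of three points plus one isolated point.
points = [
    (0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
    (5.0, 5.0), (5.1, 4.9), (5.2, 5.1),
    (10.0, 0.0),
]
outliers = small_cluster_outliers(points, radius=1.0, min_size=2)
print(outliers)
```

A purely numerical rule such as this one depends heavily on the chosen `radius` and `min_size`, which is the kind of sensitivity to domain-specific dispersion that the abstract says the proposed method aims to overcome.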

More information

Item ID: 63970
DC Identifier: http://oa.upm.es/63970/
OAI Identifier: oai:oa.upm.es:63970
DOI: 10.1111/exsy.12473
Official URL: https://onlinelibrary.wiley.com/doi/epdf/10.1111/exsy.12473
Deposited by: Memoria Investigacion
Deposited on: 28 Oct 2020 07:58
Last Modified: 28 Oct 2020 07:58