Clustering of data streams with dynamic Gaussian mixture models: an IoT application in industrial processes

Díaz Rozo, Javier and Bielza Lozoya, María Concepción and Larrañaga Múgica, Pedro María (2018). Clustering of data streams with dynamic Gaussian mixture models: an IoT application in industrial processes. "IEEE Internet of Things Journal", v. 5 (n. 5); pp. 3533-3547. ISSN 2327-4662. https://doi.org/10.1109/JIOT.2018.2840129.

Description

Title: Clustering of data streams with dynamic Gaussian mixture models: an IoT application in industrial processes
Author/s:
  • Díaz Rozo, Javier
  • Bielza Lozoya, María Concepción
  • Larrañaga Múgica, Pedro María
Item Type: Article
Journal/Publication Title: IEEE Internet of Things Journal
Date: October 2018
ISSN: 2327-4662
Volume: 5
Freetext Keywords: Concept drift; Data stream; Dynamic clustering; Gaussian mixture models (GMM); Industrial Internet of Things (IIoT)
Faculty: E.T.S. de Ingenieros Informáticos (UPM)
Department: Inteligencia Artificial
Creative Commons Licenses: Attribution - NoDerivatives - NonCommercial

Abstract

In industrial Internet of Things applications with sensors sending dynamic process data at high speed, producing actionable insights at the right time is challenging. A key problem concerns processing a large amount of data while the underlying dynamic phenomena related to the machine are possibly evolving over time due to factors such as degradation. This makes any actionable model obsolete, so it needs to be updated. To cope with this problem, in this paper we propose a new unsupervised learning algorithm based on Gaussian mixture models, called Gaussian-based dynamic probabilistic clustering (GDPC), mainly based on integrating and adapting three well-known algorithms for use in dynamic scenarios: the expectation-maximization (EM) algorithm to estimate the model parameters, and the Page-Hinkley test and Chernoff bound to detect concept drifts. Unlike other unsupervised methods, the model induced by the GDPC provides the membership probabilities of each instance to each cluster. This makes it possible to determine, through a Brier score analysis, the robustness of the instance assignment and its evolution each time a concept drift is detected. Also, the algorithm works with very little data and significantly less computing power, being able to decide whether (and when) to change the model. The algorithm is tested using synthetic data and data streams from an industrial testbed, where different operational states are automatically identified, giving good results in terms of classification accuracy, sensitivity, and specificity.
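
As an illustration of one of the building blocks named in the abstract, the following is a minimal sketch of the textbook Page-Hinkley test for detecting an upward shift in the mean of a scalar stream. It is not the GDPC implementation: the abstract does not state which quantity GDPC monitors or how the test is adapted, so the class name `PageHinkley`, the helper `first_drift`, and the default values of `delta` and `threshold` are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class PageHinkley:
    """Textbook Page-Hinkley test for an increase in the stream mean.

    delta and threshold are illustrative defaults, not values from the paper.
    """
    delta: float = 0.005      # magnitude of change that is tolerated
    threshold: float = 5.0    # alarm threshold (often written lambda)
    n: int = 0                # number of observations seen so far
    mean: float = 0.0         # running mean of the observations
    cum: float = 0.0          # cumulative deviation m_t
    cum_min: float = 0.0      # running minimum M_t of the cumulative deviation

    def update(self, x: float) -> bool:
        """Consume one observation; return True if a drift is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n        # incremental mean
        self.cum += x - self.mean - self.delta       # m_t
        self.cum_min = min(self.cum_min, self.cum)   # M_t
        return (self.cum - self.cum_min) > self.threshold


def first_drift(stream: Iterable[float]) -> Optional[int]:
    """Return the 0-based index of the first alarm, or None if there is none."""
    detector = PageHinkley()
    for i, x in enumerate(stream):
        if detector.update(x):
            return i
    return None


if __name__ == "__main__":
    # Hypothetical stream: 100 readings at 0.0, then the level jumps to 5.0.
    readings = [0.0] * 100 + [5.0] * 100
    print("first alarm at index:", first_drift(readings))
```

In a streaming setting, an alarm from such a detector would be the trigger for re-estimating the mixture model; how GDPC combines this test with the Chernoff bound and EM is described in the paper itself, not in this sketch.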

Funding Projects

Type | Code | Acronym | Leader | Title
Government of Spain | TIN2016-79684-P | Unspecified | Universidad Politécnica de Madrid | Avances en clasificación multidimensional y detección de anomalías con redes bayesianas
Madrid Regional Government | S2013/ICE-2845 | CASI – CAM | Unspecified | Conceptos y aplicaciones de los sistemas inteligentes

More information

Item ID: 54569
DC Identifier: http://oa.upm.es/54569/
OAI Identifier: oai:oa.upm.es:54569
DOI: 10.1109/JIOT.2018.2840129
Official URL: https://ieeexplore.ieee.org/document/8364530
Deposited by: Memoria Investigacion
Deposited on: 29 Apr 2019 10:09
Last Modified: 29 Apr 2019 10:09