Distributed data stream processing for clustering near-the-edge

Landaeta Lauria, Javier Enrique (2019). Distributed data stream processing for clustering near-the-edge. Thesis (Master thesis), E.T.S.I. y Sistemas de Telecomunicación (UPM).

Description

Title: Distributed data stream processing for clustering near-the-edge
Author/s:
  • Landaeta Lauria, Javier Enrique
Contributor/s:
  • Mozo Velasco, Alberto
Item Type: Thesis (Master thesis)
Master's degree: Internet of Things (MIoT)
Date: 2019
Freetext Keywords: Data storage; Data communication
Faculty: E.T.S.I. y Sistemas de Telecomunicación (UPM)
Department: Sistemas Informáticos
Creative Commons License: Attribution - NonCommercial - NoDerivatives

Abstract

In this Master's Final Project, CluStream, a machine learning algorithm for clustering, was applied in a data streaming context, and Apache Spark, a distributed platform for large-scale data processing, was used to process the data stream. The purpose of the project was to apply the CluStream algorithm to group the incoming data into clusters and to use Spark to distribute the processing close to where the data is generated. The project was divided into two phases.

The first phase aimed to take simulated data, turn it into a data stream, publish that stream on a Kafka streaming bus, and have Spark Streaming subscribe to it and apply a clustering algorithm using Spark MLlib. The simulated data was stored in a database and queried at frequent intervals to imitate data sent by real in-field sensors. Because clustering is unsupervised, a synthetic dataset whose group membership is known in advance was used, so that the resulting clusters could be checked against the true groups.

The second phase aimed to present the clustered data graphically. To do so, the clustered data was published back to the Kafka bus under a different topic, to which an additional database subscribed in order to store it. A NodeJS application was then created to listen for any data change in that database and plot the data. The goal of this second phase was to offer a user-friendly, live view of the data as it is consumed and processed by the clustering algorithm.
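
The abstract names CluStream without describing its mechanics. The following is a minimal, single-machine sketch in plain Python (not Spark, and not the thesis's own code) of CluStream's online micro-clustering phase. Each micro-cluster keeps only additive sums (a cluster-feature vector). A new point is absorbed by its nearest micro-cluster if it falls within a boundary set as a multiple of that cluster's RMS radius; otherwise it opens a new micro-cluster, and the two closest micro-clusters are merged when the budget is full. The names MicroCluster, update, max_clusters and boundary_factor are hypothetical. Simplifications, stated as assumptions: there is no initial k-means bootstrap, stale micro-clusters are never deleted (the timestamp sums are kept but unused), and the offline phase that groups micro-cluster centroids into final clusters is omitted.

```python
import math
from dataclasses import dataclass


@dataclass
class MicroCluster:
    """Additive cluster-feature summary of a group of stream points (hypothetical name)."""
    cf1x: list   # per-dimension linear sum of the points
    cf2x: list   # per-dimension sum of squared coordinates
    cf1t: float  # sum of arrival timestamps (unused here; would drive stale-cluster deletion)
    cf2t: float  # sum of squared timestamps
    n: int       # number of points absorbed

    @classmethod
    def from_point(cls, x, t):
        return cls(list(x), [v * v for v in x], t, t * t, 1)

    def insert(self, x, t):
        # Absorbing a point only updates the sums; raw points are never stored.
        for i, v in enumerate(x):
            self.cf1x[i] += v
            self.cf2x[i] += v * v
        self.cf1t += t
        self.cf2t += t * t
        self.n += 1

    def merge(self, other):
        # CF vectors are additive, so two micro-clusters merge by summing their fields.
        self.cf1x = [a + b for a, b in zip(self.cf1x, other.cf1x)]
        self.cf2x = [a + b for a, b in zip(self.cf2x, other.cf2x)]
        self.cf1t += other.cf1t
        self.cf2t += other.cf2t
        self.n += other.n

    def centroid(self):
        return [s / self.n for s in self.cf1x]

    def rms_radius(self):
        # Root-mean-square distance of the absorbed points from the centroid.
        var = sum(s2 / self.n - (s1 / self.n) ** 2
                  for s1, s2 in zip(self.cf1x, self.cf2x))
        return math.sqrt(max(var, 0.0))  # clamp tiny negative rounding error


def update(clusters, x, t, max_clusters, boundary_factor=2.0):
    """Online step: absorb point x seen at time t, or open a new micro-cluster.

    max_clusters must be at least 2 so that a pair is available to merge.
    """
    if clusters:
        nearest = min(clusters, key=lambda c: math.dist(c.centroid(), x))
        d = math.dist(nearest.centroid(), x)
        if nearest.n > 1:
            boundary = boundary_factor * nearest.rms_radius()
        else:
            # A singleton has no spread; use the distance to the closest other cluster.
            boundary = min((math.dist(nearest.centroid(), c.centroid())
                            for c in clusters if c is not nearest), default=0.0)
        if d <= boundary:
            nearest.insert(x, t)
            return
    if len(clusters) >= max_clusters:
        # Free a slot by merging the two closest micro-clusters
        # (CluStream may instead delete a stale one; omitted in this sketch).
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda p: math.dist(clusters[p[0]].centroid(),
                                           clusters[p[1]].centroid()))
        clusters[i].merge(clusters.pop(j))  # j > i, so index i stays valid
    clusters.append(MicroCluster.from_point(x, t))


# Usage: two well-separated groups arriving interleaved, budget of 2 micro-clusters.
stream = [(0, 0), (0, 1), (10, 10), (10, 11), (0, 0.5), (10, 10.5)]
clusters = []
for t, point in enumerate(stream):
    update(clusters, point, t, max_clusters=2)
for c in clusters:
    print(c.n, c.centroid())
# 3 [0.0, 0.5]
# 3 [10.0, 10.5]
```

Because the summaries are plain sums, micro-clusters built on separate workers could in principle be combined with merge; how the thesis actually partitions the work across Spark is not described in this record.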

More information

Item ID: 65737
DC Identifier: http://oa.upm.es/65737/
OAI Identifier: oai:oa.upm.es:65737
Deposited by: Biblioteca Universitaria Campus Sur
Deposited on: 15 Dec 2020 06:56
Last Modified: 15 Dec 2020 06:56