Clustering data streams with streamingKmeans

Bueno Prieto, Alejandro (2019). Clustering data streams with streamingKmeans. Proyecto Fin de Carrera / Trabajo Fin de Grado, E.T.S.I. de Sistemas Informáticos (UPM), Madrid.

Description

Title: Clustering data streams with streamingKmeans
Author/s:
  • Bueno Prieto, Alejandro
Contributor/s:
  • Gómez Canaval, Sandra
Item Type: Final Project
Degree: Grado en Ingeniería del Software
Date: July 2019
Subjects:
Freetext Keywords: Machine Learning; Data Streaming
Faculty: E.T.S.I. de Sistemas Informáticos (UPM)
Department: Sistemas Informáticos
Creative Commons Licenses: Attribution - NonCommercial - NoDerivatives

Abstract

Machine Learning is one of the fields that has emerged from Artificial Intelligence (AI). News about technological advances driven by Machine Learning in a wide range of usage scenarios is increasingly common. As a consequence, the demand for professionals who understand and correctly apply the techniques of this subdiscipline of AI has increased considerably.
Machine Learning comprises many modeling techniques and algorithms; those used for data mining in particular have a long history and an extensive literature. However, new versions of these algorithms adapted to streaming scenarios, along with new algorithms developed specifically for that purpose, have emerged recently as data streams have grown in number and importance thanks to Cyber-Physical Systems and the Internet of Things. This Final Degree Project addresses the problem of adapting a classic Machine Learning algorithm to a streaming scenario, using a streaming clustering algorithm. The algorithm was implemented in an application that runs on the ultra-scalable computing platform Spark Streaming. The implementation was tested on several public datasets, clustering the data as simulated streams. Finally, the results were evaluated and analyzed in light of the project's conclusions and proposed future work.
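
The abstract does not reproduce the algorithm itself. As a rough illustration of what adapting k-means to a stream can involve, the sketch below applies a mini-batch update with a decay factor, in the spirit of the "forgetful" update rule described for streaming k-means in Spark MLlib. It is not the author's implementation: the class name, field names, and sample data are hypothetical, and the code is plain Python rather than Spark.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class StreamingKMeansSketch:
    """Hypothetical, single-machine sketch of a decayed mini-batch k-means update."""
    centers: List[List[float]]   # current cluster centers
    weights: List[float]         # (discounted) number of points seen per cluster
    decay: float = 1.0           # 1.0 keeps all history, 0.0 forgets it each batch

    def predict(self, point: Sequence[float]) -> int:
        """Return the index of the closest center (squared Euclidean distance)."""
        def sq_dist(center: Sequence[float]) -> float:
            return sum((c - p) ** 2 for c, p in zip(center, point))
        return min(range(len(self.centers)), key=lambda i: sq_dist(self.centers[i]))

    def update(self, batch: Sequence[Sequence[float]]) -> None:
        """Fold one batch of points into the centers."""
        dim = len(self.centers[0])
        sums = [[0.0] * dim for _ in self.centers]
        counts = [0] * len(self.centers)

        # Assign every point to its nearest center, using the centers as they
        # stood before this batch.
        for point in batch:
            i = self.predict(point)
            counts[i] += 1
            for d in range(dim):
                sums[i][d] += point[d]

        # Blend old centers (weighted by discounted history) with the batch sums.
        for i, m in enumerate(counts):
            discounted = self.weights[i] * self.decay
            total = discounted + m
            if total > 0:
                self.centers[i] = [
                    (self.centers[i][d] * discounted + sums[i][d]) / total
                    for d in range(dim)
                ]
            self.weights[i] = total


# Usage: two clusters, two batches arriving one after the other.
model = StreamingKMeansSketch(centers=[[0.0, 0.0], [10.0, 10.0]], weights=[0.0, 0.0])
model.update([[1.0, 1.0], [3.0, 3.0], [9.0, 9.0]])
model.update([[4.0, 4.0]])
```

With `decay` below 1.0, older batches count for less, which lets the centers follow a drifting stream; with `decay` equal to 1.0 each center is the running mean of the points assigned to it.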

More information

Item ID: 56384
DC Identifier: http://oa.upm.es/56384/
OAI Identifier: oai:oa.upm.es:56384
Deposited by: Biblioteca Universitaria Campus Sur
Deposited on: 11 Sep 2019 08:22
Last Modified: 11 Sep 2019 08:22