Telemetry data for machine learning based scheduling

González González, Alejandro (2020). Telemetry data for machine learning based scheduling. Thesis (Master thesis), E.T.S. de Ingenieros Informáticos (UPM).

Description

Title: Telemetry data for machine learning based scheduling
Author/s:
  • González González, Alejandro
Contributor/s:
  • Kiss, Péter
  • Molina González, Martín
  • Kovacs, Benedek
Item Type: Thesis (Master thesis)
Master's title: Data Science
Date: 2020
Faculty: E.T.S. de Ingenieros Informáticos (UPM)
Department: Inteligencia Artificial
Creative Commons Licenses: Attribution - NonCommercial - NoDerivatives

Abstract

Computing clusters generate very large amounts of data, including node resource data and application-related data, among others. However, current systems do not exploit the full potential of this data. This thesis puts cluster telemetry data to use for two purposes: scheduling and workload estimation. Motivated by recent advances in machine learning, a Deep Reinforcement Learning (DRL) based scheduler is proposed, and two scheduling experiments are performed in a simulated cluster environment. The results show that the DRL-based scheduler can be trained on specific cluster architectures to optimize performance parameters such as job completion time, yielding the best scheduling policy when compared with traditional scheduling heuristics. In addition, Long Short-Term Memory (LSTM) neural networks are proposed to estimate the workload of computing clusters, and an experiment using LSTM to forecast cluster resource usage is implemented. The results of this experiment reveal that past telemetry data can be used successfully to predict the future workload of the system, and that LSTM neural networks can be used to anticipate system failures. Finally, a combination of DRL-based scheduling and workload estimation is proposed as a future line of research.
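
As a rough illustration of the job completion time metric and of the kind of traditional heuristics the abstract uses as baselines, the sketch below compares two textbook policies, first-in-first-out and shortest-job-first, on a single node. It is not the thesis's simulator or its DRL scheduler: the function names, the single-node model, the assumption that all jobs arrive at time zero, and the job durations are hypothetical and chosen only for illustration.

```python
from itertools import accumulate


def mean_completion_time(durations):
    """Mean completion time for jobs run back to back on one node.

    Assumes (for illustration only) that every job arrives at time 0,
    so a job's completion time is the running sum of the durations
    up to and including that job.
    """
    finish_times = list(accumulate(durations))
    return sum(finish_times) / len(finish_times)


def fifo(durations):
    """First-in-first-out: run jobs in submission order."""
    return list(durations)


def shortest_job_first(durations):
    """Shortest-job-first: run the shortest jobs first."""
    return sorted(durations)


# Hypothetical job durations in arbitrary time units.
jobs = [8, 1, 3, 2]

fifo_avg = mean_completion_time(fifo(jobs))
sjf_avg = mean_completion_time(shortest_job_first(jobs))

print(f"FIFO mean completion time: {fifo_avg}")
print(f"SJF mean completion time:  {sjf_avg}")
```

A learned scheduler would be judged against baselines of this kind on the same metric, although in a simulated cluster with multiple nodes and jobs arriving over time rather than in this one-node toy setting.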

More information

Item ID: 64997
DC Identifier: http://oa.upm.es/64997/
OAI Identifier: oai:oa.upm.es:64997
Deposited by: Biblioteca Facultad de Informatica
Deposited on: 26 Oct 2020 10:11
Last Modified: 26 Oct 2020 10:11