Scalable architecture for automating Machine Learning model monitoring

Rúa Martínez, Javier de la (2020). Scalable architecture for automating Machine Learning model monitoring. Thesis (Master thesis), E.T.S. de Ingenieros Informáticos (UPM).


Title: Scalable architecture for automating Machine Learning model monitoring
  • Rúa Martínez, Javier de la
  • Haridi, Seif
  • Dowling, Jim
Item Type: Thesis (Master thesis)
Masters title: Data Science
Date: 2020
Freetext Keywords: Model monitoring, Streaming, Scalability, Cloud-native, Data drift, Outliers, Machine Learning
Faculty: E.T.S. de Ingenieros Informáticos (UPM)
Department: Otro
Creative Commons Licenses: Recognition - No derivative works - Non commercial

Full text

PDF - Requires a PDF viewer, such as GSview, Xpdf or Adobe Acrobat Reader
Download (7MB) | Preview


Last years, due to the advent of more sophisticated tools for exploratory data analysis, data management, Machine Learning (ML) model training and model serving into production, the concept of MLOps has gained more popularity. As an effort to bring DevOps processes to the ML lifecycle, MLOps aims at more automation in the execution of diverse and repetitive tasks along the cycle and at smoother interoperability between teams and tools involved. In this context, the main cloud providers have built their own ML platforms [4, 34, 61], offered as services in their cloud solutions. Moreover, multiple frameworks have emerged to solve concrete problems such as data testing, data labelling, distributed training or prediction interpretability, and new monitoring approaches have been proposed [32, 33, 65]. Among all the stages in the ML lifecycle, one of the most commonly overlooked although relevant is model monitoring. Recently, cloud providers have presented their own tools to use within their platforms [4, 61] while work is ongoing to integrate existent frameworks [72] into open-source model serving solutions [38]. Most of these frameworks are either built as an extension of an existent platform (i.e lack portability), follow a scheduled batch processing approach at a minimum rate of hours, or present limitations for certain outliers and drift algorithms due to the platform architecture design in which they are integrated. In this work, a scalable automated cloudnative architecture is designed and evaluated for ML model monitoring in a streaming approach. An experimentation conducted on a 7-node cluster with 250.000 requests at different concurrency rates shows maximum latencies of 5.9, 29.92 and 30.86 seconds after request time for 75% of distance-based outliers detection, windowed statistics and distribution-based data drift detection, respectively, using windows of 15 seconds length and 6 seconds of watermark delay.

More information

Item ID: 66444
DC Identifier:
OAI Identifier:
Deposited by: Biblioteca Facultad de Informatica
Deposited on: 17 Mar 2021 13:28
Last Modified: 17 Mar 2021 13:28
  • Logo InvestigaM (UPM)
  • Logo GEOUP4
  • Logo Open Access
  • Open Access
  • Logo Sherpa/Romeo
    Check whether the anglo-saxon journal in which you have published an article allows you to also publish it under open access.
  • Logo Dulcinea
    Check whether the spanish journal in which you have published an article allows you to also publish it under open access.
  • Logo de Recolecta
  • Logo del Observatorio I+D+i UPM
  • Logo de OpenCourseWare UPM