Exploring shared state in key-value store for window-based multi-pattern streaming

Marcu, Ovidiu-Cristian and Tudoran, Radu and Nicolae, Bogdan and Costan, Alexandru and Antoniu, Gabriel and Pérez Hernández, María de los Santos (2017). Exploring shared state in key-value store for window-based multi-pattern streaming. In: "17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID 2017)", 14-17 May 2017, Madrid, España. ISBN 978-1-5090-6610-0. pp. 1044-1052. https://doi.org/10.1109/CCGRID.2017.126.

Description

Title: Exploring shared state in key-value store for window-based multi-pattern streaming
Author/s:
  • Marcu, Ovidiu-Cristian
  • Tudoran, Radu
  • Nicolae, Bogdan
  • Costan, Alexandru
  • Antoniu, Gabriel
  • Pérez Hernández, María de los Santos
Item Type: Presentation at Congress or Conference (Article)
Event Title: 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID 2017)
Event Dates: 14-17 May 2017
Event Location: Madrid, España
Title of Book: Proceedings of the 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID 2017)
Date: 2017
ISBN: 978-1-5090-6610-0
Freetext Keywords: Big Data; Memory deduplication; Streaming analytics; Sliding-window aggregations; Apache Flink
Faculty: E.T.S. de Ingenieros Informáticos (UPM)
Department: Arquitectura y Tecnología de Sistemas Informáticos
Creative Commons Licenses: Attribution - No derivative works - Non commercial

Abstract

We are now witnessing an unprecedented growth of data that needs to be processed at ever-increasing rates in order to extract valuable insights. Big Data streaming analytics tools have been developed to cope with the online dimension of data processing: they enable real-time handling of live data sources by means of stateful aggregations (operators). Current state-of-the-art frameworks (e.g. Apache Flink [1]) enable each operator to work in isolation by creating data copies, at the expense of increased memory utilization. In this paper, we explore the feasibility of deduplication techniques to address the challenge of reducing the memory footprint of window-based stream processing without significant impact on performance. We design a deduplication method specifically for window-based operators that rely on key-value stores to hold a shared state. We experiment with a synthetically generated workload while considering several deduplication scenarios and, based on the results, we identify several potential areas of improvement. Our key finding is that more fine-grained interactions between streaming engines and (key-value) stores need to be designed in order to better respond to scenarios that have to overcome memory scarcity.
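
The general idea in the abstract (several window-based operators reading one shared copy of each event from a key-value store, rather than each holding a private copy) can be illustrated with a minimal sketch. This is not the method from the paper and does not use Apache Flink or any real key-value store: `SharedEventStore`, `SlidingWindowOperator`, and `run_demo` are hypothetical names, the store is a plain in-memory dictionary with reference counting, and the windows are count-based for simplicity.

```python
from collections import deque


class SharedEventStore:
    """Hypothetical in-memory key-value store that holds each event once."""

    def __init__(self):
        self._values = {}    # event id -> payload
        self._refcount = {}  # event id -> number of windows referencing it

    def put(self, event_id, payload):
        # Store the payload only on first sight; later puts add a reference.
        if event_id not in self._values:
            self._values[event_id] = payload
            self._refcount[event_id] = 0
        self._refcount[event_id] += 1

    def get(self, event_id):
        return self._values[event_id]

    def release(self, event_id):
        # Drop one reference; free the payload once no window needs it.
        self._refcount[event_id] -= 1
        if self._refcount[event_id] == 0:
            del self._values[event_id]
            del self._refcount[event_id]

    def __len__(self):
        return len(self._values)


class SlidingWindowOperator:
    """Count-based sliding window that keeps event ids, not payload copies."""

    def __init__(self, store, size, aggregate):
        self._store = store
        self._size = size
        self._aggregate = aggregate
        self._ids = deque()

    def on_event(self, event_id, payload):
        self._store.put(event_id, payload)
        self._ids.append(event_id)
        if len(self._ids) > self._size:
            # The oldest event slides out of this window.
            self._store.release(self._ids.popleft())
        return self._aggregate(self._store.get(i) for i in self._ids)


def run_demo():
    store = SharedEventStore()
    operators = [
        SlidingWindowOperator(store, size=3, aggregate=sum),
        SlidingWindowOperator(store, size=5, aggregate=max),
    ]
    results = []
    for event_id, payload in enumerate([4, 8, 15, 16, 23, 42]):
        results = [op.on_event(event_id, payload) for op in operators]
    # Latest aggregate of each operator, and payloads held in the store.
    return results, len(store)
```

In this toy run the two windows together reference 3 + 5 = 8 events, while the shared store holds only 5 payloads, because the windows overlap. Every window access goes through the store, which hints at the trade-off between memory footprint and performance that the abstract raises.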

Funding Projects

Type: Horizon 2020
Code: MSCA-ITN-2014-642963
Acronym: BigStorage
Leader: Unspecified
Title: BigStorage: Storage-based convergence between HPC and Cloud to handle Big Data

More information

Item ID: 50320
DC Identifier: http://oa.upm.es/50320/
OAI Identifier: oai:oa.upm.es:50320
DOI: 10.1109/CCGRID.2017.126
Official URL: https://ieeexplore.ieee.org/document/7973813
Deposited by: Memoria Investigacion
Deposited on: 29 May 2019 07:58
Last Modified: 29 May 2019 07:58