Parallel efficient data loading

Jiménez Peris, Ricardo and Ballesteros Cámara, Francisco José and Azqueta Alzúaz, Ainhoa and Kranas, Pavlos and Burgos, Diego and Martínez, Patricio (2019). Parallel efficient data loading. In: "8th International Conference on Data Science, Technology and Applications (DATA 2019)", 26-28 Jul 2019, Prague, Czech Republic. ISBN 978-989-758-377-3. pp. 465-469. https://doi.org/10.5220/0008318904650469.

Description

Title: Parallel efficient data loading
Author/s:
  • Jiménez Peris, Ricardo
  • Ballesteros Cámara, Francisco José
  • Azqueta Alzúaz, Ainhoa
  • Kranas, Pavlos
  • Burgos, Diego
  • Martínez, Patricio
Item Type: Presentation at Congress or Conference (Article)
Event Title: 8th International Conference on Data Science, Technology and Applications (DATA 2019)
Event Dates: 26-28 Jul 2019
Event Location: Prague, Czech Republic
Title of Book: ADITCA 2019: Special Session on Appliances for Data-Intensive and Time Critical Applications
Date: 2019
ISBN: 978-989-758-377-3
Volume: 1
Subjects:
Freetext Keywords: Loading; Extract-Transform-Load (ETL); Scalable databases; NUMA architectures; Database appliance; Scalable transactional management
Faculty: E.T.S. de Ingenieros Informáticos (UPM)
Department: Lenguajes y Sistemas Informáticos e Ingeniería del Software
Creative Commons Licenses: Attribution - NoDerivatives - NonCommercial

Full text

PDF (320 kB)

Abstract

In this paper we discuss how we architected and developed a parallel data loader for the LeanXcale database. The loader is characterized by its efficiency and parallelism. LeanXcale can scale up and scale out to very large deployments, and loading data in the traditional way does not exploit its full potential in terms of the loading rate it can reach. For this reason, we have created a parallel loader that can reach the maximum insertion rate LeanXcale can handle. LeanXcale also exhibits a dual interface, key-value and SQL, which the parallel loader exploits. In essence, the loader uses the key-value API, which makes loading highly efficient because it avoids the overhead of SQL processing. Finally, to guarantee parallelism, we have developed a data sampler that samples the data to build a histogram of its distribution and uses that histogram to pre-split the regions across LeanXcale instances. This ensures that all instances receive an even share of the data during loading, so the deployment can load at its peak capacity.
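The sampling and pre-splitting step can be illustrated with a minimal sketch. It assumes integer primary keys, a uniform reservoir sample of the input, and an equi-depth histogram whose bucket boundaries serve as region split points; the paper's actual sampler may work differently. The function names (`reservoir_sample`, `split_points`, `region_of`) are hypothetical and are not part of LeanXcale's API.

```python
import bisect
import random
from collections import Counter
from typing import Iterable, List, Sequence


def reservoir_sample(keys: Iterable[int], k: int, seed: int = 0) -> List[int]:
    """Uniform random sample of up to k keys from a stream (Algorithm R).

    Assumption: a uniform sample is representative of the key distribution.
    """
    rng = random.Random(seed)
    sample: List[int] = []
    for i, key in enumerate(keys):
        if i < k:
            sample.append(key)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = key
    return sample


def split_points(sample: Sequence[int], n_regions: int) -> List[int]:
    """Equi-depth histogram boundaries: n_regions - 1 keys that cut the
    sorted sample into n_regions buckets of (roughly) equal size.

    Hypothetical stand-in for pre-splitting regions across instances.
    """
    ordered = sorted(sample)
    return [ordered[(i * len(ordered)) // n_regions] for i in range(1, n_regions)]


def region_of(key: int, splits: Sequence[int]) -> int:
    """Region index for a key: region i holds keys in [splits[i-1], splits[i])."""
    return bisect.bisect_right(splits, key)


if __name__ == "__main__":
    keys = [i * i for i in range(100_000)]  # skewed: dense at low values
    n_regions = 4
    splits = split_points(reservoir_sample(keys, 1_000), n_regions)
    sampled = Counter(region_of(k, splits) for k in keys)

    # Naive equal-width ranges over [0, max] for comparison.
    width = (max(keys) + 1) // n_regions
    naive = Counter(min(k // width, n_regions - 1) for k in keys)

    print("sampled splits:", [sampled[r] for r in range(n_regions)])
    print("equal-width   :", [naive[r] for r in range(n_regions)])
```

On skewed keys like these, equal-width ranges send about half of the rows to the first region, while the sampled split points give each region roughly a quarter. In a real loader each region would then be fed by its own worker; that part is omitted here because the abstract does not describe the key-value interface in enough detail to sketch it faithfully.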

Funding Projects

Type: Horizon 2020
Code: 732051
Acronym: CloudDBAppliance
Leader: BULL SAS
Title: European cloud in-memory database appliance with predictable performance for critical applications

More information

Item ID: 56632
DC Identifier: http://oa.upm.es/56632/
OAI Identifier: oai:oa.upm.es:56632
DOI: 10.5220/0008318904650469
Official URL: http://www.scitepress.org/DigitalLibrary/Link.aspx?doi=10.5220/0008318904650469
Deposited by: Memoria Investigacion
Deposited on: 22 Oct 2019 09:14
Last Modified: 22 Oct 2019 09:14