Mesh traversal and sorting for efficient memory usage in scientific codes

Barrio López-Cortijo, Pablo and Carreras Vaquer, Carlos (2011). Mesh traversal and sorting for efficient memory usage in scientific codes. In: "IEEE 30th International Performance Computing and Communications Conference (IPCCC)", 17/11/2011 - 19/11/2011, Orlando, USA. pp. 1-8.

Description

Title: Mesh traversal and sorting for efficient memory usage in scientific codes
Author/s:
  • Barrio López-Cortijo, Pablo
  • Carreras Vaquer, Carlos
Item Type: Presentation at Congress or Conference (Article)
Event Title: IEEE 30th International Performance Computing and Communications Conference (IPCCC)
Event Dates: 17/11/2011 - 19/11/2011
Event Location: Orlando, USA
Title of Book: IEEE 30th International Performance Computing and Communications Conference (IPCCC)
Date: November 2011
Subjects:
Faculty: E.T.S.I. Telecomunicación (UPM)
Department: Ingeniería Electrónica
Creative Commons Licenses: Attribution - No Derivative Works - Non-Commercial

Abstract

Applications that operate on meshes are very popular in High Performance Computing (HPC) environments. In the past, many techniques have been developed in order to optimize the memory accesses for these datasets. Different loop transformations and domain decompositions are commonly used for structured meshes. However, unstructured grids are more challenging. The memory accesses, based on the mesh connectivity, do not map well to the usual linear memory model. This work presents a method to improve the memory performance which is suitable for HPC codes that operate on meshes. We develop a method to adjust the sequence in which the data are used inside the algorithm, by means of traversing and sorting the mesh. This sorted mesh can be transferred sequentially to the lower memory levels and allows for minimum data transfer requirements. The method also reduces the lower memory requirements dramatically: up to 63% of the L1 cache misses are removed in a traditional cache system. We have obtained speedups of up to 2.58 on memory operations as measured in a general-purpose CPU. An improvement is also observed with sequential access memories, where we have observed reductions of up to 99% in the required low-level memory size.
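
The abstract does not describe the traversal in detail. As a rough illustration of the general idea only, and not the authors' algorithm, the Python sketch below renumbers the nodes of a small unstructured mesh in breadth-first order so that connected nodes end up close together in a linear layout. All names (bfs_order, renumber, max_index_gap) and the toy mesh are hypothetical, and the largest index distance between neighbours is used here only as a crude stand-in for memory locality.

```python
from collections import deque


def bfs_order(adjacency, start=0):
    """Return node ids in breadth-first visiting order (all components)."""
    n = len(adjacency)
    visited = [False] * n
    order = []
    # Try the requested start first, then any node not yet reached.
    for seed in [start] + [i for i in range(n) if i != start]:
        if visited[seed]:
            continue
        visited[seed] = True
        queue = deque([seed])
        while queue:
            node = queue.popleft()
            order.append(node)
            for nb in sorted(adjacency[node]):
                if not visited[nb]:
                    visited[nb] = True
                    queue.append(nb)
    return order


def renumber(adjacency, order):
    """Relabel nodes so that order[k] becomes node k in the new mesh."""
    new_id = {old: new for new, old in enumerate(order)}
    return [sorted(new_id[nb] for nb in adjacency[old]) for old in order]


def max_index_gap(adjacency):
    """Largest |i - j| over all edges (assumed proxy for locality)."""
    return max(
        (abs(i - j) for i, nbs in enumerate(adjacency) for j in nbs),
        default=0,
    )


# Toy mesh: a chain of six nodes stored with scrambled labels
# (edges 0-3, 3-1, 1-4, 4-2, 2-5), given as adjacency lists.
scrambled = [[3], [3, 4], [4, 5], [0, 1], [1, 2], [2]]

sorted_mesh = renumber(scrambled, bfs_order(scrambled))

print("gap before sorting:", max_index_gap(scrambled))
print("gap after sorting: ", max_index_gap(sorted_mesh))
```

In this toy case the neighbours of each node become adjacent in the new numbering, which is the kind of layout that could be streamed sequentially to a lower memory level. The paper's actual method and its measured results should be taken from the full text.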

More information

Item ID: 21749
DC Identifier: http://oa.upm.es/21749/
OAI Identifier: oai:oa.upm.es:21749
Official URL: http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6108106&tag=1
Deposited by: Memoria Investigacion
Deposited on: 23 Nov 2013 10:25
Last Modified: 21 Apr 2016 12:29