Improving grid fault tolerance by means of global behavior modeling.

Montes, Jesús and Sánchez, Alberto and Pérez Hernández, María de los Santos (2010). Improving grid fault tolerance by means of global behavior modeling. In: "Ninth International Symposium on Parallel and Distributed Computing, 2010", 07/07/2010 - 09/07/2010, Istanbul, Turkey. ISBN 978-0-7695-4120-4.

Description

Title: Improving grid fault tolerance by means of global behavior modeling.
Author/s:
  • Montes, Jesús
  • Sánchez, Alberto
  • Pérez Hernández, María de los Santos
Item Type: Presentation at Congress or Conference (Article)
Event Title: Ninth International Symposium on Parallel and Distributed Computing, 2010
Event Dates: 07/07/2010 - 09/07/2010
Event Location: Istanbul, Turkey
Title of Book: Proceedings of the Ninth International Symposium on Parallel and Distributed Computing, 2010
Date: 2010
ISBN: 978-0-7695-4120-4
Subjects:
Faculty: Facultad de Informática (UPM)
Department: Arquitectura y Tecnología de Sistemas Informáticos
Creative Commons Licenses: Attribution - No derivative works - Non commercial

Abstract

Grid systems have proved to be one of the most important new alternatives for tackling challenging problems, but dependability and fault tolerance are key to exploiting their benefits. However, the vast complexity of these systems limits the efficiency of traditional fault tolerance techniques. It seems necessary to distinguish between resource-level fault tolerance (focused on each individual machine) and service-level fault tolerance (focused on global behavior). Techniques based on these concepts can handle system complexity and increase dependability. We present an autonomous, self-adaptive fault tolerance framework for grid systems, based on a new approach to modeling distributed environments. The grid is considered as a single entity, instead of a set of independent resources. This point of view focuses on service-level fault tolerance, allowing us to see the big picture and understand the system's global behavior. The resulting model's simplicity is the key to providing system-wide fault tolerance.

More information

Item ID: 6852
DC Identifier: http://oa.upm.es/6852/
OAI Identifier: oai:oa.upm.es:6852
Official URL: http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5532500&tag=1
Deposited by: Memoria Investigacion
Deposited on: 04 May 2011 10:32
Last Modified: 20 Apr 2016 15:59