Systematic Reliability Evaluation of FPGA Implemented CNN Accelerators

Gao, Zhen, Gao, Shihui, Yao, Yi, Liu, Qiang, Zeng, Shulin, Ge, Guangjun, Wang, Yu, Ullah, Anees and Reviriego Vasallo, Pedro ORCID: https://orcid.org/0000-0003-2273-1341 (2023). Systematic Reliability Evaluation of FPGA Implemented CNN Accelerators. "IEEE Transactions on Device and Materials Reliability", v. 23 (n. 1); pp. 116-126. https://doi.org/10.1109/TDMR.2023.3235767.

Description

Title: Systematic Reliability Evaluation of FPGA Implemented CNN Accelerators
Author/s:
  • Gao, Zhen
  • Gao, Shihui
  • Yao, Yi
  • Liu, Qiang
  • Zeng, Shulin
  • Ge, Guangjun
  • Wang, Yu
  • Ullah, Anees
  • Reviriego Vasallo, Pedro https://orcid.org/0000-0003-2273-1341
Item Type: Article
Journal/Publication Title: IEEE Transactions on Device and Materials Reliability
Date: 2023
Volume: 23
Subjects:
Faculty: E.T.S.I. Telecomunicación (UPM)
Department: Ingeniería de Sistemas Telemáticos
UPM's Research Group: Internet de Nueva Generación
Creative Commons Licenses: None

Full text

PDF: Reliability_Evaluation_of_FPGA_implemented_CNN_Accelerators_R2l.pdf (1MB)

Abstract

Convolutional neural networks (CNNs) have become essential for many scientific and industrial applications, such as image classification and pattern detection. Among the devices that can implement neural networks, SRAM-based FPGAs are a popular option due to their excellent parallel computing capability and flexibility. However, SRAM-based FPGAs are susceptible to radiation effects, which limits their use in safety-critical applications. In this paper, the reliability of an accelerator based on an advanced instruction-set architecture (ISA) is evaluated through hardware fault injection experiments. Each main module of the accelerator is evaluated separately, and the impact of parallelism and model features on accelerator reliability is also examined. The experimental results lead to several important conclusions regarding both general hardware reliability and the reliability of particular models. First, over 99% of single event upsets (SEUs) in the computation modules cause accuracy loss, and reliability improves with higher parallelism. Second, a large portion of SEUs in the data mover and the instruction scheduler cause system corruption due to abnormal interactions with the ARM processor or other modules. Third, nonlinear activation and pooling layers are effective at reducing the effect of SEUs in the computation modules, so models that use these layers tend to be more robust. The results provide insight into the impact of errors on CNNs implemented on ISA-based FPGA accelerators (e.g., the Xilinx DPU).
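To illustrate the intuition behind the third finding, the sketch below models, purely in software, a single bit flip in one 8-bit value of a layer's output feature map and checks whether the error survives a ReLU followed by 2x2 max pooling. This is not the hardware fault injection methodology used in the paper. It assumes int8 two's-complement data (typical of quantized accelerators such as the DPU) and injects faults only into stored output values, not into the FPGA configuration memory. All function names and the example feature map are hypothetical.

```python
from typing import List

Matrix = List[List[int]]


def flip_bit_int8(value: int, bit: int) -> int:
    """Flip one bit of an 8-bit two's-complement integer (toy model of an SEU)."""
    if not -128 <= value <= 127 or not 0 <= bit <= 7:
        raise ValueError("value must be int8 and bit must be in 0..7")
    raw = (value & 0xFF) ^ (1 << bit)         # flip the bit in the unsigned 8-bit pattern
    return raw - 256 if raw >= 128 else raw   # reinterpret the pattern as signed int8


def relu(fmap: Matrix) -> Matrix:
    """Elementwise ReLU: negative values are clamped to zero."""
    return [[max(0, v) for v in row] for row in fmap]


def max_pool_2x2(fmap: Matrix) -> Matrix:
    """Non-overlapping 2x2 max pooling (height and width assumed even)."""
    return [
        [max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
         for j in range(0, len(fmap[0]), 2)]
        for i in range(0, len(fmap), 2)
    ]


def post_layers(fmap: Matrix) -> Matrix:
    """The two layers whose masking effect is being illustrated."""
    return max_pool_2x2(relu(fmap))


def masked_fraction(fmap: Matrix) -> float:
    """Fraction of all single-bit flips in fmap that leave the ReLU+pool output unchanged."""
    golden = post_layers(fmap)
    masked = total = 0
    for i, row in enumerate(fmap):
        for j, v in enumerate(row):
            for bit in range(8):
                faulty = [r[:] for r in fmap]          # copy, then corrupt one value
                faulty[i][j] = flip_bit_int8(v, bit)
                total += 1
                masked += post_layers(faulty) == golden
    return masked / total


# Hypothetical 4x4 int8 output of a convolution layer.
example_fmap: Matrix = [
    [-20, 50, 3, -7],
    [10, -1, -90, 4],
    [12, -3, 0, 25],
    [-60, 8, 1, -2],
]

print(f"masked fraction: {masked_fraction(example_fmap):.2f}")
```

Without the ReLU and pooling stages, every one of these flips would change the stored value, so any nonzero result from `masked_fraction` counts flips that the two layers absorb: a negative value that stays negative after the flip is still clamped to zero by ReLU, and a flip in a non-maximal element that does not push it above its window maximum is discarded by pooling.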

Funding Projects

Type                 Code                   Acronym          Leader       Title
Government of Spain  PID2019-104207RB-I00   ACHILLES         Unspecified  Unspecified
Government of Spain  TSI-063000-2021-127    6GINTEGRATION    Unspecified  Unspecified
Government of Spain  RED2018-102585-T       Go2Edge Network  Unspecified  Unspecified

More information

Item ID: 76675
DC Identifier: https://oa.upm.es/76675/
OAI Identifier: oai:oa.upm.es:76675
DOI: 10.1109/TDMR.2023.3235767
Official URL: https://ieeexplore.ieee.org/document/10013764
Deposited by: Professor Pedro Reviriego
Deposited on: 20 Nov 2023 07:51
Last Modified: 20 Nov 2023 07:51