Robustness against Faults in Configuration Memories of FPGA-based LLMs

Gao, Zhen, Yuan, Lini, Wang, Jingya, Liu, Qiang, Conde Díaz, Javier, Reviriego Vasallo, Pedro, Zeng, Shulin, Wang, Yu, Liu, Shanshan and Lombardi, Fabrizio (2025). Robustness against Faults in Configuration Memories of FPGA-based LLMs. "IEEE Transactions on Circuits and Systems for Artificial Intelligence"; pp. 1-12. https://doi.org/10.1109/TCASAI.2025.3552735.

Description

Title:	Robustness against Faults in Configuration Memories of FPGA-based LLMs
Author(s):	Gao, Zhen (https://orcid.org/0000-0001-9887-1418); Yuan, Lini; Wang, Jingya; Liu, Qiang; Conde Díaz, Javier (https://orcid.org/0000-0002-5304-0626); Reviriego Vasallo, Pedro (https://orcid.org/0000-0003-2540-5234); Zeng, Shulin; Wang, Yu; Liu, Shanshan; Lombardi, Fabrizio (https://orcid.org/0000-0003-3152-3245)
Document Type:	Article
Journal/Publication Title:	IEEE Transactions on Circuits and Systems for Artificial Intelligence
Date:	March 2025
Subjects:	Electronics; Computer Science; Telecommunications
SDG:	09. Industry, innovation and infrastructure
Informal Keywords:	Field programmable gate arrays; Robustness; Hardware; Artificial intelligence; Transformers; Graphics processing units; Integrated circuit modeling; Fault location; Circuit faults; Sparse matrices; Dependability; Large Language Models; FPGAs
School:	E.T.S.I. Telecomunicación (UPM)
Department:	Ingeniería de Sistemas Telemáticos
UPM Research Group:	Internet de Nueva Generación
Creative Commons Licenses:	Attribution - Share Alike

Full text

PDF (Portable Document Format), 1MB

Abstract

Large Language Models (LLMs) pose significant challenges in terms of the speed and energy dissipation of AI systems. Dependability is a further important issue for LLM implementations; this is especially relevant for FPGAs, which are vulnerable to soft errors in the configuration memory. Moreover, as current GPU-based implementations are not energy efficient, there is interest in running LLMs on different technology platforms, such as FlightLLM (an FPGA-based accelerator designed to run LLMs energy-efficiently). In this paper, we analyze and evaluate the robustness of FPGA-based LLMs against faults/errors in the configuration memories. For the evaluation, we first propose a PyTorch-based fault injection simulator built on an analysis of FlightLLM, and we use it to study the robustness of FlightLLM against stuck-at faults in the configuration memory. Furthermore, we propose an efficient error detection technique based on a concurrent classifier. Evaluation results show that stuck-at errors on the high bits of the logic units can dramatically degrade LLM performance, and that the proposed concurrent classifier can effectively detect errors with negligible complexity and overhead. Finally, a low-cost fault location scheme is proposed, so that the fault can be easily recovered from by dynamic partial reconfiguration. Combining concurrent-classifier error detection with fault location can efficiently improve the robustness of an FPGA-based LLM such as FlightLLM.
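The remark about high bits can be illustrated with a toy stuck-at model. The sketch below is not the paper's simulator, which is PyTorch-based and models FlightLLM; it uses plain Python integers, and the function names `stuck_at` and `faulty_dot`, the 16-bit two's-complement word width, and the input vectors are all assumptions made for illustration only.

```python
def stuck_at(value: int, bit: int, stuck_to: int, width: int = 16) -> int:
    """Return `value` with one bit of its two's-complement pattern forced to 0 or 1."""
    if not 0 <= bit < width:
        raise ValueError("bit index outside the word")
    mask = (1 << width) - 1
    raw = value & mask  # signed integer -> unsigned bit pattern
    if stuck_to:
        raw |= 1 << bit
    else:
        raw &= ~(1 << bit) & mask
    # unsigned bit pattern -> signed integer
    return raw - (1 << width) if raw >= 1 << (width - 1) else raw


def faulty_dot(xs, ws, bit, stuck_to, width=16):
    """Dot product in which every product passes through the same faulty output bit."""
    return sum(stuck_at(x * w, bit, stuck_to, width) for x, w in zip(xs, ws))


# Hypothetical small integer vectors standing in for quantized activations and weights.
xs = [3, -2, 5, 1]
ws = [4, 7, -1, 2]

golden = sum(x * w for x, w in zip(xs, ws))  # fault-free reference result
low_bit_error = abs(faulty_dot(xs, ws, bit=0, stuck_to=1) - golden)
high_bit_error = abs(faulty_dot(xs, ws, bit=14, stuck_to=1) - golden)
print(golden, low_bit_error, high_bit_error)
```

In this toy case the stuck-at-1 fault on bit 14 shifts the result by orders of magnitude more than the same fault on bit 0, which matches the qualitative trend the abstract reports for high bits.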

Associated projects

Type	Code	Acronym	Lead	Title
Government of Spain	PID2022-136684OB-C22	Not specified
Government of Spain	PCI2024-153434	Not specified
Horizon Europe	101140087	Not specified

More information

Record ID:	88428
DC Identifier:	https://oa.upm.es/88428/
OAI Identifier:	oai:oa.upm.es:88428
DOI:	10.1109/TCASAI.2025.3552735
Official URL:	https://ieeexplore.ieee.org/document/10932828
Deposited by:	Professor Pedro Reviriego
Deposited on:	23 Mar 2025 09:04
Last Modified:	23 Mar 2025 09:04
