Full text
|
PDF (Portable Document Format)
- A PDF viewer is required, such as GSview, Xpdf or Adobe Acrobat Reader
Download (1MB) |
ORCID: https://orcid.org/0000-0001-9887-1418, Yuan, Lini, Wang, Jingya, Liu, Qiang, Conde Díaz, Javier ORCID: https://orcid.org/0000-0002-5304-0626, Reviriego Vasallo, Pedro ORCID: https://orcid.org/0000-0003-2540-5234, Zeng, Shulin, Wang, Yu, Liu, Shanshan and Lombardi, Fabrizio ORCID: https://orcid.org/0000-0003-3152-3245 (2025). Robustness against Faults in Configuration Memories of FPGA-based LLMs. "IEEE Transactions on Circuits and Systems for Artificial Intelligence"; pp. 1-12. https://doi.org/10.1109/TCASAI.2025.3552735.
| Title: | Robustness against Faults in Configuration Memories of FPGA-based LLMs |
|---|---|
| Author(s): | |
| Document Type: | Article |
| Journal/Publication Title: | IEEE Transactions on Circuits and Systems for Artificial Intelligence |
| Date: | March 2025 |
| Subjects: | |
| SDGs: | |
| Uncontrolled Keywords: | Field programmable gate arrays; Robustness; Hardware; Artificial intelligence; Transformers; Graphics processing units; Integrated circuit modeling; Fault location; Circuit faults; Sparse matrices; Dependability; Large Language Models; FPGAs |
| School: | E.T.S.I. Telecomunicación (UPM) |
| Department: | Ingeniería de Sistemas Telemáticos |
| UPM Research Group: | Internet de Nueva Generación |
| Creative Commons License: | Attribution - ShareAlike |
Large Language Models (LLMs) pose significant challenges in terms of the speed and energy dissipation of AI systems. Dependability is a further important issue for LLM implementations; this is especially relevant for FPGAs, which are vulnerable to soft errors in the configuration memory. Moreover, as current GPU-based implementations are not energy efficient, there is interest in running LLMs on different technology platforms, such as FlightLLM (an FPGA-based accelerator designed to run LLMs energy-efficiently). In this paper, we analyze and evaluate the robustness of FPGA-based LLMs against faults/errors in the configuration memories. For the evaluation, we first propose a PyTorch-based fault injection simulator built on an analysis of FlightLLM, and then study FlightLLM's robustness against stuck-at faults in the configuration memory. Furthermore, we propose an efficient error detection technique based on a concurrent classifier. Evaluation results show that stuck-at errors on the high bits of the logic units can dramatically degrade LLM performance, and that the proposed concurrent classifier can effectively detect errors with negligible complexity and overhead. Finally, a low-cost fault location scheme is proposed, so that the fault can be easily recovered from by dynamic partial reconfiguration. The combination of concurrent-classifier error detection and fault location can be used to efficiently improve the robustness of an FPGA-based LLM such as FlightLLM.
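
To make the abstract's point about high-bit stuck-at errors concrete, the following is a minimal illustrative sketch in plain Python. It is not the paper's PyTorch-based simulator. The function names `stuck_at` and `faulty_dot`, the 16-bit two's-complement word width, and the toy input vectors are all assumptions chosen for illustration. The sketch forces one output bit of every multiplication in a dot product to a fixed value and compares the result with the fault-free one.

```python
def stuck_at(value: int, bit: int, stuck_to: int, width: int = 16) -> int:
    """Force one bit of a signed `width`-bit two's-complement word to 0 or 1.

    Hypothetical helper: assumes `value` fits in `width` bits.
    """
    mask = (1 << width) - 1
    raw = value & mask                 # encode as unsigned two's-complement
    if stuck_to:
        raw |= 1 << bit                # stuck-at-1: bit is always set
    else:
        raw &= ~(1 << bit) & mask      # stuck-at-0: bit is always cleared
    if raw >= 1 << (width - 1):        # decode back to a signed integer
        raw -= 1 << width
    return raw


def faulty_dot(xs, ws, bit, stuck_to, width=16):
    """Dot product in which every product has one output bit stuck."""
    return sum(stuck_at(x * w, bit, stuck_to, width) for x, w in zip(xs, ws))


# Toy data (assumed, not taken from the paper).
xs = [3, -2, 5, 1]
ws = [4, 7, -1, 2]

golden = sum(x * w for x, w in zip(xs, ws))       # fault-free reference
low = faulty_dot(xs, ws, bit=0, stuck_to=1)       # fault on the lowest bit
high = faulty_dot(xs, ws, bit=14, stuck_to=1)     # fault on a high bit

print("golden:", golden)
print("low-bit fault error:", abs(low - golden))
print("high-bit fault error:", abs(high - golden))
```

In this toy setting the error introduced by the high-bit fault is several orders of magnitude larger than that of the low-bit fault, which is the qualitative effect the abstract reports; the magnitudes themselves say nothing about the paper's measured results.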
| Record ID: | 88428 |
|---|---|
| DC Identifier: | https://oa.upm.es/88428/ |
| OAI Identifier: | oai:oa.upm.es:88428 |
| DOI Identifier: | 10.1109/TCASAI.2025.3552735 |
| Official URL: | https://ieeexplore.ieee.org/document/10932828 |
| Deposited by: | Professor Pedro Reviriego |
| Deposited on: | 23 Mar 2025 09:04 |
| Last Modified: | 23 Mar 2025 09:04 |