Full text
PDF (Portable Document Format)
- Requires a PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader
Download (1MB)
ORCID: https://orcid.org/0000-0001-9668-7318, Reviriego Vasallo, Pedro ORCID: https://orcid.org/0000-0003-2540-5234, Liu, Shanshan ORCID: https://orcid.org/0000-0001-6226-2880, Niknia, Farzad ORCID: https://orcid.org/0000-0002-4062-3638, Tang, Xiaochen ORCID: https://orcid.org/0000-0003-2590-5810, Gao, Zhen ORCID: https://orcid.org/0000-0001-9887-1418 and Lombardi, Fabrizio ORCID: https://orcid.org/0000-0003-3152-3245 (2025). Perturbation-based error detection and correction (PBEDC) in dependable large-scale machine learning systems. "Future Generation Computer Systems", v. 173; p. 107928. ISSN 0167-739X. https://doi.org/10.1016/j.future.2025.107928.
| Title: | Perturbation-based error detection and correction (PBEDC) in dependable large-scale machine learning systems |
|---|---|
| Author(s): | |
| Document Type: | Article |
| Journal/Publication Title: | Future Generation Computer Systems |
| Date: | December 2025 |
| ISSN: | 0167-739X |
| Volume: | 173 |
| Subjects: | |
| SDGs: | |
| Informal Keywords: | Error detection, Error correction, Large-scale neural networks, Soft errors, CLIP |
| School: | E.T.S.I. Telecomunicación (UPM) |
| Department: | Ingeniería de Sistemas Telemáticos |
| UPM Research Group: | Internet de Nueva Generación |
| Creative Commons Licenses: | None |
Conventional error-tolerant schemes for Neural Networks (NNs) usually require either redundancy or changes to normal operation, leading to considerable overheads. They are not feasible for large-scale Machine Learning (ML) systems, which typically employ several complex networks. This paper proposes a Perturbation-Based Error Detection and Correction (PBEDC) scheme that performs error detection and correction by reusing the inference process. Dependable performance, defined as the ability to operate correctly in the presence of errors, is a key characteristic under consideration. PBEDC employs a compact set of representative samples that are selected to monitor a few check nodes with intermediate signals. The effectiveness of PBEDC is evaluated by taking Contrastive Language-Image Pre-Training (CLIP) networks as a case study. Compared with traditional schemes that use the final prediction as the check node, PBEDC achieves a superior error detection rate (> 95%) and can handle single bit-flip errors in the weights, which existing schemes cannot capture. Combined with parity codes, the proposed scheme also enables the correction of errors. Furthermore, analysis and simulation results show that the number of PBEDC samples required to achieve satisfactory error tolerance is very small; the complexity of the proposed scheme does not scale up with the network size, an advantage that is especially pronounced for large-scale ML systems.
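The general idea of watching an intermediate signal with a handful of check samples can be illustrated with a toy sketch. The Python below is not the authors' PBEDC implementation and does not attempt to reproduce how the paper selects samples or uses perturbation and parity codes. The single dense layer, the function names (`flip_bit`, `check_node`, `record_golden`, `detect_error`), the exact-match comparison and all numeric values are assumptions made purely for illustration.

```python
import struct


def flip_bit(value: float, bit: int) -> float:
    """Return `value` with one bit of its float32 encoding flipped (31 = sign bit)."""
    (as_int,) = struct.unpack("<I", struct.pack("<f", value))
    (flipped,) = struct.unpack("<f", struct.pack("<I", as_int ^ (1 << bit)))
    return flipped


def check_node(weights, x):
    """Hypothetical check node: the intermediate output of one dense layer."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def record_golden(weights, check_samples):
    """Store the error-free check-node outputs for every check sample."""
    return [check_node(weights, x) for x in check_samples]


def detect_error(weights, check_samples, golden, tol=0.0):
    """Re-run the check samples and flag any deviation from the golden outputs."""
    for x, reference in zip(check_samples, golden):
        for out, ref in zip(check_node(weights, x), reference):
            # `not (... <= tol)` also flags NaN results.
            if not (abs(out - ref) <= tol):
                return True
    return False


# Toy values (assumed), chosen to be exactly representable in float32.
weights = [[0.5, -0.25], [0.125, 0.75]]
check_samples = [[1.0, 2.0], [-1.0, 0.5]]
golden = record_golden(weights, check_samples)

# Inject a single bit flip into one weight (bit 30 is the top exponent bit).
faulty = [row[:] for row in weights]
faulty[0][1] = flip_bit(faulty[0][1], 30)

print("error-free weights flagged:", detect_error(weights, check_samples, golden))
print("faulty weights flagged:", detect_error(faulty, check_samples, golden))
```

In this sketch the cost of a check depends on the number of check samples rather than on a duplicated network. A real system would need a tolerance above zero wherever inference is not bit-for-bit reproducible.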
| Record ID: | 89163 |
|---|---|
| DC Identifier: | https://oa.upm.es/89163/ |
| OAI Identifier: | oai:oa.upm.es:89163 |
| Scientific Portal URL: | https://portalcientifico.upm.es/es/ipublic/item/10381053 |
| DOI: | 10.1016/j.future.2025.107928 |
| Official URL: | https://www.sciencedirect.com/science/article/pii/... |
| Deposited by: | Professor Pedro Reviriego |
| Deposited on: | 25 May 2025 08:42 |
| Last Modified: | 15 Oct 2025 01:01 |