Interpretable deep prototype-based neural networks: can a 1 look like a 0?

García Cuesta, Esteban ORCID: https://orcid.org/0000-0002-1215-3333, Manrique Gamo, Daniel ORCID: https://orcid.org/0000-0002-0792-4156 and Ionescu, Radu Constantin ORCID: https://orcid.org/0009-0000-7017-0556 (2025). Interpretable deep prototype-based neural networks: can a 1 look like a 0? "Electronics", v. 14 (n. 18); p. 3584. ISSN 2079-9292. https://doi.org/10.3390/electronics14183584.

Description

Title: Interpretable deep prototype-based neural networks: can a 1 look like a 0?
Author(s): García Cuesta, Esteban; Manrique Gamo, Daniel; Ionescu, Radu Constantin
Document Type: Article
Journal/Publication Title: Electronics
Date: 10 September 2025
ISSN: 2079-9292
Volume: 14
Issue: 18
Informal Keywords: Activation analysis, Artificial intelligence systems, Classification (of information), Classification performance, Data sample, Input space, Interpretability, Interpretable AI, Model outputs, Network architecture, Neural networks, Prototype-based network, Robustness of explanation
School: E.T.S. de Ingenieros Informáticos (UPM)
Department: Inteligencia Artificial
Creative Commons License: Attribution

Full Text

PDF (10390061.pdf): Download (4 MB)

Abstract

Prototype-Based Networks (PBNs) are inherently interpretable architectures that facilitate understanding of model outputs by analyzing the activation of specific neurons, referred to as prototypes, during the forward pass. The learned prototypes serve as transformations of the input space into a latent representation that more effectively encapsulates the main characteristics shared across data samples, thereby enhancing classification performance. Crucially, these prototypes can be decoded and projected back into the original input space, providing direct interpretability of the features learned by the network. While this characteristic marks a meaningful advancement toward the realization of fully interpretable artificial intelligence systems, our findings reveal that prototype representations can be deliberately or inadvertently manipulated without compromising the superficial appearance of explainability. In this study, we conduct a series of empirical investigations that demonstrate this phenomenon, framing it as a structural paradox potentially intrinsic to the architecture or its design, which may represent a significant robustness challenge for explainable AI methodologies.
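To make the mechanism the abstract describes concrete, the following is a minimal sketch of a prototype layer in the spirit of ProtoPNet-style designs: encoded inputs are compared against learned prototype vectors by squared Euclidean distance, the distances are turned into similarity activations, and those activations feed a linear classifier. All names, shapes, and the specific log-similarity formula are illustrative assumptions, not the exact architecture used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def prototype_layer(z, prototypes):
    """Similarity of each latent vector in z (n, d) to each prototype (k, d).

    Uses the log-ratio activation common in ProtoPNet-style networks
    (an assumption here): large when a sample lies near a prototype,
    near zero when it is far away.
    """
    # squared Euclidean distance between every sample and every prototype
    d2 = ((z[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=-1)
    return np.log((d2 + 1.0) / (d2 + 1e-4))

# toy latent batch and prototypes (in a real PBN, z comes from an encoder
# and the prototypes are learned jointly with it)
z = rng.normal(size=(5, 8))            # 5 encoded samples, latent dim 8
prototypes = rng.normal(size=(3, 8))   # 3 learned prototypes

sims = prototype_layer(z, prototypes)  # shape (5, 3): one score per prototype
W = rng.normal(size=(3, 2))            # prototype-to-class weights
logits = sims @ W                      # class scores per sample, shape (5, 2)
print(sims.shape, logits.shape)
```

Because each logit is a weighted sum of per-prototype similarities, a prediction can be read back as "this input activated prototype j strongly" and, in a full PBN, prototype j can be decoded into input space for inspection. This decoding step is exactly what the paper argues can be manipulated without visibly degrading the explanation.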

More Information

Record ID: 95036
DC Identifier: https://oa.upm.es/95036/
OAI Identifier: oai:oa.upm.es:95036
Scientific Portal URL: https://portalcientifico.upm.es/es/ipublic/item/10390061
DOI: 10.3390/electronics14183584
Official URL: https://www.mdpi.com/2079-9292/14/18/3584
Deposited by: iMarina Portal Científico
Deposited on: 23 Mar 2026 18:27
Last Modified: 23 Mar 2026 18:27