Machine learning for Parkinson's disease detection: analyzing hybrid voice data with spectral, topological, and random matrix methods

Dominguez Monterroza, Andy ORCID: https://orcid.org/0000-0002-5274-7443, Mateos Caballero, Alfonso ORCID: https://orcid.org/0000-0003-4764-6047 and Jiménez Martín, Antonio ORCID: https://orcid.org/0000-0002-4947-8430 (2026). Machine learning for Parkinson's disease detection: analyzing hybrid voice data with spectral, topological, and random matrix methods. "IEEE Open Journal of the Computer Society", v. 7 ; pp. 314-325. ISSN 2644-1268. https://doi.org/10.1109/OJCS.2026.3651318.

Description

Title: Machine learning for Parkinson's disease detection: analyzing hybrid voice data with spectral, topological, and random matrix methods
Author(s): Dominguez Monterroza, Andy; Mateos Caballero, Alfonso; Jiménez Martín, Antonio
Document Type: Article
Journal/Publication Title: IEEE Open Journal of the Computer Society
Date: 23 January 2026
ISSN: 2644-1268
Volume: 7
Subjects:
SDGs:
Uncontrolled Keywords: Accuracy, Acoustics, Classification, Diseases, Feature extraction, Machine learning, Noise, Parkinson's disease, Random matrix theory, Spectral features, Speech, Speech analysis, Speech synthesis, Synthetic data, Topological data analysis, Training
School: E.T.S. de Ingenieros Informáticos (UPM)
Department: Artificial Intelligence
Creative Commons License: Attribution

Abstract

Parkinson's disease (PD) is a progressive neurodegenerative disorder that affects both motor and speech functions. Advances in machine learning and signal processing have enabled non-invasive PD detection through voice analysis. This study proposes a comprehensive mathematical framework for PD classification that integrates topological, statistical, and spectral representations of speech signals. The framework combines topological descriptors derived from persistent homology, statistical measures based on random matrix theory, and spectral features extracted from frequency-domain analysis to capture complementary information about vocal dynamics. A hybrid training strategy was employed, using synthetic speech data generated from real recordings to train the models, while real samples were reserved exclusively for evaluation. Experimental results demonstrate that spectral features, particularly when fused with statistical descriptors, yield the highest discriminative power, achieving 98.00% accuracy and 97.98% F1-score with a multi-layer perceptron classifier. In contrast, topological descriptors provided limited standalone performance, serving instead as complementary components that enrich the overall representation. The findings highlight the potential of combining diverse mathematical representations to improve speech-based PD detection, especially in scenarios with limited access to clinically annotated data.
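
The feature fusion described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the choice of spectral centroid and bandwidth as spectral features, the frame-based correlation matrix, and the comparison against the Marchenko-Pastur upper edge are all assumptions made for illustration. Topological descriptors are omitted because they would require a persistent-homology library, and the classifier stage is not shown.

```python
import numpy as np


def spectral_features(signal, sample_rate):
    """Return [spectral centroid, spectral bandwidth] in Hz (illustrative choice)."""
    magnitude = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    total = magnitude.sum() + 1e-12  # guard against an all-zero signal
    centroid = (freqs * magnitude).sum() / total
    bandwidth = np.sqrt((((freqs - centroid) ** 2) * magnitude).sum() / total)
    return np.array([centroid, bandwidth])


def rmt_features(signal, frame_len=16):
    """Return [largest eigenvalue, fraction of eigenvalues above the
    Marchenko-Pastur upper edge] for a frame correlation matrix.
    The framing scheme is a hypothetical stand-in, not the paper's method."""
    n_frames = len(signal) // frame_len
    # Rows are frames (observations); columns are positions within a frame.
    frames = np.reshape(signal[: n_frames * frame_len], (n_frames, frame_len))
    corr = np.corrcoef(frames, rowvar=False)
    eigenvalues = np.linalg.eigvalsh(corr)
    # Upper edge of the Marchenko-Pastur law for unit-variance noise.
    q = frame_len / n_frames
    upper_edge = (1.0 + np.sqrt(q)) ** 2
    return np.array([eigenvalues.max(), np.mean(eigenvalues > upper_edge)])


def fused_features(signal, sample_rate):
    """Concatenate spectral and statistical descriptors into one vector."""
    signal = np.asarray(signal, dtype=float)
    return np.concatenate(
        [spectral_features(signal, sample_rate), rmt_features(signal)]
    )


# Toy input: a pure 200 Hz tone, not real or synthetic speech.
sample_rate = 8000
t = np.arange(4000) / sample_rate
toy_signal = np.sin(2 * np.pi * 200 * t)
features = fused_features(toy_signal, sample_rate)
print(features.shape)  # (4,)
```

Under the hybrid strategy the abstract describes, vectors like these would be computed for synthetic recordings to fit a classifier (for example, a multi-layer perceptron) and for real recordings only at evaluation time, so that no real sample influences training.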

Associated projects

Type | Code | Acronym | Lead | Title
Gobierno de España | PID2021-122209OB-C31 | Not specified | Not specified | Not specified
Gobierno de España | PID2024-155179NB-C22 | Not specified | Not specified | Not specified
Gobierno de España | RED2022-134540-T | Not specified | Not specified | Not specified

More information

Record ID: 94363
DC Identifier: https://oa.upm.es/94363/
OAI Identifier: oai:oa.upm.es:94363
Scientific Portal URL: https://portalcientifico.upm.es/es/ipublic/item/10444724
DOI: 10.1109/OJCS.2026.3651318
Official URL: https://www.computer.org/csdl/journal/oj/2026/01/1...
Deposited by: iMarina Portal Científico
Deposited on: 25 Feb 2026 19:04
Last modified: 25 Feb 2026 19:04