Time Series Classification of Raw Voice Waveforms for Parkinson's Disease Detection Using Generative Adversarial Network-Driven Data Augmentation

Rey Paredes, Marta ORCID: https://orcid.org/0009-0006-1443-3714, Pérez Sánchez, Carlos Javier ORCID: https://orcid.org/0000-0001-6385-9080 and Mateos Caballero, Alfonso ORCID: https://orcid.org/0000-0003-4764-6047 (2025). Time Series Classification of Raw Voice Waveforms for Parkinson's Disease Detection Using Generative Adversarial Network-Driven Data Augmentation. "IEEE Open Journal of the Computer Society", v. 6 ; pp. 72-84. ISSN 26441268. https://doi.org/10.1109/OJCS.2024.3504864.

Description

Title: Time Series Classification of Raw Voice Waveforms for Parkinson's Disease Detection Using Generative Adversarial Network-Driven Data Augmentation
Author(s):
Document Type: Article
Journal/Publication Title: IEEE Open Journal of the Computer Society
Date: 1 January 2025
ISSN: 26441268
Volume: 6
Subjects:
SDGs:
Informal Keywords: Cepstral analysis; Data Augmentation; Data models; DATABASES; deep learning; Diseases; Feature Extraction; Generative Adversarial Networks; Parkinson's Disease; Recording; Spectrogram; Time Series Analysis; vocal signal analysis
School: E.T.S. de Ingenieros Informáticos (UPM)
Department: Artificial Intelligence
Creative Commons Licenses: Attribution

Abstract

Parkinson's disease (PD) is a neurodegenerative disorder that affects more than 10 million people worldwide. Despite its prevalence, the detection of PD remains a complicated task, as no gold standard test has yet been developed to provide an accurate diagnosis. In this context, many recent studies have focused on the automatic detection and progression tracking of PD from voice-related characteristics, with feature engineering being the most common approach. This work intends to address an existing research gap by introducing a novel strategy that analyzes raw voice waveforms. Despite recent advancements, one significant hurdle remains the lack of extensive and diverse datasets, so this article also implements a data augmentation solution: Big Vocoder Slicing Adversarial Network (BigVSAN) is used to generate synthetic voice data that mimics the characteristics of real patients and healthy subjects. For the PD detection task, deep learning models such as ResNet, LSTM-FCN, InceptionTime, and CDIL-CNN are used. The experiments were performed using the speech task of sustained vowel /a/ in the PC-GITA database, which contains recordings of healthy and PD subjects. CDIL-CNN achieves the best results, improving accuracy by 15.87% compared to the same model trained without augmented data, and by 8.96% compared to the best method found in the literature that uses voice waveforms. The results of this study indicate that models trained with raw waveforms show modest but promising performance, underscoring the potential of audio analysis to improve the early detection of PD by providing a non-invasive and potentially remotely applicable method.
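
As a rough illustration of the data flow the abstract describes (raw waveforms treated as time series, with synthetic recordings added to the training data), the following minimal Python sketch may help. It is not the authors' pipeline: the function names `prepare_waveform` and `build_training_set`, the z-normalization and crop/pad steps, the label convention (0 = healthy, 1 = PD), and the fixed length of 1000 samples are all assumptions made for illustration, and random noise stands in both for real recordings and for generator output (no BigVSAN code is called).

```python
import numpy as np

def prepare_waveform(wave, target_len):
    """Z-normalize a 1-D waveform, then crop or zero-pad it to target_len samples."""
    wave = np.asarray(wave, dtype=np.float32)
    std = wave.std()
    # Guard against constant signals, whose standard deviation is zero.
    wave = (wave - wave.mean()) / (std if std > 0 else 1.0)
    if len(wave) >= target_len:
        return wave[:target_len]
    return np.pad(wave, (0, target_len - len(wave)))

def build_training_set(real_waves, real_labels, synth_waves, synth_labels, target_len):
    """Stack real and synthetic waveforms into one (n, target_len) training array."""
    waves = list(real_waves) + list(synth_waves)
    labels = list(real_labels) + list(synth_labels)
    X = np.stack([prepare_waveform(w, target_len) for w in waves])
    y = np.asarray(labels, dtype=np.int64)
    return X, y

# Stand-in data (assumption): noise replaces real and generated recordings.
rng = np.random.default_rng(0)
real = [rng.standard_normal(n) for n in (900, 1200)]
synth = [rng.standard_normal(1000) for _ in range(3)]

# Assumed label convention: 0 = healthy, 1 = PD.
X_train, y_train = build_training_set(real, [0, 1], synth, [0, 1, 1], target_len=1000)
print(X_train.shape, y_train)
```

The resulting array has one row per recording and could be passed to any time series classifier. In a setup like this, evaluation data would typically contain real recordings only, so that synthetic samples influence training but not the reported accuracy.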

More information

Record ID: 92509
DC Identifier: https://oa.upm.es/92509/
OAI Identifier: oai:oa.upm.es:92509
Scientific Portal URL: https://portalcientifico.upm.es/es/ipublic/item/10313128
DOI: 10.1109/OJCS.2024.3504864
Official URL: https://ieeexplore.ieee.org/document/10764737/
Deposited by: iMarina Portal Científico
Deposited on: 27 Dec 2025 14:57
Last Modified: 27 Dec 2025 14:57