On the design of automatic voice condition analysis systems. Part II: review of speaker recognition techniques and study on the effects of different variability factors

Gómez García, Jorge Andrés and Moro Velázquez, Laureano and Godino Llorente, Juan Ignacio (2019). On the design of automatic voice condition analysis systems. Part II: review of speaker recognition techniques and study on the effects of different variability factors. "Biomedical Signal Processing and Control", v. 48; pp. 128-143. ISSN 1746-8094. https://doi.org/10.1016/j.bspc.2018.09.003.

Description

Title: On the design of automatic voice condition analysis systems. Part II: review of speaker recognition techniques and study on the effects of different variability factors
Author/s:
  • Gómez García, Jorge Andrés
  • Moro Velázquez, Laureano
  • Godino Llorente, Juan Ignacio
Item Type: Article
Journal/Publication Title: Biomedical Signal Processing and Control
Date: March 2019
ISSN: 1746-8094
Volume: 48
Subjects:
Freetext Keywords: Robust automatic voice condition analysis; Universal background models; Extralinguistic aspects of the speech; Cross-dataset validation
Faculty: E.T.S.I. Telecomunicación (UPM)
Department: Other
Creative Commons Licenses: Attribution - NonCommercial - NoDerivatives

Abstract

This is the second of a two-part series devoted to the automatic voice condition analysis of voice pathologies, and is a direct continuation of the paper "On the design of automatic voice condition analysis systems. Part I: review of concepts and an insight to the state of the art". The aim of this study is to examine several variability factors affecting the robustness of systems that automatically detect the presence of voice pathologies from audio recordings. Multiple experiments are performed to test the influence of the speech task, extralinguistic aspects (such as sex), the acoustic features, and the classifiers on system performance. Some experiments are carried out using state-of-the-art classification methodologies often employed in speaker recognition. To evaluate the robustness of the methods, testing is repeated across several corpora, with the aim of creating a single system that integrates the conclusions obtained previously. This system is later tested under cross-dataset scenarios in an attempt to obtain more realistic conclusions. Results identify a reduced subset of relevant features, which are used in a hierarchical-like scenario incorporating information from different speech tasks. In particular, for the experiments carried out using the Saarbrücken voice dataset, the area under the ROC curve of the system reached 0.88 in an intra-dataset setting and ranged from 0.82 to 0.94 in cross-dataset scenarios. These results open a discussion about the suitability of these techniques for transfer to the clinical setting.

Funding Projects

Type: Government of Spain
Code: DPI2017-83405-R1
Acronym: Unspecified
Leader: Unspecified
Title: Unspecified

More information

Item ID: 64431
DC Identifier: http://oa.upm.es/64431/
OAI Identifier: oai:oa.upm.es:64431
DOI: 10.1016/j.bspc.2018.09.003
Official URL: https://www.sciencedirect.com/science/article/pii/S1746809418302416
Deposited by: Memoria Investigacion
Deposited on: 19 Dec 2020 10:23
Last Modified: 08 Mar 2021 23:30