Using dysphonic voice to characterize speaker's biometry

Gómez Vilda, Pedro and San Segundo, Eugenia and Mazaira Fernández, Luis Miguel and Álvarez Marquina, Agustín and Rodellar Biarge, M. Victoria (2014). Using dysphonic voice to characterize speaker's biometry. "Language and Law / Linguagem e Direito", v. 1 (n. 2); pp. 42-66. ISSN 2183-3745.

Description

Title: Using dysphonic voice to characterize speaker's biometry
Author/s:
  • Gómez Vilda, Pedro
  • San Segundo, Eugenia
  • Mazaira Fernández, Luis Miguel
  • Álvarez Marquina, Agustín
  • Rodellar Biarge, M. Victoria
Item Type: Article
Journal/Publication Title: Language and Law / Linguagem e Direito
Date: 2014
ISSN: 2183-3745
Volume: 1
Subjects:
Freetext Keywords: Phonation; Speaker Recognition; Voice Production; Speech Processing
Faculty: E.T.S. de Ingenieros Informáticos (UPM)
Department: Arquitectura y Tecnología de Sistemas Informáticos
Creative Commons Licenses: Attribution - NonCommercial - NoDerivatives

Abstract

Phonation distortion leaves relevant marks in a speaker's biometric profile, so dysphonic voice production may be used for biometric speaker characterization. In the present paper, phonation features derived from the glottal source (GS) parameterization, after vocal tract inversion, are proposed for dysphonic voice characterization in Speaker Verification tasks. The glottal source derived parameters are matched in a forensic evaluation framework defining a distance-based metric specification. The phonation segments used in the study are derived from fillers, long vowels, and other phonation segments produced in spontaneous telephone conversations. Phonated segments from a telephone database of 100 male native Spanish speakers are combined in a 10-fold cross-validation task to produce the set of quality measurements outlined in the paper. Shimmer, mucosal wave correlate, vocal fold cover biomechanical parameter unbalance and a subset of the GS cepstral profile produce accuracy rates as high as 99.57% over a wide threshold interval (62.08-75.04%). An Equal Error Rate of 0.64% can be achieved. The proposed metric framework is shown to behave more fairly than classical likelihood ratios in supporting the hypothesis of the defense vs. that of the prosecution, thus offering a more reliable evaluation scoring. Possible applications are Speaker Verification and Dysphonic Voice Grading.
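
For readers unfamiliar with the Equal Error Rate quoted above, the following is a minimal sketch of how such a figure is commonly computed from verification scores. It is not the authors' implementation: the function names, the convention that higher scores mean "same speaker", and the score values are all assumptions made for illustration (the paper's own framework is distance-based, so its accept rule may be inverted).

```python
def error_rates(genuine, impostor, threshold):
    """Return (FAR, FRR) when scores >= threshold are accepted.

    FAR: fraction of impostor trials wrongly accepted.
    FRR: fraction of genuine trials wrongly rejected.
    """
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr


def equal_error_rate(genuine, impostor):
    """Sweep every observed score as a candidate threshold and return
    (eer, threshold) at the point where |FAR - FRR| is smallest."""
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = error_rates(genuine, impostor, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            # Report the EER as the mean of FAR and FRR at this threshold.
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Hypothetical scores for illustration only (not data from the paper).
genuine = [0.91, 0.84, 0.78, 0.66, 0.59]
impostor = [0.62, 0.41, 0.35, 0.22, 0.10]
eer, thr = equal_error_rate(genuine, impostor)
print(f"EER = {eer:.2%} at threshold {thr}")
```

With real data the sweep would run over the held-out scores of each cross-validation fold; the simple exhaustive sweep is chosen here for clarity rather than speed.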

Funding Projects

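Type: Government of Spain
Code: TEC2012-38630-C04-01
Acronym: Unspecified
Leader: Universidad Politécnica de Madrid
Title: EVALUACION MULTIMODAL DE TRASTORNOS NEUROLOGICOS MEDIANTE LA CARACTERIZACION DE LA VOZ, DINAMICA DE LOS PLIEGUES VOCALES Y SECUENCIAS SACADICAS (Multimodal evaluation of neurological disorders through the characterization of voice, vocal fold dynamics and saccadic sequences)

Type: Government of Spain
Code: TEC2012-38630-C04-04
Acronym: Unspecified
Leader: Universidad Politécnica de Madrid
Title: DETECCION DEL TRASTORNO NEUROLOGICO POR MEDIO DE CORRELATOS DE LA FONACION OBTENIDOS POR MODELADO INVERSO A PARTIR DE LA FUENTE GLOTICA (Detection of neurological disorder by means of phonation correlates obtained by inverse modeling from the glottal source)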

More information

Item ID: 40910
DC Identifier: http://oa.upm.es/40910/
OAI Identifier: oai:oa.upm.es:40910
Official URL: https://ojs.letras.up.pt/index.php/LLLD/article/view/2431
Deposited by: Memoria Investigacion
Deposited on: 26 Oct 2016 11:09
Last Modified: 05 Jun 2019 17:21