Speech Signals Feature Extraction Model for a Speaker’s Gender and Age Identification System

Muñoz Mulas, Cristina (2014). Speech Signals Feature Extraction Model for a Speaker’s Gender and Age Identification System. Thesis (Doctoral), E.T.S. de Ingenieros Informáticos (UPM).

Description

Title: Speech Signals Feature Extraction Model for a Speaker’s Gender and Age Identification System
Author/s:
  • Muñoz Mulas, Cristina
Contributor/s:
  • Martínez Olalla, Rafael
Item Type: Thesis (Doctoral)
Date: 2014
Subjects:
Freetext Keywords: speech processing, joint-process estimation, speaker’s biometry, contextual speech information, running speech
Faculty: E.T.S. de Ingenieros Informáticos (UPM)
Department: Arquitectura y Tecnología de Sistemas Informáticos
Creative Commons Licenses: Recognition - No derivative works - Non commercial

Full text

[img]
Preview
PDF - Requires a PDF viewer, such as GSview, Xpdf or Adobe Acrobat Reader
Download (14MB) | Preview

Abstract

Durante el proceso de producción de voz, los factores anatómicos, fisiológicos o psicosociales del individuo modifican los órganos resonadores, imprimiendo en la voz características particulares. Los sistemas ASR tratan de encontrar los matices característicos de una voz y asociarlos a un individuo o grupo. La edad y sexo de un hablante son factores intrínsecos que están presentes en la voz. Este trabajo intenta diferenciar esas características, aislarlas y usarlas para detectar el género y la edad de un hablante. Para dicho fin, se ha realizado el estudio y análisis de las características basadas en el pulso glótico y el tracto vocal, evitando usar técnicas clásicas (como pitch y sus derivados) debido a las restricciones propias de dichas técnicas. Los resultados finales de nuestro estudio alcanzan casi un 100% en reconocimiento de género mientras en la tarea de reconocimiento de edad el reconocimiento se encuentra alrededor del 80%. Parece ser que la voz queda afectada por el género del hablante y las hormonas, aunque no se aprecie en la audición. ABSTRACT Particular elements of the voice are printed during the speech production process and are related to anatomical and physiological factors of the phonatory system or psychosocial factors acquired by the speaker. ASR systems attempt to find those peculiar nuances of a voice and associate them to an individual or a group. Age and gender are inherent factors to the speaker which may be represented in voice. This work attempts to differentiate those characteristics, isolate them and use them to detect speaker’s gender and age. Features based on glottal pulse and vocal tract are studied and analyzed in order to achieve good results in both tasks. Classical methodologies (such as pitch and derivates) are avoided since the requirements of those techniques may be too restrictive. The final scores achieve almost 100% in gender recognition whereas in age recognition those scores are around 80%. Factors related to the gender and hormones seem to affect the voice although they are not audible.

More information

Item ID: 33121
DC Identifier: http://oa.upm.es/33121/
OAI Identifier: oai:oa.upm.es:33121
Deposited by: Archivo Digital UPM 2
Deposited on: 15 Dec 2014 08:31
Last Modified: 12 Jun 2015 22:56
  • Logo InvestigaM (UPM)
  • Logo GEOUP4
  • Logo Open Access
  • Open Access
  • Logo Sherpa/Romeo
    Check whether the anglo-saxon journal in which you have published an article allows you to also publish it under open access.
  • Logo Dulcinea
    Check whether the spanish journal in which you have published an article allows you to also publish it under open access.
  • Logo de Recolecta
  • Logo del Observatorio I+D+i UPM
  • Logo de OpenCourseWare UPM