Phoneme and Sub-Phoneme T-Normalization for Text-Dependent Speaker Recognition

Torre Toledano, Doroteo and Esteve-Elizalde, Cristina and Gonzalez-Rodriguez, Joaquin and Fernández Pozo, Rubén and Hernández Gómez, Luis Alfonso (2008). Phoneme and Sub-Phoneme T-Normalization for Text-Dependent Speaker Recognition. In: "IEEE Odyssey 2008 Workshop on Speaker and Language Recognition", 21/01/2008-24/01/2008, Stellenbosch, South Africa. ISBN 978-0-620-40331-3.

Description

Title: Phoneme and Sub-Phoneme T-Normalization for Text-Dependent Speaker Recognition
Author/s:
  • Torre Toledano, Doroteo
  • Esteve-Elizalde, Cristina
  • Gonzalez-Rodriguez, Joaquin
  • Fernández Pozo, Rubén
  • Hernández Gómez, Luis Alfonso
Item Type: Presentation at Congress or Conference (Article)
Event Title: IEEE Odyssey 2008 Workshop on Speaker and Language Recognition
Event Dates: 21/01/2008-24/01/2008
Event Location: Stellenbosch, South Africa
Title of Book: Proceedings of the IEEE Odyssey 2008 Workshop on Speaker and Language Recognition
Date: 2008
ISBN: 978-0-620-40331-3
Subjects:
Faculty: E.T.S.I. Telecomunicación (UPM)
Department: Señales, Sistemas y Radiocomunicaciones
Creative Commons Licenses: Attribution - No derivative works - Non commercial

Abstract

Test normalization (T-Norm) is a score normalization technique that is regularly and successfully applied in the context of text-independent speaker recognition. It is less frequently applied, however, to text-dependent or text-prompted speaker recognition, mainly because its improvement in this context is more modest. In this paper we present a novel way to improve the performance of T-Norm for text-dependent systems. It consists in applying score T-Normalization at the phoneme or sub-phoneme level instead of at the sentence level. Experiments on the YOHO corpus show that, while using standard sentence-level T-Norm does not improve equal error rate (EER), phoneme and sub-phoneme level T-Norm produce a relative EER reduction of 18.9% and 20.1% respectively on a state-of-the-art HMM-based text-dependent speaker recognition system. Results are even better for working points with low false acceptance rates.
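
The difference between sentence-level and phoneme-level T-Norm can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the use of the population standard deviation, and the choice of averaging unit scores into a sentence score are all assumptions made for illustration. Each "unit" stands for a phoneme or sub-phoneme segment, and each cohort entry holds one impostor model's scores for the same units of the same test utterance.

```python
import statistics


def t_norm(score, cohort_scores):
    """Standard T-Norm: shift and scale a score by cohort score statistics.

    Assumes the cohort scores are not all identical (non-zero deviation).
    """
    mu = statistics.mean(cohort_scores)
    sigma = statistics.pstdev(cohort_scores)  # population std is an assumption
    return (score - mu) / sigma


def sentence_level_t_norm(unit_scores, cohort_unit_scores):
    """Average unit scores into one sentence score, then normalize once."""
    target = statistics.mean(unit_scores)
    cohort = [statistics.mean(scores) for scores in cohort_unit_scores]
    return t_norm(target, cohort)


def unit_level_t_norm(unit_scores, cohort_unit_scores):
    """Normalize each phoneme (or sub-phoneme) score against the cohort
    scores for that same unit, then average the normalized scores.

    Averaging as the combination rule is an illustrative assumption.
    """
    normalized = []
    for i, score in enumerate(unit_scores):
        cohort_for_unit = [scores[i] for scores in cohort_unit_scores]
        normalized.append(t_norm(score, cohort_for_unit))
    return statistics.mean(normalized)


# Toy example: one test utterance with two units and a two-model cohort.
target_units = [3.0, 2.0]
cohort_units = [[0.0, 0.0], [2.0, 4.0]]

sentence_score = sentence_level_t_norm(target_units, cohort_units)
unit_score = unit_level_t_norm(target_units, cohort_units)
```

In this toy example the two variants give different normalized scores because the cohort spread differs between the two units. That per-unit variation is what normalizing below the sentence level can exploit.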

More information

Item ID: 4312
DC Identifier: https://oa.upm.es/4312/
OAI Identifier: oai:oa.upm.es:4312
Official URL: http://www.isca-speech.org/archive/odyssey_2008/od...
Deposited by: Memoria Investigacion
Deposited on: 27 Sep 2010 08:32
Last Modified: 20 Apr 2016 13:35