Language recognition using phonotactic-based shifted delta coefficients and multiple phone recognizers

D'Haro Enriquez, Luis Fernando and Cordoba Herralde, Ricardo de and Salamea Palacios, Christian Raúl and Ferreiros López, Javier (2014). Language recognition using phonotactic-based shifted delta coefficients and multiple phone recognizers. In: "15th Annual Conference of the International Speech Communication Association (Interspeech 2014)", 14/09/2014 - 18/09/2014, Singapore. pp. 3042-3046.

Description

Title: Language recognition using phonotactic-based shifted delta coefficients and multiple phone recognizers
Author/s:
  • D'Haro Enriquez, Luis Fernando
  • Cordoba Herralde, Ricardo de
  • Salamea Palacios, Christian Raúl
  • Ferreiros López, Javier
Item Type: Presentation at Congress or Conference (Article)
Event Title: 15th Annual Conference of the International Speech Communication Association (Interspeech 2014)
Event Dates: 14/09/2014 - 18/09/2014
Event Location: Singapore
Title of Book: 15th Annual Conference of the International Speech Communication Association (Interspeech 2014)
Date: 2014
Subjects:
Freetext Keywords: Language recognition, SDC, Phone-Log Likelihood Ratios, parallel phone recognizers
Faculty: E.T.S.I. Telecomunicación (UPM)
Department: Ingeniería Electrónica
Creative Commons Licenses: Attribution - No derivative works - Non commercial

Abstract

This paper describes a new language recognition technique that applies the principle behind Shifted Delta Coefficients (SDC) to phone log-likelihood ratio (PLLR) features. The method incorporates long-span phonetic information at the frame level while accounting for the temporal length of each phone unit. The proposed features are used to train an i-vector based system, which is tested on the Albayzin LRE 2012 dataset. The results show a relative improvement of 33.3% in Cavg compared with different state-of-the-art acoustic i-vector based systems. The paper also presents the integration of parallel phone ASR systems, each of which generates PLLR coefficients that are stacked together and then projected into a reduced dimension. Finally, it shows that incorporating state information from the phone ASR yields additional improvements, and that fusion with the other acoustic and phonotactic systems provides an improvement of 25.8% over the system presented during the competition.
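
As a rough illustration of the idea in the abstract, the following Python sketch computes PLLR features from frame-level phone posteriors and then stacks shifted deltas of those features in the SDC manner. It is not the authors' implementation. The function names `pllr` and `sdc`, the edge-padding strategy, and the default parameters d=1, P=3, k=7 (a configuration often used for acoustic SDC) are assumptions made for illustration; the abstract does not state the configuration actually used. PLLR is assumed here to be the logit of each phone posterior.

```python
import numpy as np


def pllr(posteriors, eps=1e-10):
    """Phone log-likelihood ratios, assumed here to be logit(p) per phone.

    posteriors: array of shape (frames, phones) with values in [0, 1].
    """
    p = np.clip(posteriors, eps, 1.0 - eps)  # avoid log(0)
    return np.log(p) - np.log1p(-p)


def sdc(features, d=1, P=3, k=7):
    """Shifted delta coefficients over a (frames, N) feature matrix.

    For frame t and block i in 0..k-1, the delta is
        c[t + i*P + d] - c[t + i*P - d]
    and the k deltas are concatenated, giving N*k values per frame.
    Frames outside the utterance are filled by repeating the edge frame
    (an illustrative choice, not taken from the paper).
    """
    T, _ = features.shape
    pad_front = d
    pad_back = (k - 1) * P + d
    padded = np.pad(features, ((pad_front, pad_back), (0, 0)), mode="edge")

    blocks = []
    for i in range(k):
        start = pad_front + i * P  # position of frame t + i*P for t = 0
        plus = padded[start + d : start + d + T]
        minus = padded[start - d : start - d + T]
        blocks.append(plus - minus)
    return np.hstack(blocks)


if __name__ == "__main__":
    # Hypothetical input: 200 frames of posteriors over 40 phone units.
    rng = np.random.default_rng(0)
    post = rng.dirichlet(np.ones(40), size=200)
    feats = sdc(pllr(post))
    print(feats.shape)
```

In a system with several parallel phone recognizers, the PLLR matrices from each recognizer could be concatenated along the feature axis and reduced in dimension (for example with PCA) before or after this step; the abstract does not specify the order or the projection used.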

Funding Projects

Type | Code | Acronym | Leader | Title
Government of Spain | TIN2011-28169-C05-03 | Unspecified | Unspecified | Unspecified
Government of Spain | DPI2010-21247-C02-02 | Unspecified | Unspecified | Unspecified
Madrid Regional Government | S2009/TIC-1542 | Unspecified | Unspecified | Unspecified
FP7 | ICT-2011-7 287678 | SIMPLE4ALL | University of Edinburgh | Speech synthesis that improves through adaptive learning

More information

Item ID: 37546
DC Identifier: http://oa.upm.es/37546/
OAI Identifier: oai:oa.upm.es:37546
Deposited by: Memoria Investigacion
Deposited on: 08 Sep 2015 11:04
Last Modified: 06 Jun 2016 16:11