Language recognition using phonotactic-based shifted delta coefficients and multiple phone recognizers

D'Haro Enriquez, Luis Fernando; Cordoba Herralde, Ricardo de; Salamea Palacios, Christian Raúl and Ferreiros López, Javier (2014). Language recognition using phonotactic-based shifted delta coefficients and multiple phone recognizers. In: "15th Annual Conference of the International Speech Communication Association (Interspeech 2014)", 14/09/2014 - 18/09/2014, Singapore. pp. 3042-3046.

Description

Title: Language recognition using phonotactic-based shifted delta coefficients and multiple phone recognizers
Author(s):
  • D'Haro Enriquez, Luis Fernando
  • Cordoba Herralde, Ricardo de
  • Salamea Palacios, Christian Raúl
  • Ferreiros López, Javier
Document Type: Conference or Workshop Paper (Article)
Event Title: 15th Annual Conference of the International Speech Communication Association (Interspeech 2014)
Event Dates: 14/09/2014 - 18/09/2014
Event Location: Singapore
Book Title: 15th Annual Conference of the International Speech Communication Association (Interspeech 2014)
Date: 2014
Subjects:
Informal Keywords: Language recognition, SDC, Phone-Log Likelihood Ratios, parallel phone recognizers
School: E.T.S.I. Telecomunicación (UPM)
Department: Electronic Engineering
Creative Commons Licenses: Attribution - No Derivative Works - Non-Commercial

Abstract

This paper describes a new language recognition technique that applies the idea behind Shifted Delta Coefficients (SDC) to phone log-likelihood ratio (PLLR) features. The method incorporates long-span phonetic information at the frame level while accounting for the temporal length of each phone unit. The proposed features are used to train an i-vector based system, which is tested on the Albayzin LRE 2012 dataset. The results show a relative improvement of 33.3% in Cavg compared with different state-of-the-art acoustic i-vector based systems. The paper also presents the integration of parallel phone ASR systems, each of which generates multiple PLLR coefficients that are stacked together and then projected into a reduced dimension. Finally, it shows that incorporating state information from the phone ASR provides additional improvements, and that fusion with the other acoustic and phonotactic systems yields an improvement of 25.8% over the system presented during the competition.
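
As a rough illustration of the idea in the abstract, the sketch below computes PLLR features from per-frame phone posteriors and then stacks shifted deltas over them. It is not the authors' implementation. The function names `pllr` and `sdc` are hypothetical, the PLLR is taken as the logit of each phone posterior (a commonly used definition), and the defaults d=1, P=3, k=7 are borrowed from a configuration often used for acoustic SDC; this record does not state the values used in the paper.

```python
import math

def pllr(posteriors, eps=1e-10):
    """Map one frame of phone posteriors to log-likelihood ratios (logit).

    Assumption: PLLR_i = log(p_i / (1 - p_i)), with p_i clipped away from 0 and 1.
    """
    out = []
    for p in posteriors:
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        out.append(math.log(p) - math.log(1.0 - p))
    return out

def sdc(frames, d=1, P=3, k=7):
    """Stack k delta blocks, spaced P frames apart, for every frame.

    frames: list of equal-length feature vectors, one per frame.
    Block i for frame t is frames[t + i*P + d] - frames[t + i*P - d].
    Assumption: indices outside the utterance repeat the edge frame.
    """
    T = len(frames)

    def at(t):
        return frames[min(max(t, 0), T - 1)]

    result = []
    for t in range(T):
        row = []
        for i in range(k):
            ahead = at(t + i * P + d)
            behind = at(t + i * P - d)
            row.extend(a - b for a, b in zip(ahead, behind))
        result.append(row)
    return result

# Usage: three frames of posteriors over a toy two-phone inventory.
toy_posteriors = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]
toy_features = sdc([pllr(frame) for frame in toy_posteriors])
```

Each output row has k times as many values as an input frame, so a single frame carries information from a window of roughly (k - 1) * P + 2 * d + 1 frames, which is how long-span context enters at the frame level.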

Associated projects

Type | Code | Acronym | Leader | Title
Government of Spain | TIN2011-28169-C05-03 | Unspecified | Unspecified | Unspecified
Government of Spain | DPI2010-21247-C02-02 | Unspecified | Unspecified | Unspecified
Comunidad de Madrid | S2009/TIC-1542 | Unspecified | Unspecified | Unspecified
FP7 | ICT-2011-7 287678 | SIMPLE4ALL | University of Edinburgh | Speech synthesis that improves through adaptive learning

More information

Record ID: 37546
DC Identifier: http://oa.upm.es/37546/
OAI Identifier: oai:oa.upm.es:37546
Deposited by: Memoria Investigacion
Deposited on: 08 Sep 2015 11:04
Last Modified: 06 Jun 2016 16:11