Extended phone log-likelihood ratio features and acoustic-based I-vectors for language recognition

D'haro Enríquez, Luis Fernando and Cordoba Herralde, Ricardo de and Salamea Palacios, Christian Raúl and Echeverry Correa, Julian David (2014). Extended phone log-likelihood ratio features and acoustic-based I-vectors for language recognition. In: "International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2014)", 04/05/2014 - 09/05/2014, Florence, Italy. pp. 5342-5346.

Description

Title: Extended phone log-likelihood ratio features and acoustic-based I-vectors for language recognition
Author/s:
  • D'haro Enríquez, Luis Fernando
  • Cordoba Herralde, Ricardo de
  • Salamea Palacios, Christian Raúl
  • Echeverry Correa, Julian David
Item Type: Presentation at Congress or Conference (Article)
Event Title: International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2014)
Event Dates: 04/05/2014 - 09/05/2014
Event Location: Florence, Italy
Title of Book: International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2014)
Date: 2014
Freetext Keywords: Phone Log-Likelihood Ratios, SDC, dimensionality reduction
Faculty: E.T.S.I. Telecomunicación (UPM)
Department: Ingeniería Electrónica
Creative Commons Licenses: Attribution - No Derivative Works - Non Commercial

Abstract

This paper presents new techniques that improve on the primary system our group submitted to the Albayzin 2012 LRE competition, where the use of any additional corpora for training or optimizing the models was forbidden. We incorporate an additional phonotactic subsystem based on phone log-likelihood ratio (PLLR) features extracted from different phonotactic recognizers, which improves the accuracy of the system by 21.4% in terms of Cavg (we also report results for Fact, the official metric of the evaluation). We show that using these features at the phone state level provides significant improvements when combined with dimensionality reduction techniques, especially PCA. We have also experimented with alternative SDC-like configurations applied to these PLLR features, obtaining additional improvements. We also describe some modifications to the MFCC-based acoustic i-vector system that contributed further improvements. The final fused system outperformed the baseline by 27.4% in Cavg.
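
As a rough illustration of the feature pipeline the abstract outlines (PLLR features, PCA reduction, SDC-like stacking), here is a minimal NumPy sketch. It is not the authors' implementation. The PLLR is taken here as the plain logit of each posterior, which is an assumption (some formulations add a constant offset). The function names, the random stand-in posteriors, the 12 "phone states", the 5 retained components, and the SDC parameters d, p and k are all hypothetical choices made for the example.

```python
import numpy as np


def pllr(posteriors, eps=1e-10):
    """Logit of frame-level posteriors: log(p / (1 - p)) (assumed definition)."""
    p = np.clip(posteriors, eps, 1.0 - eps)  # avoid log(0)
    return np.log(p) - np.log1p(-p)


def pca_reduce(features, n_components):
    """Project mean-centered features onto their top principal components."""
    centered = features - features.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def sdc(features, d=1, p=3, k=7):
    """Stack k delta blocks, each c(t + i*p + d) - c(t + i*p - d)."""
    n_frames = features.shape[0]
    pad = d + (k - 1) * p
    # Repeat edge frames so every shifted window stays in range.
    padded = np.pad(features, ((pad, pad), (0, 0)), mode="edge")
    blocks = []
    for i in range(k):
        start = pad + i * p
        plus = padded[start + d:start + d + n_frames]
        minus = padded[start - d:start - d + n_frames]
        blocks.append(plus - minus)
    return np.hstack(blocks)


# Stand-in data: 200 frames of softmax posteriors over 12 hypothetical states.
rng = np.random.default_rng(0)
logits = rng.normal(size=(200, 12))
posteriors = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

feats = pllr(posteriors)                  # (200, 12) PLLR features
reduced = pca_reduce(feats, 5)            # (200, 5) after PCA
stacked = sdc(reduced, d=1, p=3, k=7)     # (200, 35) SDC-like features
```

In this sketch PCA is applied before the SDC-like stacking so that the stacked dimension stays small. The abstract does not specify the order or the configurations actually used.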

Funding Projects

Type | Code | Acronym | Leader | Title
Madrid Regional Government | S2009/TIC-1542 | Unspecified | Unspecified | Unspecified
Government of Spain | DPI2010-21247-C02-02 | Unspecified | Unspecified | Unspecified
Government of Spain | TIN2011-28169-C05-03 | Unspecified | Unspecified | Unspecified
FP7 | 287678 | SIMPLE4ALL | University of Edinburgh | Speech synthesis that improves through adaptive learning

More information

Item ID: 37538
DC Identifier: http://oa.upm.es/37538/
OAI Identifier: oai:oa.upm.es:37538
Official URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6854623
Deposited by: Memoria Investigacion
Deposited on: 21 Sep 2015 16:36
Last Modified: 21 Sep 2015 16:36