Attention-based word vector prediction with LSTMs and its application to the OOV problem in ASR

Coucheiro Limeres, Alejandro ORCID: https://orcid.org/0000-0003-2966-2035, Fernández Martínez, Fernando ORCID: https://orcid.org/0000-0003-3877-0089, San Segundo Hernández, Rubén ORCID: https://orcid.org/0000-0001-9659-5464 and Ferreiros López, Javier ORCID: https://orcid.org/0000-0001-8834-3080 (2019). Attention-based word vector prediction with LSTMs and its application to the OOV problem in ASR. In: "20th Annual Conference of the International Speech Communication Association (Interspeech 2019)", 15/09/2019 – 19/09/2019, Graz, Austria. pp. 3520-3523. https://doi.org/10.21437/Interspeech.2019-2347.

Description

Title: Attention-based word vector prediction with LSTMs and its application to the OOV problem in ASR
Author/s: Coucheiro Limeres, Alejandro; Fernández Martínez, Fernando; San Segundo Hernández, Rubén; Ferreiros López, Javier
Item Type: Presentation at Congress or Conference (Article)
Event Title: 20th Annual Conference of the International Speech Communication Association (Interspeech 2019)
Event Dates: 15/09/2019 – 19/09/2019
Event Location: Graz, Austria
Title of Book: 20th Annual Conference of the International Speech Communication Association (Interspeech 2019)
Date: 2019
Subjects:
Freetext Keywords: Word vector prediction; speech recognition; out-of-vocabulary terms; curricular learning; neural network; language model
Faculty: E.T.S.I. Telecomunicación (UPM)
Department: Ingeniería Electrónica
Creative Commons Licenses: Attribution - NonCommercial - NoDerivatives (CC BY-NC-ND)

Full text

PDF (INVE_MEM_2019_326664.pdf), 317 kB

Abstract

We propose three architectures for a word vector prediction system (WVPS) built with LSTMs that consider both the past and future contexts of a word to predict a vector in an embedded space whose surrounding area is semantically related to the considered word. In one of the architectures we introduce an attention mechanism, so that the system can assess the specific contribution of each context word to the prediction. All the architectures are trained under the same conditions and with the same training material, following a curriculum-learning scheme in the presentation of the data. For the inputs, we employ pretrained word embeddings. We evaluate the systems after the same number of training steps, over two different corpora composed of ground-truth speech transcriptions in Spanish from TCSTAR and from the TV recordings used in the Search on Speech Challenge of IberSPEECH 2018. The results show significant differences between the architectures, consistent across both corpora. The attention-based architecture achieves the best results, suggesting its adequacy for the task. We also illustrate the usefulness of the systems for resolving out-of-vocabulary (OOV) regions marked by an ASR system capable of detecting OOV occurrences.
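The core idea of the attention-based architecture can be sketched in a few lines: each context word contributes to the predicted vector in proportion to an attention weight. The sketch below is illustrative only and is not the paper's implementation: it omits the LSTM encoders (using raw context embeddings directly) and assumes simple dot-product attention against a query vector; the function names are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of raw attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_word_vector(query, context_vectors):
    """Predict an embedding for an unknown word as the attention-weighted
    sum of its context word embeddings (dot-product attention).

    In the full model the query would come from LSTM states summarizing
    the past and future context; here it is just a given vector."""
    scores = [sum(q * c for q, c in zip(query, cv)) for cv in context_vectors]
    weights = softmax(scores)
    dim = len(context_vectors[0])
    return [sum(w * cv[i] for w, cv in zip(weights, context_vectors))
            for i in range(dim)]

# Toy example: two 2-d context embeddings; the query leans toward the
# first context word, so the prediction is pulled toward it as well.
pred = predict_word_vector([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

For the OOV use case, the predicted vector is then matched against the embeddings of candidate words (e.g. by cosine similarity) to propose a replacement for the OOV region.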

Funding Projects

Type: Government of Spain
Code: TIN2017-85854-C4
Acronym: Unspecified
Leader: Unspecified
Title: Unspecified

More information

Item ID: 65331
DC Identifier: https://oa.upm.es/65331/
OAI Identifier: oai:oa.upm.es:65331
DOI: 10.21437/Interspeech.2019-2347
Official URL: https://www.isca-speech.org/archive/Interspeech_20...
Deposited by: Memoria Investigacion
Deposited on: 17 Apr 2021 07:39
Last Modified: 17 Apr 2021 07:39