Full text
Preview |
PDF
- Requires a PDF viewer, such as GSview, Xpdf or Adobe Acrobat Reader
Download (147kB) | Preview |
Salamea Palacios, Christian Raúl; D'Haro Enríquez, Luis Fernando (ORCID: https://orcid.org/0000-0002-3411-7384); Córdoba Herralde, Ricardo de (ORCID: https://orcid.org/0000-0002-7136-9636) and San Segundo Hernández, Rubén (ORCID: https://orcid.org/0000-0001-9659-5464) (2016).
On the use of phone-gram units in recurrent neural networks for language identification.
In: "Odyssey 2016: The Speaker and Language Recognition Workshop", 21/06/2016 - 24/06/2016, Bilbao - España. pp. 117-123.
https://doi.org/10.21437/Odyssey.2016-17.
Title: | On the use of phone-gram units in recurrent neural networks for language identification |
---|---|
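Author/s: | Salamea Palacios, Christian Raúl; D'Haro Enríquez, Luis Fernando; Córdoba Herralde, Ricardo de; San Segundo Hernández, Rubén |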
Item Type: | Presentation at Congress or Conference (Article) |
Event Title: | Odyssey 2016: The Speaker and Language Recognition Workshop |
Event Dates: | 21/06/2016 - 24/06/2016 |
Event Location: | Bilbao - España |
Title of Book: | Proceedings of The Speaker and Language Recognition Workshop, Odyssey 2016 |
Date: | June 2016 |
Subjects: | |
Faculty: | E.T.S.I. Telecomunicación (UPM) |
Department: | Ingeniería Electrónica |
Creative Commons Licenses: | Attribution - No derivative works - Non commercial |
In this paper we present our results on using RNN-based language model (RNNLM) scores trained on different phone-gram orders and obtained with different phonetic ASR recognizers. To avoid data sparseness and to reduce the vocabulary of all possible n-gram combinations, a K-means clustering procedure was applied to phone-vector embeddings as a pre-processing step. Additional experiments to optimize the number of classes, the batch size, the number of hidden neurons, and the state unfolding are also presented. We worked with the KALAKA-3 database under the plenty-closed condition [1]. Thanks to our clustering technique and the combination of high-level phone-grams, our phonotactic system performs ~13% better than the unigram-based RNNLM system. The resulting RNNLM scores are also calibrated and fused with scores from an acoustic i-vector system and a traditional PPRLM system. This fusion yields additional improvements, showing that the RNNLM scores provide complementary information to the LID system.
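The clustering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy phone inventory, the 2-D embedding values, the choice of embedding a bigram by concatenating its phone vectors, and the helper names `kmeans`, `bigram_vocab` and `to_classes` are all assumptions made for illustration. The idea shown is only that K-means over phone-gram embeddings maps a large n-gram vocabulary onto a small set of class IDs, which could then serve as the input vocabulary of a language model.

```python
import numpy as np

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's K-means on an (n, d) array; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids with k distinct data points chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)].copy()
    labels = np.zeros(len(points), dtype=int)
    for _ in range(iters):
        # Squared Euclidean distance from every point to every centroid.
        dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            members = points[labels == c]
            if len(members):  # an empty cluster keeps its previous centroid
                centroids[c] = members.mean(axis=0)
    return centroids, labels

# Hypothetical toy inventory: 4 phones with made-up 2-D embeddings.
phone_vecs = {"a": [0.0, 0.1], "e": [0.1, 0.0], "s": [5.0, 5.1], "z": [5.1, 5.0]}

def bigram_vocab(phones):
    """All ordered phone pairs, i.e. the full bigram vocabulary."""
    return [(p, q) for p in phones for q in phones]

bigrams = bigram_vocab(sorted(phone_vecs))

# Assumption: a phone-gram is embedded by concatenating its phone vectors.
X = np.array([phone_vecs[p] + phone_vecs[q] for p, q in bigrams], dtype=float)

# Cluster the 16 bigrams into at most 4 classes.
centroids, labels = kmeans(X, k=4)
class_of = dict(zip(bigrams, labels.tolist()))

def to_classes(phone_seq):
    """Map a phone sequence to the class IDs of its overlapping bigrams."""
    return [class_of[(a, b)] for a, b in zip(phone_seq, phone_seq[1:])]

print(to_classes(["a", "s", "e", "z"]))
```

In this sketch the 16 possible bigrams collapse into at most 4 class IDs; with a realistic phone inventory and higher-order phone-grams the reduction would be far larger, which is the motivation the abstract gives for clustering.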
Item ID: | 47224 |
---|---|
DC Identifier: | https://oa.upm.es/47224/ |
OAI Identifier: | oai:oa.upm.es:47224 |
DOI: | 10.21437/Odyssey.2016-17 |
Official URL: | http://www.odyssey2016.org/papers/pdfs_stamped/53.... |
Deposited by: | Memoria Investigacion |
Deposited on: | 24 Oct 2017 16:20 |
Last Modified: | 25 Mar 2023 10:28 |