Full text: PDF (275kB)
Lopez Ludeña, Veronica and San Segundo Hernández, Rubén and Montero Martínez, Juan Manuel and Barra Chicote, Roberto and Lorenzo Trueba, Jaime (2012). Architecture for text normalization using statistical machine translation techniques. In: "VII Jornadas en Tecnología del Habla and III Iberian SLTech", 21/11/2012 - 22/11/2012, Madrid, Spain. pp. 204-213.
Title: | Architecture for text normalization using statistical machine translation techniques |
---|---|
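Author/s: | Lopez Ludeña, Veronica; San Segundo Hernández, Rubén; Montero Martínez, Juan Manuel; Barra Chicote, Roberto; Lorenzo Trueba, Jaime |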
Item Type: | Presentation at Congress or Conference (Article) |
Event Title: | VII Jornadas en Tecnología del Habla and III Iberian SLTech |
Event Dates: | 21/11/2012 - 22/11/2012 |
Event Location: | Madrid, Spain |
Title of Book: | Jornadas en Tecnología del Habla and III Iberian SLTech |
Date: | 2012 |
Subjects: | |
Freetext Keywords: | Text normalization, text to speech conversion, language translation, numbers, acronyms, abbreviations. |
Faculty: | E.T.S.I. Telecomunicación (UPM) |
Department: | Ingeniería Electrónica |
Creative Commons Licenses: | Attribution - NonCommercial - NoDerivatives (CC BY-NC-ND) |
This paper proposes an architecture, based on statistical machine translation, for developing the text normalization module of a text-to-speech conversion system. The main goal is a data-driven, language-independent text normalization module that is flexible enough to handle all the situations that arise in this task. The proposed architecture is composed of three main modules: a tokenizer that splits the input text into a token graph (tokenization), a phrase-based translation module (token translation), and a post-processing module that removes some tokens. The paper presents initial experiments on numbers and abbreviations. The very good results obtained in these experiments validate the proposed architecture.
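The three-stage pipeline described in the abstract (tokenization, token translation, post-processing) can be illustrated with a deliberately simplified Python sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the tokenizer returns a flat token list rather than a token graph, the phrase table `PHRASE_TABLE` is hand-written rather than learned from parallel data, translation is a greedy longest-match lookup instead of probabilistic phrase-based decoding with a language model, and the post-processor removes a hypothetical `<del>` marker that the toy translator emits for tokens with no spoken form.

```python
import re

# Hypothetical deletion marker emitted for tokens that should not be spoken.
DELETE = "<del>"

# Hypothetical, hand-written phrase table (source token tuple -> target tokens).
# A real SMT system would learn weighted phrase pairs from a parallel corpus.
PHRASE_TABLE = {
    ("Dr", "."): ["doctor"],
    ("25",): ["twenty", "five"],
    ("km",): ["kilometres"],
    ('"',): [DELETE],
}
MAX_PHRASE_LEN = max(len(src) for src in PHRASE_TABLE)

# Digit runs, letter runs, or single punctuation marks.
TOKEN_RE = re.compile(r"\d+|[^\W\d_]+|[^\w\s]")


def tokenize(text: str) -> list[str]:
    """Stage 1: split raw text into a flat list of tokens (not a graph)."""
    return TOKEN_RE.findall(text)


def translate(tokens: list[str]) -> list[str]:
    """Stage 2: greedy longest-match phrase lookup (no scoring, no reordering)."""
    out: list[str] = []
    i = 0
    while i < len(tokens):
        for n in range(min(MAX_PHRASE_LEN, len(tokens) - i), 0, -1):
            src = tuple(tokens[i:i + n])
            if src in PHRASE_TABLE:
                out.extend(PHRASE_TABLE[src])
                i += n
                break
        else:
            # No phrase matched: copy the unknown token through unchanged.
            out.append(tokens[i])
            i += 1
    return out


def postprocess(tokens: list[str]) -> str:
    """Stage 3: drop deletion markers and re-attach punctuation to the previous word."""
    words = [t for t in tokens if t != DELETE]
    return re.sub(r"\s+([^\w\s])", r"\1", " ".join(words))


def normalize(text: str) -> str:
    return postprocess(translate(tokenize(text)))


if __name__ == "__main__":
    print(normalize('Dr. Smith said "25 km".'))
    # prints: doctor Smith said twenty five kilometres.
```

Keeping the three stages as separate functions mirrors the modular split described in the abstract: in this sketch, swapping the lookup in `translate` for a trained phrase-based decoder would not require changing the tokenizer or the post-processor.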
Item ID: | 20353 |
---|---|
DC Identifier: | https://oa.upm.es/20353/ |
OAI Identifier: | oai:oa.upm.es:20353 |
Deposited by: | Memoria Investigacion |
Deposited on: | 02 Oct 2013 16:27 |
Last Modified: | 21 Apr 2016 23:06 |