Architecture for text normalization using statistical machine translation techniques

Lopez Ludeña, Veronica and San Segundo Hernández, Rubén and Montero Martínez, Juan Manuel and Barra Chicote, Roberto and Lorenzo Trueba, Jaime (2012). Architecture for text normalization using statistical machine translation techniques. In: "VII Jornadas en Tecnología del Habla and III Iberian SLTech", 21/11/2012 - 22/11/2012, Madrid, España. pp. 204-213.

Description

Title: Architecture for text normalization using statistical machine translation techniques
Author/s:
  • Lopez Ludeña, Veronica
  • San Segundo Hernández, Rubén
  • Montero Martínez, Juan Manuel
  • Barra Chicote, Roberto
  • Lorenzo Trueba, Jaime
Item Type: Presentation at Congress or Conference (Article)
Event Title: VII Jornadas en Tecnología del Habla and III Iberian SLTech
Event Dates: 21/11/2012 - 22/11/2012
Event Location: Madrid, España
Title of Book: Jornadas en Tecnología del Habla and III Iberian SLTech
Date: 2012
Freetext Keywords: Text normalization, text to speech conversion, language translation, numbers, acronyms, abbreviations.
Faculty: E.T.S.I. Telecomunicación (UPM)
Department: Ingeniería Electrónica
Creative Commons Licenses: Attribution - No Derivative Works - Non-Commercial

Abstract

This paper proposes an architecture, based on statistical machine translation, for developing the text normalization module of a text-to-speech conversion system. The main goal is a language-independent, data-driven text normalization module that is flexible enough to handle all the situations that arise in this task. The proposed architecture consists of three main modules: a tokenizer that splits the input text into a token graph (tokenization), a phrase-based translation module (token translation), and a post-processing module that removes some tokens. The paper presents initial experiments on numbers and abbreviations. The very good results obtained validate the proposed architecture.
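
The three-stage flow described in the abstract (tokenization, token translation, post-processing) can be illustrated with a minimal Python sketch. It is not the authors' implementation, and every name in it is hypothetical: `PHRASE_TABLE`, the `<del>` marker, and the functions `tokenize`, `translate`, `postprocess` and `normalize` are assumptions made for illustration. Two simplifications are also assumed: the tokenizer returns a flat token list rather than a token graph, and a greedy longest-match lookup over a hand-written table stands in for a trained phrase-based translation model.

```python
import re

# Hypothetical hand-written phrase table (source token tuple -> target tokens).
# A real phrase-based system would learn such pairs, with scores, from parallel data.
PHRASE_TABLE = {
    ("dr", "."): ["doctor"],
    ("2", "1"): ["twenty", "one"],
    ("3",): ["three"],
    (",",): ["<del>"],
}

# Hypothetical marker for tokens that the post-processing step should drop.
DELETE = "<del>"


def tokenize(text):
    """Split text into lowercase words, single digits, and punctuation marks.

    Simplification: returns a flat list, not a token graph.
    """
    return re.findall(r"[^\W\d_]+|\d|[^\w\s]", text.lower())


def translate(tokens, table=PHRASE_TABLE):
    """Greedy longest-match phrase lookup; unknown tokens pass through unchanged.

    This is a stand-in for a statistical decoder, not a statistical model.
    """
    max_len = max(len(key) for key in table)
    out, i = [], 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in table:
                out.extend(table[key])
                i += n
                break
        else:
            # No phrase matched at this position: copy the token as-is.
            out.append(tokens[i])
            i += 1
    return out


def postprocess(tokens):
    """Remove the tokens marked for deletion by the translation step."""
    return [t for t in tokens if t != DELETE]


def normalize(text):
    """Run the three stages in sequence and join the result into a string."""
    return " ".join(postprocess(translate(tokenize(text))))


print(normalize("Dr. Smith, room 21"))  # doctor smith room twenty one
```

Keeping the three stages as separate functions mirrors the modular design in the abstract: under these assumptions, only the phrase table would need to change to cover another language or another class of tokens.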

More information

Item ID: 20353
DC Identifier: https://oa.upm.es/20353/
OAI Identifier: oai:oa.upm.es:20353
Deposited by: Memoria Investigacion
Deposited on: 02 Oct 2013 16:27
Last Modified: 21 Apr 2016 23:06