Automatic Categorization for Improving Spanish into Spanish Sign Language Machine Translation

Lopez Ludeña, Veronica; San Segundo Hernández, Rubén; Montero Martínez, Juan Manuel; Córdoba Herralde, Ricardo de; Ferreiros López, Javier and Pardo Muñoz, José Manuel (2011). Automatic Categorization for Improving Spanish into Spanish Sign Language Machine Translation. "Computer Speech & Language", v. 26 (n. 3); pp. 149-167. ISSN 0885-2308. https://doi.org/10.1016/j.csl.2011.09.003.

Description

Title: Automatic Categorization for Improving Spanish into Spanish Sign Language Machine Translation
Author(s):
  • Lopez Ludeña, Veronica
  • San Segundo Hernández, Rubén
  • Montero Martínez, Juan Manuel
  • Córdoba Herralde, Ricardo de
  • Ferreiros López, Javier
  • Pardo Muñoz, José Manuel
Document Type: Article
Journal/Publication Title: Computer Speech & Language
Date: 2011
Volume: 26
Subjects:
Informal Keywords: Spanish Sign Language (LSE); Statistical language translation; Syntactic-semantic information; Automatic tagging; Automatic categorization
School: E.T.S.I. Telecomunicación (UPM)
Department: Ingeniería Electrónica
Creative Commons License: Attribution - NoDerivatives - NonCommercial

Abstract

This paper describes a preprocessing module for improving the performance of a Spanish into Spanish Sign Language (Lengua de Signos Española: LSE) translation system when dealing with sparse training data. The module replaces Spanish words with associated tags. The list of Spanish words (vocabulary) and their associated tags is computed automatically by considering, for every Spanish word, the sign that shows the highest probability of being its translation. This automatic tag extraction has been compared to a manual strategy, achieving almost the same improvement. In this analysis, several alternatives have been studied for dealing with non-relevant words, that is, Spanish words not assigned to any sign. The preprocessing module has been incorporated into two well-known statistical translation architectures: a phrase-based system and a Statistical Finite State Transducer (SFST). The system has been developed for a specific application domain: the renewal of identity documents and driver's licenses. To evaluate the system, a parallel corpus made up of 4080 Spanish sentences and their LSE translations has been used. The evaluation results revealed a significant performance improvement when this preprocessing module is included. In the phrase-based system, the proposed module increased BLEU (Bilingual Evaluation Understudy) from 73.8% to 81.0% and the human evaluation score from 0.64 to 0.83. In the case of the SFST, BLEU increased from 70.6% to 78.4% and the human evaluation score from 0.65 to 0.82.
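
The tag-replacement step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. The probability table, its values, the `min_prob` threshold, the `NON_RELEVANT` placeholder, and the function names `build_tag_table` and `categorize` are all assumptions made for illustration. The sketch assigns each Spanish word the sign with the highest translation probability and shows two possible ways of handling non-relevant words (dropping them or mapping them to a placeholder tag); the abstract does not say which alternatives the paper actually studied.

```python
from typing import Dict, List, Optional

# Hypothetical lexical translation probabilities p(sign | word). The words,
# signs, and values are invented for illustration, not taken from the paper.
LEXICAL_PROBS: Dict[str, Dict[str, float]] = {
    "renovar": {"RENOVAR": 0.90, "DNI": 0.10},
    "carné": {"CARNET": 0.85, "CONDUCIR": 0.15},
    "conducir": {"CONDUCIR": 0.95, "CARNET": 0.05},
    "el": {"CARNET": 0.20, "DNI": 0.15},
    "de": {"CONDUCIR": 0.10, "DNI": 0.10},
}

NON_RELEVANT_TAG = "NON_RELEVANT"  # assumed placeholder name


def build_tag_table(
    probs: Dict[str, Dict[str, float]], min_prob: float = 0.5
) -> Dict[str, Optional[str]]:
    """Map each word to its most probable sign, or None if non-relevant.

    The min_prob cut-off is an assumption: a word whose best sign falls
    below it is treated as not assigned to any sign.
    """
    table: Dict[str, Optional[str]] = {}
    for word, sign_probs in probs.items():
        best_sign, best_p = max(sign_probs.items(), key=lambda kv: kv[1])
        table[word] = best_sign if best_p >= min_prob else None
    return table


def categorize(
    sentence: str,
    table: Dict[str, Optional[str]],
    drop_non_relevant: bool = True,
) -> List[str]:
    """Replace each word with its tag before it reaches the translator.

    Words missing from the table are treated as non-relevant in this sketch.
    Non-relevant words are either dropped or replaced by a placeholder tag.
    """
    tags: List[str] = []
    for word in sentence.lower().split():
        tag = table.get(word)
        if tag is not None:
            tags.append(tag)
        elif not drop_non_relevant:
            tags.append(NON_RELEVANT_TAG)
    return tags


if __name__ == "__main__":
    tag_table = build_tag_table(LEXICAL_PROBS)
    print(categorize("Renovar el carné de conducir", tag_table))
    print(categorize("Renovar el carné", tag_table, drop_non_relevant=False))
```

In a real system the probabilities would presumably be estimated from the parallel corpus, and the tagged sentences would then be passed to the phrase-based or SFST translator in place of the original Spanish words.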

More information

Record ID: 11880
DC Identifier: http://oa.upm.es/11880/
OAI Identifier: oai:oa.upm.es:11880
DOI Identifier: 10.1016/j.csl.2011.09.003
Official URL: http://www.sciencedirect.com/science/article/pii/S0885230811000489
Deposited by: Memoria Investigacion
Deposited on: 11 Oct 2012 08:00
Last Modified: 21 Apr 2016 11:04