Detecting acronyms from capital letter sequences in Spanish

San Segundo Hernández, Rubén and Montero Martínez, Juan Manuel and Lopez Ludeña, Veronica and King, Simon (2012). Detecting acronyms from capital letter sequences in Spanish. In: "13th Annual Conference of the International Speech Communication Association (INTERSPEECH 2012)", 09/09/2012 - 13/09/2012, Portland, Oregon. pp. 1-4.

Description

Title: Detecting acronyms from capital letter sequences in Spanish
Author/s:
  • San Segundo Hernández, Rubén
  • Montero Martínez, Juan Manuel
  • Lopez Ludeña, Veronica
  • King, Simon
Item Type: Presentation at Congress or Conference (Article)
Event Title: 13th Annual Conference of the International Speech Communication Association (INTERSPEECH 2012)
Event Dates: 09/09/2012 - 13/09/2012
Event Location: Portland, Oregon
Title of Book: Annual Conference of the International Speech Communication Association (INTERSPEECH 2012)
Date: 2012
Freetext Keywords: Capital letter sequence pronunciation, Speech synthesis, Spelling, Spanish, Acronyms, Abbreviations
Faculty: E.T.S.I. Telecomunicación (UPM)
Department: Ingeniería Electrónica
Creative Commons Licenses: Attribution - No Derivative Works - Non-Commercial

Abstract

This paper presents an automatic strategy for deciding how to pronounce a Capital Letter Sequence (CLS) in a Text-to-Speech (TTS) system. If the CLS is known to the TTS system, it can be expanded into several words. When the CLS is unknown, the system has two alternatives: spelling it out letter by letter (abbreviation) or pronouncing it as a new word (acronym). In Spanish, there is a close correspondence between letters and phonemes, so when a CLS resembles other Spanish words, there is a strong tendency to pronounce it as a standard word. This paper proposes an automatic method for detecting acronyms. Additionally, it analyses the discrimination capability of several features, and several strategies for combining them in order to obtain the best classifier. The best classifier achieves a classification error of 8.45%. In the feature analysis, the best features were the Letter Sequence Perplexity and the Average N-gram order.
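
To make the idea of a letter-sequence perplexity feature concrete, the following is a minimal sketch and not the authors' implementation. It assumes a letter bigram model with add-one smoothing, trained on a tiny hypothetical word list; the names `TOY_WORDS`, `train_bigram_counts` and `letter_perplexity`, the example sequences RENFE and BBVA, and the choice of bigrams and smoothing are all illustrative assumptions that do not come from the paper.

```python
import math
from collections import Counter

# Assumption: 27-letter Spanish alphabet; accented vowels are ignored here.
ALPHABET = "abcdefghijklmnñopqrstuvwxyz"
BOS, EOS = "<", ">"               # word-boundary symbols
VOCAB_SIZE = len(ALPHABET) + 1    # letters plus the end-of-word symbol

# Hypothetical toy training list; a real system would use a large lexicon.
TOY_WORDS = ["casa", "renta", "fecha", "pena", "tren",
             "reno", "nube", "feria", "arena", "verde"]


def train_bigram_counts(words):
    """Count letter bigrams and their left contexts over a word list."""
    bigrams, contexts = Counter(), Counter()
    for word in words:
        symbols = [BOS] + list(word.lower()) + [EOS]
        for prev, cur in zip(symbols, symbols[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    return bigrams, contexts


def letter_perplexity(sequence, bigrams, contexts):
    """Perplexity of a letter sequence under the add-one smoothed bigram model."""
    symbols = [BOS] + list(sequence.lower()) + [EOS]
    log_prob = 0.0
    for prev, cur in zip(symbols, symbols[1:]):
        prob = (bigrams[(prev, cur)] + 1) / (contexts[prev] + VOCAB_SIZE)
        log_prob += math.log(prob)
    transitions = len(symbols) - 1
    return math.exp(-log_prob / transitions)


bigrams, contexts = train_bigram_counts(TOY_WORDS)
ppl_word_like = letter_perplexity("RENFE", bigrams, contexts)
ppl_spelled = letter_perplexity("BBVA", bigrams, contexts)
print(ppl_word_like < ppl_spelled)
```

On this toy list the word-like sequence RENFE receives a lower perplexity than BBVA, which is the kind of contrast such a feature is meant to capture. Any decision threshold, and the way this feature is combined with others, would have to be tuned on labelled data.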

More information

Item ID: 20355
DC Identifier: http://oa.upm.es/20355/
OAI Identifier: oai:oa.upm.es:20355
Deposited by: Memoria Investigacion
Deposited on: 02 Oct 2013 16:38
Last Modified: 21 Apr 2016 23:06