Towards glottal source controllability in expressive speech synthesis

Lorenzo Trueba, Jaime and Barra Chicote, Roberto and Raitio, Tuomo and Obin, Nicolas and Alku, Paavo and Yamagishi, J. and Montero Martínez, Juan Manuel (2012). Towards glottal source controllability in expressive speech synthesis. In: "InterSpeech 2012 - 13th Annual Conference of the International Speech Communication Association", 09/09/2012 - 13/09/2012, Portland, Oregon. pp. 1-4.

Description

Title: Towards glottal source controllability in expressive speech synthesis
Author/s:
  • Lorenzo Trueba, Jaime
  • Barra Chicote, Roberto
  • Raitio, Tuomo
  • Obin, Nicolas
  • Alku, Paavo
  • Yamagishi, J.
  • Montero Martínez, Juan Manuel
Item Type: Presentation at Congress or Conference (Article)
Event Title: InterSpeech 2012 - 13th Annual Conference of the International Speech Communication Association
Event Dates: 09/09/2012 - 13/09/2012
Event Location: Portland, Oregon
Title of Book: InterSpeech 2012 - 13th Annual Conference of the International Speech Communication Association
Date: 2012
Subjects:
Freetext Keywords: Expressive speech synthesis, speaking style, glottal source modeling.
Faculty: E.T.S.I. Telecomunicación (UPM)
Department: Ingeniería Electrónica
Creative Commons Licenses: Attribution - No Derivative Works - Non-Commercial

Abstract

To make human-machine interfaces sound more human-like, we must first give them expressive capabilities in the form of emotional and stylistic features, so that they can be closely adapted to the intended task. Replicating those features requires more than replicating the prosodic information of fundamental frequency and speaking rhythm. The additional layer we propose is the modification of the glottal model, for which we use the GlottHMM parameters. This paper analyzes the viability of such an approach by verifying that the expressive nuances are captured by these features, obtaining recognition rates of 95% on styled speech and 82% on emotional speech. We then evaluate the effect of speaker bias and recording environment on the source modeling in order to quantify possible problems when analyzing multi-speaker databases. Finally, we propose a separation of speaking styles for Spanish based on prosodic features and check its perceptual significance.

More information

Item ID: 20409
DC Identifier: https://oa.upm.es/20409/
OAI Identifier: oai:oa.upm.es:20409
Deposited by: Memoria Investigacion
Deposited on: 05 Oct 2013 08:38
Last Modified: 21 Apr 2016 23:12