Continuous expressive speaking styles synthesis based on CVSM and MR-HMM

Lorenzo Trueba, Jaime and Barra Chicote, Roberto and Gallardo Antolín, Ascensión and Yamagishi, Junichi and Montero Martínez, Juan Manuel (2016). Continuous expressive speaking styles synthesis based on CVSM and MR-HMM. In: "26th International Conference on Computational Linguistics (COLING 2016)", 11/12/2016 - 16/12/2016, Osaka, Japan. pp. 369-376.

Description

Title: Continuous expressive speaking styles synthesis based on CVSM and MR-HMM
Author/s:
  • Lorenzo Trueba, Jaime
  • Barra Chicote, Roberto
  • Gallardo Antolín, Ascensión
  • Yamagishi, Junichi
  • Montero Martínez, Juan Manuel
Item Type: Presentation at Congress or Conference (Article)
Event Title: 26th International Conference on Computational Linguistics (COLING 2016)
Event Dates: 11/12/2016 - 16/12/2016
Event Location: Osaka, Japan
Title of Book: 26th International Conference on Computational Linguistics (COLING 2016)
Date: 2016
Subjects:
Faculty: E.T.S.I. Telecomunicación (UPM)
Department: Ingeniería Electrónica
Creative Commons Licenses: Attribution - NonCommercial - NoDerivatives

Abstract

This paper introduces a continuous system capable of automatically producing the most adequate speaking style for synthesizing a desired target text. This is achieved by jointly modeling the acoustic and lexical parameters of the speaker models, adapting the CVSM projection of the training texts using MR-HMM techniques. We therefore expect that, as long as the training data is sufficiently varied, a continuous lexical space can be mapped into a continuous acoustic space. The proposed continuous automatic text-to-speech system was assessed in a perceptual evaluation that compared it with traditional approaches to the task. The system proved capable of conveying the correct expressiveness (average adequacy of 3.6) with an expressive strength comparable to oracle traditional expressive speech synthesis (average of 3.6), although with a drop in speech quality mainly due to the semi-continuous nature of the data (average quality of 2.9). This means that the proposed system can improve on traditional neutral systems without requiring any additional user interaction.
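
The core idea of the abstract, placing a text in a continuous lexical space and letting a regression model turn that position into acoustic parameters, can be illustrated with a toy sketch. It assumes that MR-HMM refers to the usual multiple-regression formulation, in which a state mean is a linear function of a control vector, mu = H [1, v_1, ..., v_L]. The vocabulary, the projection matrix, the regression matrix H, the function names and all numbers below are hypothetical and are not taken from the paper.

```python
# Illustrative sketch only: every name and number here is hypothetical.

def project_text(tokens, vocab, projection):
    """Map relative word frequencies into a low-dimensional lexical space."""
    counts = [tokens.count(word) for word in vocab]
    total = sum(counts) or 1  # avoid division by zero for empty input
    freqs = [c / total for c in counts]
    # One row of `projection` per lexical dimension, one column per vocab word.
    return [sum(p * f for p, f in zip(row, freqs)) for row in projection]


def regress_mean(regression, lexical_vec):
    """Multiple-regression mean: mu = H [1, v_1, ..., v_L]."""
    xi = [1.0] + list(lexical_vec)  # leading 1 is the bias term
    return [sum(h * x for h, x in zip(row, xi)) for row in regression]


vocab = ["goal", "market", "weather"]
projection = [[1.0, 0.0, 0.0],   # lexical dimension 1
              [0.0, 1.0, 0.0]]   # lexical dimension 2
H = [[5.0, 1.0, -1.0],           # acoustic parameter 1: bias, weights
     [120.0, 40.0, 0.0]]         # acoustic parameter 2: bias, weights

v = project_text(["goal", "goal", "market", "weather"], vocab, projection)
mu = regress_mean(H, v)
print(v, mu)
```

Because the mean is a linear function of the lexical vector, a small change in the text's position produces a small change in the acoustic parameters, which is what makes the mapping continuous rather than a choice among a fixed set of styles.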

Funding Projects

Type                               | Code            | Acronym     | Leader      | Title
Government of Spain                | TEC2014-53390-P | Unspecified | Unspecified | Unspecified
Universidad Politécnica de Madrid  | SBUPM-QTKTZHB   | Unspecified | Unspecified | Unspecified
FP7                                | 287678          | Unspecified | Unspecified | Unspecified

More information

Item ID: 46492
DC Identifier: http://oa.upm.es/46492/
OAI Identifier: oai:oa.upm.es:46492
Official URL: https://www.aclweb.org/anthology/C/C16/C16-1036.pdf
Deposited by: Memoria Investigacion
Deposited on: 06 Jun 2017 15:27
Last Modified: 06 Jun 2017 15:27