Emotion transplantation through adaptation in HMM-based speech synthesis

Lorenzo Trueba, Jaime and Barra Chicote, Roberto and San Segundo Hernández, Rubén and Ferreiros López, Javier and Yamagishi, Junichi and Montero Martínez, Juan Manuel (2015). Emotion transplantation through adaptation in HMM-based speech synthesis. "Computer Speech & Language", v. 34 (n. 1); pp. 292-307. ISSN 0885-2308. https://doi.org/10.1016/j.csl.2015.03.008.

Description

Title: Emotion transplantation through adaptation in HMM-based speech synthesis
Author/s:
  • Lorenzo Trueba, Jaime
  • Barra Chicote, Roberto
  • San Segundo Hernández, Rubén
  • Ferreiros López, Javier
  • Yamagishi, Junichi
  • Montero Martínez, Juan Manuel
Item Type: Article
Journal/Publication Title: Computer Speech & Language
Date: November 2015
ISSN: 0885-2308
Volume: 34
Subjects:
Freetext Keywords: Statistical parametric speech synthesis; Expressive speech synthesis; Cascade adaptation; Emotion transplantation
Faculty: E.T.S.I. Telecomunicación (UPM)
Department: Ingeniería Electrónica (Electronic Engineering)
Creative Commons License: Attribution - NoDerivatives - NonCommercial

Abstract

This paper proposes an emotion transplantation method that modifies a synthetic speech model through constrained structural maximum a posteriori linear regression (CSMAPLR) adaptation. The goal is to incorporate emotional information learned from a different speaker's model while preserving the identity of the original speaker as much as possible. The proposed method learns both emotional and speaker identity information as adaptation functions from an average voice model and combines them into a single cascade transform capable of imbuing the desired emotion into the target speaker. The method is then applied to the task of transplanting four emotions (anger, happiness, sadness and surprise) into three male and three female speakers, and it is evaluated in a number of perceptual tests. The results show that, for emotional text, perceived naturalness significantly favors the proposed transplanted emotional speech synthesis over traditional neutral speech synthesis, evidenced by a large increase in the perceived emotional strength of the synthesized utterances at a slight cost in speech quality. A final evaluation with a robotic laboratory assistant application shows that using emotional speech significantly increases the students' satisfaction with the dialog system, demonstrating that the proposed emotion transplantation system provides benefits in real applications.
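The abstract describes combining an emotion adaptation function and a speaker adaptation function, both learned from an average voice model, into a single cascade transform. In the constrained form used by CSMAPLR, a transform maps a Gaussian's mean as mean' = zeta * mean + eps and its covariance as cov' = zeta * cov * zeta^T. Two such transforms applied one after the other therefore fold into one transform with zeta = zeta2 * zeta1 and eps = zeta2 * eps1 + eps2. The plain-Python sketch below checks that identity numerically. It is an illustration under stated assumptions, not the authors' implementation. The transforms are made-up numbers rather than estimated ones. Only a single 2-D Gaussian and a single regression class are shown, whereas CSMAPLR normally estimates one transform per regression-tree class. Applying the emotion transform before the speaker transform is an assumed order, and all function and variable names are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]
# (zeta, eps): one constrained affine transform, shared by mean and covariance
Transform = Tuple[Matrix, Vector]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def matvec(a: Matrix, v: Vector) -> Vector:
    """Matrix-vector product a @ v."""
    return [sum(a[i][k] * v[k] for k in range(len(v))) for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def apply_transform(t: Transform, mean: Vector, cov: Matrix) -> Tuple[Vector, Matrix]:
    """Constrained transform: mean' = zeta @ mean + eps, cov' = zeta @ cov @ zeta^T."""
    zeta, eps = t
    new_mean = [m + e for m, e in zip(matvec(zeta, mean), eps)]
    new_cov = matmul(matmul(zeta, cov), transpose(zeta))
    return new_mean, new_cov


def cascade(first: Transform, second: Transform) -> Transform:
    """Fold two transforms into one equivalent to applying `first`, then `second`."""
    z1, e1 = first
    z2, e2 = second
    zeta = matmul(z2, z1)                              # zeta2 @ zeta1
    eps = [a + b for a, b in zip(matvec(z2, e1), e2)]  # zeta2 @ eps1 + eps2
    return zeta, eps


def allclose(x: Vector, y: Vector, tol: float = 1e-9) -> bool:
    return all(math.isclose(a, b, abs_tol=tol) for a, b in zip(x, y))


# Hypothetical 2-D Gaussian from an average voice model and two made-up transforms.
mean: Vector = [1.0, 2.0]
cov: Matrix = [[1.0, 0.0], [0.0, 1.0]]
emotion_t: Transform = ([[1.2, 0.0], [0.0, 0.9]], [0.5, -0.2])
speaker_t: Transform = ([[0.8, 0.1], [0.0, 1.1]], [0.0, 0.3])

# Applying the two transforms one after the other (assumed order: emotion, then speaker)...
seq_mean, seq_cov = apply_transform(speaker_t, *apply_transform(emotion_t, mean, cov))
# ...gives the same Gaussian as applying the single cascaded transform.
casc_mean, casc_cov = apply_transform(cascade(emotion_t, speaker_t), mean, cov)

print(casc_mean)
print(allclose(seq_mean, casc_mean)
      and all(allclose(r, s) for r, s in zip(seq_cov, casc_cov)))
```

Because the composition is itself a constrained affine transform, the cascaded result can be stored and applied like any single adaptation transform.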

Funding Projects

Type | Code | Acronym | Leader | Title
Government of Spain | TIN2011-28169-C05-03 | TIMPANO | Unspecified | Technology for complex human-machine conversational interaction with dynamic learning - UPM
Government of Spain | DPI2010-21247-C02-02 | INAPRA | Unspecified | Unspecified
Madrid Regional Government | S2009/TIC-1542 | MA2VICMR | Unspecified | Unspecified
FP7 | 287678 | SIMPLE4ALL | Unspecified | Speech synthesis that improves through adaptive learning
Universidad Politécnica de Madrid | SBUPM-QTKTZHB | Unspecified | Unspecified | Unspecified

More information

Item ID: 40458
DC Identifier: http://oa.upm.es/40458/
OAI Identifier: oai:oa.upm.es:40458
DOI: 10.1016/j.csl.2015.03.008
Official URL: http://www.sciencedirect.com/science/article/pii/S0885230815000376
Deposited by: Memoria Investigacion
Deposited on: 23 May 2016 16:50
Last Modified: 05 Jun 2019 13:26