Development of a genre-dependent TTS system with cross-speaker speaking-style transplantation

Lorenzo Trueba, Jaime and Echeverry Correa, Julian David and Barra Chicote, Roberto and San Segundo Hernández, Rubén and Ferreiros López, Javier and Gallardo Antolín, Ascensión and Yamagishi, Junichi and King, Simon and Montero Martínez, Juan Manuel (2014). Development of a genre-dependent TTS system with cross-speaker speaking-style transplantation. In: "Proceedings of the 2nd International Workshop on Speech, Language and Audio in Multimedia (SLAM2014)", 11/09/2014 - 12/09/2014, Penang, Malaysia. pp. 39-42.

Description

Title: Development of a genre-dependent TTS system with cross-speaker speaking-style transplantation
Author/s:
  • Lorenzo Trueba, Jaime
  • Echeverry Correa, Julian David
  • Barra Chicote, Roberto
  • San Segundo Hernández, Rubén
  • Ferreiros López, Javier
  • Gallardo Antolín, Ascensión
  • Yamagishi, Junichi
  • King, Simon
  • Montero Martínez, Juan Manuel
Item Type: Presentation at Congress or Conference (Article)
Event Title: Proceedings of the 2nd International Workshop on Speech, Language and Audio in Multimedia (SLAM2014)
Event Dates: 11/09/2014 - 12/09/2014
Event Location: Penang, Malaysia
Title of Book: 2nd International Workshop on Speech, Language and Audio in Multimedia (SLAM2014)
Date: 2014
Freetext Keywords: Speech synthesis, speaking style transplantation, automatic genre identification, Latent Semantic Analysis
Faculty: E.T.S.I. Telecomunicación (UPM)
Department: Ingeniería Electrónica
Creative Commons Licenses: Attribution - No derivative works - Non commercial

Full text

PDF (107kB)

Abstract

One of the biggest challenges in speech synthesis is producing contextually appropriate, natural-sounding synthetic voices. This means that a text-to-speech (TTS) system must be able to analyze a text beyond sentence boundaries in order to select, or even modulate, the speaking style according to the broader context. Our current architecture follows a two-step approach: text genre identification, followed by speaking-style synthesis according to the detected discourse genre. For the final implementation, four genres and their corresponding speaking styles were considered: broadcast news, live sports commentary, interviews and political speeches. In the final TTS evaluation, the four speaking styles were transplanted to the neutral voices of other speakers not included in the training database. When the transplanted styles were compared to the neutral voices, transplantation was significantly preferred, and the similarity to the target speaker was as high as 78%.
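The first step of the two-step pipeline (genre identification, then style selection) can be illustrated with a small Latent Semantic Analysis (LSA) classifier, since LSA is listed among the keywords. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the toy corpus `TRAIN`, the genre labels, the style names in `STYLE_FOR_GENRE`, the nearest-centroid cosine classifier and the latent rank `k` are all hypothetical. It builds a term-document count matrix, truncates its SVD, folds a new text into the latent space and picks the genre whose centroid is closest. The synthesis and cross-speaker style transplantation step is not modeled; the sketch only returns the name of the style that would be used.

```python
import re
import numpy as np

# Hypothetical toy corpus (illustrative only; NOT the SLAM2014 training data).
TRAIN = [
    ("news", "the government announced today new measures on the economy"),
    ("news", "reports from the capital say the minister resigned today"),
    ("sport", "goal what a shot he scores and the crowd goes wild"),
    ("sport", "he passes the ball down the wing shot and goal"),
    ("interview", "tell me what did you feel when you heard the news"),
    ("interview", "so how did you start your career tell me about it"),
    ("speech", "citizens we will build a fairer country for all of you"),
    ("speech", "my fellow citizens our nation deserves a better future"),
]

# Hypothetical mapping from detected genre to a speaking-style model name.
STYLE_FOR_GENRE = {"news": "news_style", "sport": "sport_style",
                   "interview": "interview_style", "speech": "political_style"}


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def train_lsa(corpus, k=4):
    """Build a rank-k LSA space and one latent centroid per genre."""
    vocab = sorted({w for _, t in corpus for w in tokenize(t)})
    index = {w: i for i, w in enumerate(vocab)}
    # Term-document count matrix A (terms x documents).
    A = np.zeros((len(vocab), len(corpus)))
    for j, (_, text) in enumerate(corpus):
        for w in tokenize(text):
            A[index[w], j] += 1
    # Truncated SVD: A ~= U_k diag(s_k) V_k^T.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_k, s_k, V_k = U[:, :k], s[:k], Vt[:k].T
    # Rows of V_k are the training documents in the latent space.
    centroids = {}
    for g in sorted({g for g, _ in corpus}):
        rows = [j for j, (label, _) in enumerate(corpus) if label == g]
        centroids[g] = V_k[rows].mean(axis=0)
    return index, U_k, s_k, centroids


def fold_in(text, index, U_k, s_k):
    """Project a new text into the latent space: q^T U_k diag(s_k)^-1."""
    q = np.zeros(U_k.shape[0])
    for w in tokenize(text):
        if w in index:  # out-of-vocabulary words are ignored
            q[index[w]] += 1
    return (q @ U_k) / s_k


def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def identify_genre(text, model):
    index, U_k, s_k, centroids = model
    v = fold_in(text, index, U_k, s_k)
    return max(centroids, key=lambda g: cosine(v, centroids[g]))


def select_style(text, model):
    """Step 1: identify the genre; step 2: choose the matching style."""
    genre = identify_genre(text, model)
    return genre, STYLE_FOR_GENRE[genre]


if __name__ == "__main__":
    model = train_lsa(TRAIN)
    print(select_style("what a goal he scores", model))
```

A nearest-centroid cosine rule is used here only because it is the simplest classifier that operates directly on LSA vectors; a real system would train on a much larger genre-labeled corpus and would tune `k` on held-out data.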

Funding Projects

Type | Code | Acronym | Leader | Title
Government of Spain | TIN2011-28169-C05-03 | Unspecified | Unspecified | Unspecified
Government of Spain | DPI2010-21247-C02-02 | Unspecified | Unspecified | Unspecified

More information

Item ID: 37521
DC Identifier: http://oa.upm.es/37521/
OAI Identifier: oai:oa.upm.es:37521
Deposited by: Memoria Investigacion
Deposited on: 14 Oct 2015 17:05
Last Modified: 06 Jun 2016 17:05