Towards speaking style transplantation in speech synthesis

Lorenzo Trueba, Jaime; Barra Chicote, Roberto; Yamagishi, J.; Watts, Oliver and Montero Martínez, Juan Manuel (2013). Towards speaking style transplantation in speech synthesis. In: "8th ISCA Speech Synthesis Workshop", 31/08/2013 - 02/09/2013, Barcelona, Spain. pp. 159-163.


Title: Towards speaking style transplantation in speech synthesis
  • Lorenzo Trueba, Jaime
  • Barra Chicote, Roberto
  • Yamagishi, J.
  • Watts, Oliver
  • Montero Martínez, Juan Manuel
Document Type: Conference or Workshop Presentation (Article)
Event Title: 8th ISCA Speech Synthesis Workshop
Event Dates: 31/08/2013 - 02/09/2013
Event Location: Barcelona, Spain
Book Title: 8th ISCA Speech Synthesis Workshop
Date: 2013
Informal Keywords: Expressive speech synthesis, speaking styles, adaptation, expressiveness transplantation
School: E.T.S.I. Telecomunicación (UPM)
Department: Ingeniería Electrónica
Creative Commons Licenses: Attribution - No Derivative Works - Non-commercial


One of the biggest challenges in speech synthesis is the production of natural-sounding synthetic voices. This means that the resulting voice must not only be of high enough quality but must also capture the natural expressiveness imbued in human speech. This paper focuses on the expressiveness problem by proposing a set of techniques for extrapolating the expressiveness of proven high-quality speaking style models to neutral speakers in HMM-based synthesis. As an additional advantage, the proposed techniques are based on adaptation approaches, which means that they can be used with little training data (around 15 minutes per style in this paper). For the final implementation, a set of 4 speaking styles was considered: news broadcasts, live sports commentary, interviews and parliamentary speech. Finally, the implementation of the 5 techniques was tested through a perceptual evaluation, which shows that the deviations between neutral and speaking style average models can be learned and used to imbue expressiveness into target neutral speakers as intended.

More information

Record ID: 30098
DC Identifier:
OAI Identifier:
Deposited by: Memoria Investigacion
Deposited on: 27 Jul 2014 08:02
Last Modified: 22 Apr 2016 00:24