Continuous expressive speaking styles synthesis based on CVSM and MR-HMM

Lorenzo Trueba, Jaime; Barra Chicote, Roberto; Gallardo Antolín, Ascensión; Yamagishi, Junichi and Montero Martínez, Juan Manuel (2016). Continuous expressive speaking styles synthesis based on CVSM and MR-HMM. In: "26th International Conference on Computational Linguistics (COLING 2016)", 11 Dec 2016 - 16 Dec 2016, Osaka, Japan. pp. 369-376.

Description

Title: Continuous expressive speaking styles synthesis based on CVSM and MR-HMM
Author(s):
  • Lorenzo Trueba, Jaime
  • Barra Chicote, Roberto
  • Gallardo Antolín, Ascensión
  • Yamagishi, Junichi
  • Montero Martínez, Juan Manuel
Document Type: Conference or Workshop Presentation (Article)
Event Title: 26th International Conference on Computational Linguistics (COLING 2016)
Event Dates: 11 Dec 2016 - 16 Dec 2016
Event Location: Osaka, Japan
Book Title: 26th International Conference on Computational Linguistics (COLING 2016)
Date: 2016
Subjects:
School: E.T.S.I. Telecomunicación (UPM)
Department: Ingeniería Electrónica
Creative Commons License: Attribution - NoDerivatives - NonCommercial

Abstract

This paper introduces a continuous system that automatically produces the most adequate speaking style for synthesizing a given target text. It does so by jointly modeling the acoustic and lexical parameters of the speaker models, adapting the CVSM projection of the training texts with MR-HMM techniques. The underlying assumption is that, as long as the training data is sufficiently varied, a continuous lexical space can be mapped onto a continuous acoustic space. The proposed continuous automatic text-to-speech system was assessed in a perceptual evaluation that compared it with traditional approaches to the task. The system proved capable of conveying the correct expressiveness (average adequacy of 3.6) with an expressive strength comparable to that of oracle traditional expressive speech synthesis (average of 3.6), although with a drop in speech quality, mainly due to the semi-continuous nature of the data (average quality of 2.9). The proposed system can therefore improve on traditional neutral systems without requiring any additional user interaction.
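
The abstract does not spell out how the CVSM projection or the MR-HMM adaptation is computed. As a rough illustration only, the sketch below assumes the common multiple-regression formulation in which a state mean is a linear function of a control vector, mu = H · [1, v], and uses a toy bag-of-words projection as a stand-in for the CVSM. All names, dimensions, and values (`VOCABULARY`, `PROJECTION`, `REGRESSION`, `project_text`, `mr_hmm_mean`) are hypothetical and are not taken from the paper.

```python
# Toy sketch only: every number, name, and dimension below is hypothetical.

def project_text(tokens, vocabulary, projection):
    """Map a token list to a low-dimensional lexical vector.

    Counts vocabulary terms (bag of words), then applies a linear
    projection, standing in for the CVSM projection of a text.
    """
    counts = [float(tokens.count(term)) for term in vocabulary]
    return [sum(w * c for w, c in zip(row, counts)) for row in projection]


def mr_hmm_mean(regression_matrix, style_vector):
    """Multiple-regression mean: mu = H . [1, v_1, ..., v_L]."""
    xi = [1.0] + list(style_vector)  # bias term plus lexical coordinates
    return [sum(h * x for h, x in zip(row, xi)) for row in regression_matrix]


VOCABULARY = ["goal", "market", "weather"]
# 2 x 3 projection from term counts to a 2-D lexical space (made-up values).
PROJECTION = [
    [1.0, 0.0, -0.5],
    [0.0, 1.0, 0.5],
]
# 2 x 3 regression matrix for one state; column 0 is the bias (made-up values).
REGRESSION = [
    [5.0, 0.5, 0.0],
    [1.0, 0.0, -0.25],
]

style = project_text("goal goal weather".split(), VOCABULARY, PROJECTION)
mean = mr_hmm_mean(REGRESSION, style)
print(style, mean)
```

In this toy setting, a different target text yields a different lexical vector and therefore a different acoustic mean, which is the sense in which a continuous lexical space would drive a continuous acoustic space.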

Associated projects

Type                               Code             Acronym        Lead           Title
Gobierno de España                 TEC2014-53390-P  Not specified  Not specified  Not specified
Universidad Politécnica de Madrid  SBUPM-QTKTZHB    Not specified  Not specified  Not specified
FP7                                287678           Not specified  Not specified  Not specified

More information

Record ID: 46492
DC Identifier: http://oa.upm.es/46492/
OAI Identifier: oai:oa.upm.es:46492
Official URL: https://www.aclweb.org/anthology/C/C16/C16-1036.pdf
Deposited by: Memoria Investigacion
Deposited on: 06 Jun 2017 15:27
Last modified: 06 Jun 2017 15:27