Towards glottal source controllability in expressive speech synthesis

Lorenzo Trueba, Jaime; Barra Chicote, Roberto; Raitio, Tuomo; Obin, Nicolas; Alku, Paavo; Yamagishi, J. and Montero Martínez, Juan Manuel (2012). Towards glottal source controllability in expressive speech synthesis. In: "InterSpeech 2012 - 13th Annual Conference of the International Speech Communication Association", 09/09/2012 - 13/09/2012, Portland, Oregon. pp. 1-4.

Description

Title: Towards glottal source controllability in expressive speech synthesis
Author(s):
  • Lorenzo Trueba, Jaime
  • Barra Chicote, Roberto
  • Raitio, Tuomo
  • Obin, Nicolas
  • Alku, Paavo
  • Yamagishi, J.
  • Montero Martínez, Juan Manuel
Document Type: Conference or Workshop Presentation (Article)
Event Title: InterSpeech 2012 - 13th Annual Conference of the International Speech Communication Association
Event Dates: 09/09/2012 - 13/09/2012
Event Location: Portland, Oregon
Book Title: InterSpeech 2012 - 13th Annual Conference of the International Speech Communication Association
Date: 2012
Subjects:
Informal Keywords: Expressive speech synthesis, speaking style, glottal source modeling.
School: E.T.S.I. Telecomunicación (UPM)
Department: Ingeniería Electrónica
Creative Commons License: Attribution - NoDerivatives - NonCommercial

Full Text

PDF (Portable Document Format) - Download (255kB) | Preview

Abstract

In order to obtain more human-like sounding human-machine interfaces, we must first be able to give them expressive capabilities in the form of emotional and stylistic features, so as to closely adapt them to the intended task. If we want to replicate those features, it is not enough to merely replicate the prosodic information of fundamental frequency and speaking rhythm. The proposed additional layer is the modification of the glottal model, for which we make use of the GlottHMM parameters. This paper analyzes the viability of such an approach by verifying that the expressive nuances are captured by the aforementioned features, obtaining 95% recognition rates on styled speech and 82% on emotional speech. Then we evaluate the effect of speaker bias and recording environment on the source modeling in order to quantify possible problems when analyzing multi-speaker databases. Finally, we propose a separation of speaking styles for Spanish based on prosodic features and check its perceptual significance.
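The abstract describes a recognition experiment in which prosodic and glottal source (GlottHMM) features are used to classify speaking styles and emotions, with accuracy serving as evidence that the features capture expressive nuances. The sketch below illustrates that kind of experiment only in outline; the feature set, the four style labels, the SVM classifier, and the synthetic data are all illustrative assumptions, not the paper's actual setup.

```python
# Minimal sketch (not from the paper): estimate a speaking-style recognition
# rate from prosodic and glottal source features, analogous to the experiment
# summarized in the abstract. Feature names, style labels, and the classifier
# are assumptions made for illustration only.

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Placeholder feature matrix: one row per utterance, columns for prosodic
# features (e.g. mean F0, speaking rate) and glottal source features
# (e.g. harmonic-to-noise ratio, spectral tilt of the glottal pulse).
# Real values would come from a GlottHMM-style analysis of the corpus.
n_utterances, n_features = 200, 12
X = rng.normal(size=(n_utterances, n_features))

# Placeholder labels: four hypothetical speaking styles.
styles = ["neutral", "news", "live-commentary", "advertisement"]
y = rng.integers(0, len(styles), size=n_utterances)

# Simple classifier pipeline; the recognition rate is estimated by
# cross-validation, comparable in spirit to the ~95% style / ~82% emotion
# rates reported in the abstract.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
scores = cross_val_score(clf, X, y, cv=5)
print(f"Mean recognition rate: {scores.mean():.1%}")
```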

More Information

Record ID: 20409
DC Identifier: http://oa.upm.es/20409/
OAI Identifier: oai:oa.upm.es:20409
Deposited by: Memoria Investigacion
Deposited on: 05 Oct 2013 08:38
Last Modified: 21 Apr 2016 23:12
  • Open Access