Full text
PDF (Portable Document Format)
- A PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader is required
Download (107kB)
ORCID: https://orcid.org/0000-0001-9659-5464, Ferreiros López, Javier ORCID: https://orcid.org/0000-0001-8834-3080, Gallardo Antolín, Ascensión, Yamagishi, Junichi, King, Simon and Montero Martínez, Juan Manuel ORCID: https://orcid.org/0000-0002-7908-5400 (2014). Development of a genre-dependent TTS system with cross-speaker speaking-style transplantation. In: "Proceedings of the 2nd International Workshop on Speech, Language and Audio in Multimedia (SLAM2014)", 11/09/2014 - 12/09/2014, Penang, Malaysia. pp. 39-42.
| Title: | Development of a genre-dependent TTS system with cross-speaker speaking-style transplantation |
|---|---|
| Author(s): | |
| Document Type: | Conference or Workshop Presentation (Article) |
| Event Title: | Proceedings of the 2nd International Workshop on Speech, Language and Audio in Multimedia (SLAM2014) |
| Event Dates: | 11/09/2014 - 12/09/2014 |
| Event Location: | Penang, Malaysia |
| Book Title: | 2nd International Workshop on Speech, Language and Audio in Multimedia (SLAM2014) |
| Date: | 2014 |
| Subjects: | |
| SDGs: | |
| Uncontrolled Keywords: | Speech synthesis, speaking style transplantation, automatic genre identification, Latent Semantic Analysis |
| School: | E.T.S.I. Telecomunicación (UPM) |
| Department: | Ingeniería Electrónica |
| Creative Commons Licenses: | Attribution - No Derivative Works - Non-commercial |
One of the biggest challenges in speech synthesis is the production of contextually appropriate, natural-sounding synthetic voices. This means that a Text-To-Speech (TTS) system must be able to analyze a text beyond sentence boundaries in order to select, or even modulate, the speaking style according to the broader context. Our current architecture is based on a two-step approach: text genre identification, followed by speaking-style synthesis according to the detected discourse genre. For the final implementation, a set of four genres and their corresponding speaking styles was considered: broadcast news, live sport commentaries, interviews and political speeches. In the final TTS evaluation, the four speaking styles were transplanted to the neutral voices of other speakers not included in the training database. When the transplanted styles were compared to the neutral voices, transplantation was significantly preferred and the similarity to the target speaker was as high as 78%.
| Record ID: | 37521 |
|---|---|
| DC Identifier: | https://oa.upm.es/37521/ |
| OAI Identifier: | oai:oa.upm.es:37521 |
| Deposited by: | Memoria Investigacion |
| Deposited on: | 14 Oct 2015 17:05 |
| Last Modified: | 06 Jun 2016 17:05 |