Full text: PDF (227 kB)
ORCID: https://orcid.org/0000-0002-7908-5400 and King, Simon
(2014).
A comparison of open-source segmentation architectures for dealing with imperfect data from the media in speech synthesis.
In: "15th Annual Conference of the International Speech Communication Association (INTERSPEECH 2014)", 14/09/2014 - 18/09/2014, Singapore. pp. 2370-2374.
| Title: | A comparison of open-source segmentation architectures for dealing with imperfect data from the media in speech synthesis |
|---|---|
| Author(s): | |
| Document type: | Conference or workshop paper (article) |
| Event title: | 15th Annual Conference of the International Speech Communication Association (INTERSPEECH 2014) |
| Event dates: | 14/09/2014 - 18/09/2014 |
| Event location: | Singapore |
| Book title: | Proceedings 15th Annual Conference of the International Speech Communication Association (INTERSPEECH 2014) |
| Date: | 2014 |
| Subjects: | |
| SDGs: | |
| Informal keywords: | Diarization, audio segmentation, expressive text-to-speech, media recordings |
| School: | E.T.S.I. Telecomunicación (UPM) |
| Department: | Ingeniería Electrónica |
| Creative Commons licence: | Attribution - NoDerivatives - NonCommercial |
Traditional Text-To-Speech (TTS) systems have been developed using specially designed, non-expressive scripted recordings. To develop a new generation of expressive TTS systems in the Simple4All project, real recordings from the media should be used to train new voices covering a whole new range of speaking styles. However, to process this more spontaneous material, the new systems must be able to deal with imperfect data (multi-speaker recordings, background and foreground music, and noise) by filtering out low-quality audio segments and creating mono-speaker clusters. In this paper we compare several architectures for combining speaker diarization with music and noise detection that improve the precision and overall quality of the segmentation.
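As an illustration of what "combining speaker diarization and music and noise detection" can mean, the sketch below shows one simple post-filtering arrangement: diarization output is computed first, any regions flagged as music or noise are then cut out of each speaker turn, and the surviving pieces are grouped into per-speaker clusters. This is an assumption for illustration only, not necessarily one of the architectures evaluated in the paper. The input formats, the function names `subtract_intervals` and `mono_speaker_clusters`, and the `min_dur` threshold are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Interval = Tuple[float, float]           # (start_s, end_s), hypothetical format
Turn = Tuple[str, float, float]          # (speaker_label, start_s, end_s)


def subtract_intervals(seg: Interval, masks: List[Interval]) -> List[Interval]:
    """Return the parts of `seg` not covered by any interval in `masks`."""
    pieces = [seg]
    for m_start, m_end in masks:
        next_pieces = []
        for start, end in pieces:
            # No overlap with this mask: keep the piece unchanged.
            if m_end <= start or m_start >= end:
                next_pieces.append((start, end))
                continue
            # Keep whatever lies before and after the masked region.
            if start < m_start:
                next_pieces.append((start, m_start))
            if m_end < end:
                next_pieces.append((m_end, end))
        pieces = next_pieces
    return pieces


def mono_speaker_clusters(diarization: List[Turn],
                          music_noise: List[Interval],
                          min_dur: float = 1.0) -> Dict[str, List[Interval]]:
    """Remove music/noise regions from speaker turns and group clean pieces by speaker.

    `min_dur` (hypothetical) discards leftover fragments too short to be useful.
    """
    clusters: Dict[str, List[Interval]] = defaultdict(list)
    for speaker, start, end in diarization:
        for piece in subtract_intervals((start, end), music_noise):
            if piece[1] - piece[0] >= min_dur:
                clusters[speaker].append(piece)
    return dict(clusters)


if __name__ == "__main__":
    turns = [("spk1", 0.0, 10.0), ("spk2", 10.0, 18.0), ("spk1", 18.0, 20.0)]
    detected = [(4.0, 6.0), (17.5, 19.5)]  # regions flagged as music or noise
    print(mono_speaker_clusters(turns, detected))
    # {'spk1': [(0.0, 4.0), (6.0, 10.0)], 'spk2': [(10.0, 17.5)]}
```

In this arrangement the music/noise detector only removes material after diarization; an alternative ordering would run the detector first and diarize only the clean audio, which changes what the diarizer sees. The trade-off between such orderings is the kind of question the comparison in the paper addresses.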
| Record ID: | 37500 |
|---|---|
| DC identifier: | https://oa.upm.es/37500/ |
| OAI identifier: | oai:oa.upm.es:37500 |
| Deposited by: | Memoria Investigacion |
| Deposited on: | 15 Sep 2015 18:25 |
| Last modified: | 06 Jun 2016 18:25 |