A comparison of open-source segmentation architectures for dealing with imperfect data from the media in speech synthesis

Gallardo Antolín, Ascensión, Montero Martínez, Juan Manuel ORCID: https://orcid.org/0000-0002-7908-5400 and King, Simon (2014). A comparison of open-source segmentation architectures for dealing with imperfect data from the media in speech synthesis. In: "15th Annual Conference of the International Speech Communication Association (INTERSPEECH 2014)", 14/09/2014 - 18/09/2014, Singapore. pp. 2370-2374.

Description

Title: A comparison of open-source segmentation architectures for dealing with imperfect data from the media in speech synthesis
Authors: Gallardo Antolín, Ascensión; Montero Martínez, Juan Manuel; King, Simon
Document Type: Conference or Workshop Presentation (Article)
Event Title: 15th Annual Conference of the International Speech Communication Association (INTERSPEECH 2014)
Event Dates: 14/09/2014 - 18/09/2014
Event Location: Singapore
Book Title: Proceedings 15th Annual Conference of the International Speech Communication Association (INTERSPEECH 2014)
Date: 2014
Informal Keywords: Diarization, audio segmentation, expressive text-to-speech, media recordings
School: E.T.S.I. Telecomunicación (UPM)
Department: Electronic Engineering (Ingeniería Electrónica)
Creative Commons Licenses: Attribution - No Derivative Works - Non-Commercial

Full text

PDF: INVE_MEM_2014_193698.pdf (227 kB)

Abstract

Traditional Text-To-Speech (TTS) systems have been developed using specially designed, non-expressive scripted recordings. In order to develop a new generation of expressive TTS systems in the Simple4All project, real recordings from the media should be used for training new voices with a whole new range of speaking styles. However, to process this more spontaneous material, the new systems must be able to deal with imperfect data (multi-speaker recordings, background and foreground music and noise), filtering out low-quality audio segments and creating single-speaker clusters. In this paper we compare several architectures for combining speaker diarization with music and noise detection that improve the precision and overall quality of the segmentation.

Associated projects

Type: FP7
Code: 287678
Acronym: SIMPLE4ALL
Lead: University of Edinburgh
Title: Speech synthesis that improves through adaptive learning

More information

Record ID: 37500
DC Identifier: https://oa.upm.es/37500/
OAI Identifier: oai:oa.upm.es:37500
Deposited by: Memoria Investigacion
Deposited on: 15 Sep 2015 18:25
Last Modified: 06 Jun 2016 18:25