A comparison of open-source segmentation architectures for dealing with imperfect data from the media in speech synthesis

Gallardo Antolín, Ascensión, Montero Martínez, Juan Manuel and King, Simon (2014). A comparison of open-source segmentation architectures for dealing with imperfect data from the media in speech synthesis. In: "15th Annual Conference of the International Speech Communication Association (INTERSPEECH 2014)", 14/09/2014 - 18/09/2014, Singapore. pp. 2370-2374.

Description

Title:	A comparison of open-source segmentation architectures for dealing with imperfect data from the media in speech synthesis
Author(s):	Gallardo Antolín, Ascensión; Montero Martínez, Juan Manuel (https://orcid.org/0000-0002-7908-5400); King, Simon
Document type:	Conference or workshop paper (Article)
Event title:	15th Annual Conference of the International Speech Communication Association (INTERSPEECH 2014)
Event dates:	14/09/2014 - 18/09/2014
Event location:	Singapore
Book title:	Proceedings 15th Annual Conference of the International Speech Communication Association (INTERSPEECH 2014)
Date:	2014
Subjects:	Telecommunications
SDG:	09. Industry, innovation and infrastructure
Informal keywords:	Diarization, audio segmentation, expressive text-to-speech, media recordings
School:	E.T.S.I. Telecomunicación (UPM)
Department:	Ingeniería Electrónica
Creative Commons license:	Attribution - NoDerivatives - NonCommercial

Full text

PDF (Portable Document Format), 227 kB

Abstract

Traditional Text-To-Speech (TTS) systems have been developed using specially designed, non-expressive scripted recordings. To develop a new generation of expressive TTS systems in the Simple4All project, real recordings from the media should be used to train new voices with a whole new range of speaking styles. However, to process this more spontaneous material, the new systems must be able to deal with imperfect data (multi-speaker recordings, background and foreground music, and noise) by filtering out low-quality audio segments and creating mono-speaker clusters. In this paper we compare several architectures for combining speaker diarization with music and noise detection that improve the precision and overall quality of the segmentation.
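The abstract does not describe the architectures in detail. As an illustration only, the following is a minimal Python sketch of one generic way to combine the two outputs: the regions a detector flags as music or noise are cut out of each diarization segment, fragments that are too short are discarded, and the remaining audio is grouped per speaker label into mono-speaker clusters. The segment representation (start and end times in seconds), the function names `subtract_intervals` and `mono_speaker_clusters`, and the 1-second minimum duration are all hypothetical assumptions, not the method or parameters used in the paper.

```python
from collections import defaultdict


def subtract_intervals(start: float, end: float,
                       bad: list[tuple[float, float]]) -> list[tuple[float, float]]:
    """Return the parts of [start, end) not covered by any interval in `bad`.

    `bad` holds (start, end) regions flagged as music or noise (hypothetical input).
    """
    pieces = []
    cursor = start  # beginning of the part of the segment not yet processed
    for b_start, b_end in sorted(bad):
        if b_end <= cursor or b_start >= end:
            continue  # this flagged region does not overlap the remaining part
        if b_start > cursor:
            pieces.append((cursor, b_start))  # clean audio before the flagged region
        cursor = max(cursor, b_end)
        if cursor >= end:
            break  # the rest of the segment is covered
    if cursor < end:
        pieces.append((cursor, end))  # clean tail after the last flagged region
    return pieces


def mono_speaker_clusters(diarization: list[tuple[float, float, str]],
                          music_noise: list[tuple[float, float]],
                          min_dur: float = 1.0) -> dict[str, list[tuple[float, float]]]:
    """Group clean fragments of each diarization segment by speaker label.

    `diarization` holds (start, end, speaker) tuples; fragments shorter than
    `min_dur` seconds (an assumed threshold) are treated as low quality and dropped.
    """
    clusters = defaultdict(list)
    for start, end, speaker in diarization:
        for s, e in subtract_intervals(start, end, music_noise):
            if e - s >= min_dur:
                clusters[speaker].append((s, e))
    return dict(clusters)


# Example with made-up timings: the 0.5 s fragment of spk2 after the noise is dropped.
diarization = [(0.0, 10.0, "spk1"), (10.0, 14.0, "spk2"), (14.0, 20.0, "spk1")]
music_noise = [(3.0, 4.5), (12.0, 13.5), (19.5, 21.0)]
print(mono_speaker_clusters(diarization, music_noise))
```

This post-hoc filtering is only one possible way of combining diarization with music and noise detection; an alternative ordering would run the detector first and diarize only the clean regions.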

Associated projects

Type:	FP7
Code:	287678
Acronym:	SIMPLE4ALL
Lead:	University of Edinburgh
Title:	Speech synthesis that improves through adaptive learning

More information

Record ID:	37500
DC identifier:	https://oa.upm.es/37500/
OAI identifier:	oai:oa.upm.es:37500
Deposited by:	Memoria Investigacion
Deposited on:	15 Sep 2015 18:25
Last modified:	06 Jun 2016 18:25
