A comparison of open-source segmentation architectures for dealing with imperfect data from the media in speech synthesis

Gallardo Antolín, Ascensión; Montero Martínez, Juan Manuel and King, Simon (2014). A comparison of open-source segmentation architectures for dealing with imperfect data from the media in speech synthesis. In: "15th Annual Conference of the International Speech Communication Association (INTERSPEECH 2014)", 14/09/2014 - 18/09/2014, Singapore. pp. 2370-2374.

Description

Title: A comparison of open-source segmentation architectures for dealing with imperfect data from the media in speech synthesis
Author(s):
  • Gallardo Antolín, Ascensión
  • Montero Martínez, Juan Manuel
  • King, Simon
Document type: Conference or workshop contribution (Article)
Event title: 15th Annual Conference of the International Speech Communication Association (INTERSPEECH 2014)
Event dates: 14/09/2014 - 18/09/2014
Event location: Singapore
Book title: Proceedings 15th Annual Conference of the International Speech Communication Association (INTERSPEECH 2014)
Date: 2014
Subjects:
Keywords: Diarization, audio segmentation, expressive text-to-speech, media recordings
School: E.T.S.I. Telecomunicación (UPM)
Department: Ingeniería Electrónica (Electronic Engineering)
Creative Commons license: Attribution - NoDerivatives - NonCommercial

Full text

PDF (227 kB)

Abstract

Traditional Text-To-Speech (TTS) systems have been developed using specially designed, non-expressive scripted recordings. To develop a new generation of expressive TTS systems in the Simple4All project, real recordings from the media should be used to train new voices with a whole new range of speaking styles. However, to process this more spontaneous material, the new systems must be able to deal with imperfect data (multi-speaker recordings, background and foreground music and noise), filtering out low-quality audio segments and creating mono-speaker clusters. In this paper, we compare several architectures for combining speaker diarization with music and noise detection that improve the precision and overall quality of the segmentation.
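The abstract does not detail the compared architectures. As a rough illustration only, the sketch below assumes one plausible post-processing step: speaker-labelled diarization segments are trimmed wherever a music/noise detector fires, and the surviving pieces are grouped by speaker into mono-speaker clusters. The function names `subtract_intervals` and `build_clean_clusters`, the `(start, end)` segment format in seconds, and the `min_dur` threshold are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Segment = Tuple[float, float]  # hypothetical format: (start_s, end_s)


def subtract_intervals(seg: Segment, bad: List[Segment]) -> List[Segment]:
    """Return the parts of `seg` not covered by any interval in `bad`."""
    pieces = [seg]
    for b_start, b_end in bad:
        next_pieces: List[Segment] = []
        for start, end in pieces:
            # No overlap with this music/noise interval: keep the piece unchanged.
            if b_end <= start or b_start >= end:
                next_pieces.append((start, end))
                continue
            # Keep whatever lies before and after the music/noise interval.
            if b_start > start:
                next_pieces.append((start, b_start))
            if b_end < end:
                next_pieces.append((b_end, end))
        pieces = next_pieces
    return pieces


def build_clean_clusters(
    diarization: List[Tuple[float, float, str]],
    music_noise: List[Segment],
    min_dur: float = 1.0,  # hypothetical minimum usable segment length (s)
) -> Dict[str, List[Segment]]:
    """Group music/noise-free speech pieces by speaker label."""
    clusters: Dict[str, List[Segment]] = defaultdict(list)
    for start, end, speaker in diarization:
        for piece_start, piece_end in subtract_intervals((start, end), music_noise):
            # Drop fragments too short to be useful as training material.
            if piece_end - piece_start >= min_dur:
                clusters[speaker].append((piece_start, piece_end))
    return dict(clusters)


if __name__ == "__main__":
    diarization = [(0.0, 10.0, "spk1"), (10.0, 15.0, "spk2"), (15.0, 16.5, "spk1")]
    music_noise = [(3.0, 4.0), (12.0, 20.0)]
    print(build_clean_clusters(diarization, music_noise))
```

In this example, the music/noise interval from 12 s to 20 s removes most of `spk2`'s turn and all of `spk1`'s last turn, leaving `{'spk1': [(0.0, 3.0), (4.0, 10.0)], 'spk2': [(10.0, 12.0)]}`. Applying the detector after diarization is only one possible ordering; the paper compares several ways of combining the two components.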

Associated projects

Type: FP7
Code: 287678
Acronym: SIMPLE4ALL
Lead institution: University of Edinburgh
Title: Speech synthesis that improves through adaptive learning

More information

Record ID: 37500
DC identifier: http://oa.upm.es/37500/
OAI identifier: oai:oa.upm.es:37500
Deposited by: Memoria Investigacion
Deposited on: 15 Sep 2015 18:25
Last modified: 06 Jun 2016 18:25