A comparison of open-source segmentation architectures for dealing with imperfect data from the media in speech synthesis

Gallardo Antolín, Ascensión and Montero Martínez, Juan Manuel and King, Simon (2014). A comparison of open-source segmentation architectures for dealing with imperfect data from the media in speech synthesis. In: "15th Annual Conference of the International Speech Communication Association (INTERSPEECH 2014)", 14/09/2014 - 18/09/2014, Singapore. pp. 2370-2374.

Description

Title: A comparison of open-source segmentation architectures for dealing with imperfect data from the media in speech synthesis
Author/s:
  • Gallardo Antolín, Ascensión
  • Montero Martínez, Juan Manuel
  • King, Simon
Item Type: Presentation at Congress or Conference (Article)
Event Title: 15th Annual Conference of the International Speech Communication Association (INTERSPEECH 2014)
Event Dates: 14/09/2014 - 18/09/2014
Event Location: Singapore
Title of Book: Proceedings 15th Annual Conference of the International Speech Communication Association (INTERSPEECH 2014)
Date: 2014
Freetext Keywords: Diarization, audio segmentation, expressive text-to-speech, media recordings
Faculty: E.T.S.I. Telecomunicación (UPM)
Department: Ingeniería Electrónica
Creative Commons Licenses: Attribution - No derivative works - Non commercial

Abstract

Traditional Text-To-Speech (TTS) systems have been developed using specially designed, non-expressive scripted recordings. To develop a new generation of expressive TTS systems in the Simple4All project, real recordings from the media should be used for training new voices with a whole new range of speaking styles. However, to process this more spontaneous material, the new systems must be able to deal with imperfect data (multi-speaker recordings, background and foreground music, and noise), filtering out low-quality audio segments and creating mono-speaker clusters. In this paper we compare several architectures for combining speaker diarization with music and noise detection that improve the precision and overall quality of the segmentation.
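
The filtering and clustering step mentioned in the abstract can be illustrated with a minimal Python sketch. It is not one of the architectures compared in the paper, which the abstract does not detail. The `Segment` type, its fields, the `clean_speaker_clusters` function and the `min_duration` threshold are all hypothetical names and values chosen for illustration. The sketch assumes that a diarization stage has already assigned a speaker label to each segment and that separate music and noise detectors have set per-segment flags.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Hypothetical audio segment; all fields are illustrative assumptions."""
    start: float     # start time in seconds
    end: float       # end time in seconds
    speaker: str     # label assumed to come from a diarization stage
    has_music: bool  # flag assumed to come from a music detector
    has_noise: bool  # flag assumed to come from a noise detector


def clean_speaker_clusters(segments, min_duration=1.0):
    """Drop segments flagged as music or noise, or shorter than
    min_duration seconds, then group the remainder by speaker label."""
    clusters = defaultdict(list)
    for seg in segments:
        if seg.has_music or seg.has_noise:
            continue  # low-quality audio: filtered out
        if seg.end - seg.start < min_duration:
            continue  # too short to be useful (assumed threshold)
        clusters[seg.speaker].append(seg)
    return dict(clusters)


# Toy input: four segments from two speakers, one overlapped by music.
segments = [
    Segment(0.0, 4.2, "spk1", False, False),
    Segment(4.2, 6.0, "spk2", True, False),
    Segment(6.0, 6.5, "spk1", False, False),
    Segment(6.5, 12.0, "spk2", False, False),
]
clusters = clean_speaker_clusters(segments)
```

In this toy input, the segment with music and the half-second segment are discarded, leaving one clean segment per speaker. A real system would have to decide how the detectors and the diarizer are ordered and combined, which is the design question the paper compares.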

Funding Projects

Type: FP7
Code: 287678
Acronym: SIMPLE4ALL
Leader: University of Edinburgh
Title: Speech synthesis that improves through adaptive learning

More information

Item ID: 37500
DC Identifier: http://oa.upm.es/37500/
OAI Identifier: oai:oa.upm.es:37500
Deposited by: Memoria Investigacion
Deposited on: 15 Sep 2015 18:25
Last Modified: 06 Jun 2016 18:25