Gallardo Antolín, Ascensión and Montero Martínez, Juan Manuel and King, Simon (2014). A comparison of open-source segmentation architectures for dealing with imperfect data from the media in speech synthesis. In: "15th Annual Conference of the International Speech Communication Association (INTERSPEECH 2014)", 14/09/2014 - 18/09/2014, Singapore. pp. 2370-2374.
Title: | A comparison of open-source segmentation architectures for dealing with imperfect data from the media in speech synthesis |
---|---|
Author/s: | Gallardo Antolín, Ascensión; Montero Martínez, Juan Manuel; King, Simon |
Item Type: | Presentation at Congress or Conference (Article) |
Event Title: | 15th Annual Conference of the International Speech Communication Association (INTERSPEECH 2014) |
Event Dates: | 14/09/2014 - 18/09/2014 |
Event Location: | Singapore |
Title of Book: | Proceedings 15th Annual Conference of the International Speech Communication Association (INTERSPEECH 2014) |
Date: | 2014 |
Subjects: | |
Freetext Keywords: | Diarization, audio segmentation, expressive text-to-speech, media recordings |
Faculty: | E.T.S.I. Telecomunicación (UPM) |
Department: | Ingeniería Electrónica |
Creative Commons Licenses: | Attribution - No derivative works - Non commercial |
Traditional text-to-speech (TTS) systems have been developed using specially designed, non-expressive scripted recordings. To develop a new generation of expressive TTS systems in the Simple4All project, real recordings from the media should be used to train new voices covering a much wider range of speaking styles. However, to process this more spontaneous material, the new systems must be able to deal with imperfect data (multi-speaker recordings, background and foreground music, and noise) by filtering out low-quality audio segments and creating mono-speaker clusters. In this paper, we compare several architectures for combining speaker diarization with music and noise detection, which improve the precision and overall quality of the segmentation.
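As a rough illustration of the filtering and clustering idea described in the abstract, the following minimal sketch drops diarized segments that overlap too much with detected music or noise and groups the remaining ones by speaker. It is not the architecture evaluated in the paper: the `Segment` type, the `clean_speaker_clusters` function, the 0.2 threshold, and the sample data are all hypothetical, and the music/noise intervals are assumed not to overlap one another.

```python
from collections import defaultdict
from typing import NamedTuple


class Segment(NamedTuple):
    start: float   # seconds
    end: float     # seconds
    speaker: str   # hypothetical diarization label


def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def clean_speaker_clusters(segments, noisy_intervals, max_noisy_ratio=0.2):
    """Drop segments dominated by music/noise and group the rest by speaker.

    Assumes the noisy intervals do not overlap one another.
    """
    clusters = defaultdict(list)
    for seg in segments:
        duration = seg.end - seg.start
        if duration <= 0:
            continue  # skip empty or malformed segments
        noisy = sum(overlap(seg.start, seg.end, s, e) for s, e in noisy_intervals)
        if noisy / duration <= max_noisy_ratio:
            clusters[seg.speaker].append(seg)
    return dict(clusters)


# Hypothetical diarization output and music/noise detections (made-up values).
segments = [
    Segment(0.0, 4.0, "spk1"),
    Segment(4.0, 10.0, "spk2"),
    Segment(10.0, 12.0, "spk1"),
]
noisy_intervals = [(3.5, 4.0), (5.0, 9.0)]

clusters = clean_speaker_clusters(segments, noisy_intervals)
```

With this made-up input, the long `spk2` segment is discarded because most of it coincides with a noisy interval, while both `spk1` segments are kept in a single cluster. A real system would obtain the segments and intervals from diarization and music/noise detectors rather than from hard-coded values.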
Type | Code | Acronym | Leader | Title |
---|---|---|---|---|
FP7 | 287678 | SIMPLE4ALL | University of Edinburgh | Speech synthesis that improves through adaptive learning |
Item ID: | 37500 |
---|---|
DC Identifier: | http://oa.upm.es/37500/ |
OAI Identifier: | oai:oa.upm.es:37500 |
Deposited by: | Memoria Investigacion |
Deposited on: | 15 Sep 2015 18:25 |
Last Modified: | 06 Jun 2016 18:25 |