Lorenzo Trueba, Jaime and Martínez González, Beatriz and Lopez Ludeña, Veronica and Barra Chicote, Roberto and Ferreiros López, Javier and Yamagishi, J. and Montero Martínez, Juan Manuel (2012). Towards an unsupervised speaking style voice building framework: multi-style speaker diarization. In: "InterSpeech 2012 - 13th Annual Conference of the International Speech Communication Association", 09/09/2012 - 13/09/2012, Portland, Oregon. pp. 1-4.
Title: | Towards an unsupervised speaking style voice building framework: multi-style speaker diarization |
---|---|
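Author/s: | Lorenzo Trueba, Jaime; Martínez González, Beatriz; Lopez Ludeña, Veronica; Barra Chicote, Roberto; Ferreiros López, Javier; Yamagishi, J.; Montero Martínez, Juan Manuel |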
Item Type: | Presentation at Congress or Conference (Article) |
Event Title: | InterSpeech 2012 - 13th Annual Conference of the International Speech Communication Association |
Event Dates: | 09/09/2012 - 13/09/2012 |
Event Location: | Portland, Oregon |
Title of Book: | InterSpeech 2012 - 13th Annual Conference of the International Speech Communication Association |
Date: | 2012 |
Subjects: | |
Freetext Keywords: | Expressive speech synthesis, speaker diarization, speaking styles, voice cloning |
Faculty: | E.T.S.I. Telecomunicación (UPM) |
Department: | Ingeniería Electrónica |
Creative Commons Licenses: | Attribution - NonCommercial - NoDerivatives |
Current text-to-speech systems are built from studio-recorded speech, either in a neutral style or with acted emotions. However, the proliferation of media-sharing sites could enable a new generation of speech-based systems that cope with spontaneous and styled speech. This paper proposes an architecture for handling realistic recordings and reports experiments on unsupervised speaker diarization. To maximize the speaker purity of the clusters while keeping speaker coverage high, the paper evaluates the F-measure of a diarization module. The module achieves high scores (>85%), especially when the clusters are longer than 30 seconds, even for the more spontaneous and expressive styles (such as talk shows or sports).
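The abstract does not spell out how the F-measure is computed. As a rough illustration only, the sketch below assumes one common formulation: purity and coverage are duration-weighted, and the F-measure is their harmonic mean. The function name `purity_coverage_f` and the `(duration, cluster, speaker)` input layout are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def purity_coverage_f(segments):
    """Score a diarization output against reference speaker labels.

    segments: iterable of (duration_seconds, cluster_id, speaker_id).
    Assumed definitions (not confirmed by the paper):
      purity   = share of speech time that belongs to each cluster's
                 dominant speaker
      coverage = share of speech time that falls in each speaker's
                 dominant cluster
      F        = harmonic mean of purity and coverage
    """
    # Accumulate the time shared by every (cluster, speaker) pair.
    overlap = defaultdict(float)
    for duration, cluster, speaker in segments:
        overlap[(cluster, speaker)] += duration

    total = sum(overlap.values())
    if total == 0:
        raise ValueError("no speech to score")

    # Largest overlap per cluster (for purity) and per speaker (for coverage).
    best_in_cluster = defaultdict(float)
    best_for_speaker = defaultdict(float)
    for (cluster, speaker), seconds in overlap.items():
        best_in_cluster[cluster] = max(best_in_cluster[cluster], seconds)
        best_for_speaker[speaker] = max(best_for_speaker[speaker], seconds)

    purity = sum(best_in_cluster.values()) / total
    coverage = sum(best_for_speaker.values()) / total
    f_measure = 2 * purity * coverage / (purity + coverage)
    return purity, coverage, f_measure
```

Under these assumptions, splitting one speaker across two clusters leaves purity untouched but lowers coverage, which is why the two quantities are combined into a single score.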
Item ID: | 20407 |
---|---|
DC Identifier: | https://oa.upm.es/20407/ |
OAI Identifier: | oai:oa.upm.es:20407 |
Deposited by: | Memoria Investigacion |
Deposited on: | 05 Oct 2013 08:29 |
Last Modified: | 21 Apr 2016 23:12 |