Towards an unsupervised speaking style voice building framework: multi-style speaker diarization

Lorenzo Trueba, Jaime and Martínez González, Beatriz and Lopez Ludeña, Veronica and Barra Chicote, Roberto and Ferreiros López, Javier and Yamagishi, J. and Montero Martínez, Juan Manuel (2012). Towards an unsupervised speaking style voice building framework: multi-style speaker diarization. In: "InterSpeech 2012 - 13th Annual Conference of the International Speech Communication Association", 09/09/2012 - 13/09/2012, Portland, Oregon. pp. 1-4.

Description

Title: Towards an unsupervised speaking style voice building framework: multi-style speaker diarization
Author/s:
  • Lorenzo Trueba, Jaime
  • Martínez González, Beatriz
  • Lopez Ludeña, Veronica
  • Barra Chicote, Roberto
  • Ferreiros López, Javier
  • Yamagishi, J.
  • Montero Martínez, Juan Manuel
Item Type: Presentation at Congress or Conference (Article)
Event Title: InterSpeech 2012 - 13th Annual Conference of the International Speech Communication Association
Event Dates: 09/09/2012 - 13/09/2012
Event Location: Portland, Oregon
Title of Book: InterSpeech 2012 - 13th Annual Conference of the International Speech Communication Association
Date: 2012
Freetext Keywords: Expressive speech synthesis, speaker diarization, speaking styles, voice cloning
Faculty: E.T.S.I. Telecomunicación (UPM)
Department: Ingeniería Electrónica
Creative Commons Licenses: Attribution - No derivative works - Non commercial

Abstract

Current text-to-speech systems are built from studio-recorded speech, either in a neutral style or with acted emotions. However, the proliferation of media-sharing sites could enable a new generation of speech-based systems able to cope with spontaneous and styled speech. This paper proposes an architecture for handling realistic recordings and reports experiments on unsupervised speaker diarization. To maximize the speaker purity of the clusters while keeping speaker coverage high, the paper evaluates the F-measure of a diarization module. It achieves high scores (>85%), especially when the clusters are longer than 30 seconds, even for the more spontaneous and expressive styles (such as talk shows or sports).
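
The abstract does not spell out how the F-measure is computed. As a rough illustration only, the sketch below assumes a common convention: duration-weighted cluster purity and speaker coverage, combined by their harmonic mean. The function name `purity_coverage_f`, the segment layout, and the toy labels are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def purity_coverage_f(segments):
    """Return (purity, coverage, F) for a diarization output.

    segments: iterable of (duration_seconds, true_speaker, cluster_id).
    Assumed definitions (not confirmed by the paper):
      purity   = share of audio belonging to each cluster's dominant speaker
      coverage = share of audio falling in each speaker's dominant cluster
      F        = harmonic mean of purity and coverage
    """
    overlap = defaultdict(float)  # (cluster, speaker) -> seconds in common
    total = 0.0
    for duration, speaker, cluster in segments:
        overlap[(cluster, speaker)] += duration
        total += duration
    if total <= 0:
        raise ValueError("segments must contain some audio")

    best_in_cluster = defaultdict(float)   # cluster -> dominant speaker time
    best_for_speaker = defaultdict(float)  # speaker -> dominant cluster time
    for (cluster, speaker), seconds in overlap.items():
        best_in_cluster[cluster] = max(best_in_cluster[cluster], seconds)
        best_for_speaker[speaker] = max(best_for_speaker[speaker], seconds)

    purity = sum(best_in_cluster.values()) / total
    coverage = sum(best_for_speaker.values()) / total
    f_measure = 2 * purity * coverage / (purity + coverage)
    return purity, coverage, f_measure


# Toy input: two speakers split over three clusters (illustrative values).
example = [
    (40.0, "A", "c1"),
    (10.0, "B", "c1"),
    (30.0, "B", "c2"),
    (20.0, "A", "c3"),
]
purity, coverage, f_measure = purity_coverage_f(example)
print(purity, coverage, f_measure)
```

Under these assumed definitions, over-splitting a speaker into many small clusters raises purity but lowers coverage, which is why a single combined score is useful when both properties matter.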

More information

Item ID: 20407
DC Identifier: https://oa.upm.es/20407/
OAI Identifier: oai:oa.upm.es:20407
Deposited by: Memoria Investigacion
Deposited on: 05 Oct 2013 08:29
Last Modified: 21 Apr 2016 23:12