Towards an unsupervised speaking style voice building framework: multi-style speaker diarization

Lorenzo Trueba, Jaime; Martínez González, Beatriz; Lopez Ludeña, Veronica; Barra Chicote, Roberto; Ferreiros López, Javier; Yamagishi, J. and Montero Martínez, Juan Manuel (2012). Towards an unsupervised speaking style voice building framework: multi-style speaker diarization. In: "InterSpeech 2012 - 13th Annual Conference of the International Speech Communication Association", 09/09/2012 - 13/09/2012, Portland, Oregon. pp. 1-4.

Description

Title: Towards an unsupervised speaking style voice building framework: multi-style speaker diarization
Author(s):
  • Lorenzo Trueba, Jaime
  • Martínez González, Beatriz
  • Lopez Ludeña, Veronica
  • Barra Chicote, Roberto
  • Ferreiros López, Javier
  • Yamagishi, J.
  • Montero Martínez, Juan Manuel
Document Type: Conference or Workshop Presentation (Article)
Event Title: InterSpeech 2012 - 13th Annual Conference of the International Speech Communication Association
Event Dates: 09/09/2012 - 13/09/2012
Event Location: Portland, Oregon
Book Title: InterSpeech 2012 - 13th Annual Conference of the International Speech Communication Association
Date: 2012
Subjects:
Freetext Keywords: Expressive speech synthesis, speaker diarization, speaking styles, voice cloning
School: E.T.S.I. Telecomunicación (UPM)
Department: Electronic Engineering
Creative Commons License: Attribution - NoDerivatives - NonCommercial

Abstract

Current text-to-speech systems are developed using studio-recorded speech in a neutral style or based on acted emotions. However, the proliferation of media sharing sites makes it possible to develop a new generation of speech-based systems that can cope with spontaneous and styled speech. This paper proposes an architecture for dealing with realistic recordings and reports experiments on unsupervised speaker diarization. To maximize the speaker purity of the clusters while keeping speaker coverage high, the paper evaluates the F-measure of a diarization module, which achieves high scores (>85%), especially when the clusters are longer than 30 seconds, even for the more spontaneous and expressive styles (such as talk shows or sports).
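
As an illustration of the trade-off between cluster purity and speaker coverage mentioned in the abstract, the following is a minimal Python sketch. It assumes duration-weighted purity and coverage combined as a harmonic mean; the abstract does not state the exact definitions used in the paper, so the function name `purity_coverage_f`, the input layout, and the formulas are assumptions for illustration only.

```python
from collections import defaultdict


def purity_coverage_f(segments):
    """Return (purity, coverage, f_measure) for a diarization output.

    `segments` is an iterable of (speaker, cluster, duration_seconds)
    tuples. This layout is a hypothetical choice for the sketch.
    """
    overlap = defaultdict(float)  # (speaker, cluster) -> shared seconds
    total = 0.0
    for speaker, cluster, duration in segments:
        overlap[(speaker, cluster)] += duration
        total += duration
    if total <= 0:
        raise ValueError("segments must contain some speech")

    # Longest single-speaker share of each cluster, and
    # longest single-cluster share of each speaker.
    best_in_cluster = defaultdict(float)
    best_for_speaker = defaultdict(float)
    for (speaker, cluster), duration in overlap.items():
        best_in_cluster[cluster] = max(best_in_cluster[cluster], duration)
        best_for_speaker[speaker] = max(best_for_speaker[speaker], duration)

    purity = sum(best_in_cluster.values()) / total
    coverage = sum(best_for_speaker.values()) / total
    # Assumed combination: harmonic mean of purity and coverage.
    f_measure = 2 * purity * coverage / (purity + coverage)
    return purity, coverage, f_measure


if __name__ == "__main__":
    example = [("A", "c1", 30.0), ("A", "c2", 30.0), ("B", "c3", 40.0)]
    print(purity_coverage_f(example))
```

In the usage example every cluster contains a single speaker, so purity is perfect, but speaker A is split across two clusters, which lowers coverage and therefore the F-measure. This is the kind of over-segmentation that a combined score penalizes.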

More information

Record ID: 20407
DC Identifier: http://oa.upm.es/20407/
OAI Identifier: oai:oa.upm.es:20407
Deposited by: Memoria Investigacion
Deposited on: 05 Oct 2013 08:29
Last Modified: 21 Apr 2016 23:12