Deep-Sync: A novel deep learning-based tool for semantic-aware subtitling synchronisation

Martín García, Alejandro ORCID: https://orcid.org/0000-0002-0800-7632, González Carrasco, Israel ORCID: https://orcid.org/0000-0001-8294-3157, Rodríguez Fernández, Víctor ORCID: https://orcid.org/0000-0002-8589-6621, Souto Rico, Mónica ORCID: https://orcid.org/0000-0002-9315-7861, Camacho Fernández, David ORCID: https://orcid.org/0000-0002-5051-3475 and Ruiz Mezcua, Belen ORCID: https://orcid.org/0000-0003-1993-8325 (2021). Deep-Sync: A novel deep learning-based tool for semantic-aware subtitling synchronisation. "Neural Computing and Applications" ; ISSN 1433-3058. https://doi.org/10.1007/s00521-021-05751-y.

Description

Title: Deep-Sync: A novel deep learning-based tool for semantic-aware subtitling synchronisation
Author(s):
Document Type: Article
Journal/Publication Title: Neural Computing and Applications
Date: 8 February 2021
ISSN: 1433-3058
Subjects:
SDGs:
Informal Keywords: TV Broadcasting, Synchronisation, Language Model, Deep Neural Networks, Machine Learning
School: E.T.S.I. de Sistemas Informáticos (UPM)
Department: Sistemas Informáticos
Creative Commons License: Attribution - NonCommercial - NoDerivatives

Full text

PDF (9220082.pdf) - Download (1 MB)

Abstract

Subtitles are a key element in making media content accessible to people with hearing impairments and to elderly people, and they are also useful when watching TV in a noisy environment or learning a new language. Most of the time, subtitles are generated manually in advance, producing a verbatim, synchronised transcription of the audio. In live TV broadcasts, however, captions are created in real time by a re-speaker with the help of voice recognition software, which inevitably leads to delays and a lack of synchronisation. In this paper, we present Deep-Sync, a tool for aligning subtitles with audio-visual content. Its architecture integrates a deep language representation model with real-time voice recognition software to build a semantic-aware alignment tool that successfully aligns most subtitles even when there is no direct correspondence between the re-speaker's words and the audio content. To avoid any kind of censorship, Deep-Sync can be deployed directly on users' TVs: this introduces a small delay to perform the alignment but avoids delaying the signal at the broadcaster's station. Deep-Sync was compared with another subtitle alignment tool, and our proposal improved synchronisation in all tested cases.
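The core idea of semantic-aware alignment can be illustrated with a minimal sketch: each subtitle is compared against time-stamped segments of the recognised audio, and the subtitle is re-timed to the segment whose text is semantically closest. This is NOT the paper's implementation: Deep-Sync uses a deep language representation model, whereas this toy version uses bag-of-words cosine similarity as a stand-in for deep embeddings, and the function and variable names are hypothetical.

```python
from collections import Counter
from math import sqrt

def embed(text):
    # Toy stand-in for a deep language representation model:
    # a bag-of-words count vector (punctuation stripped, lowercased).
    cleaned = "".join(c for c in text.lower() if c.isalnum() or c.isspace())
    return Counter(cleaned.split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    num = sum(a[t] * b[t] for t in a)
    den = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def align_subtitle(subtitle, asr_segments):
    """Return (start_time, score) of the ASR segment most semantically
    similar to the subtitle; the subtitle would then be re-timed to it."""
    sub_vec = embed(subtitle)
    best_start, best_text = max(
        asr_segments, key=lambda seg: cosine(sub_vec, embed(seg[1]))
    )
    return best_start, cosine(sub_vec, embed(best_text))

# Recognised audio as (start_time_in_seconds, recognised_text) pairs.
asr = [
    (0.0, "good evening and welcome"),
    (3.2, "tonight we talk about the weather"),
    (7.5, "sunny skies are expected tomorrow"),
]

start, score = align_subtitle("Sunny skies expected tomorrow.", asr)
# The subtitle matches the third segment, so it is re-timed to start at 7.5 s.
```

A semantic (rather than exact-text) comparison is what lets the alignment succeed even when the re-speaker paraphrases the audio, as the abstract notes; a real deployment would replace `embed` with the deep language model's sentence embeddings.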

Associated projects

Type                 Code                  Acronym     Lead researcher  Title
Gobierno de España   TIN2017-85727-C4-3-P  DeepBio     Not specified    Not specified
Comunidad de Madrid  S2018/TCS-4566        CYNAMON-CM  Not specified    Not specified

More information

Record ID: 88871
DC Identifier: https://oa.upm.es/88871/
OAI Identifier: oai:oa.upm.es:88871
Scientific Portal URL: https://portalcientifico.upm.es/es/ipublic/item/9220082
DOI: 10.1007/s00521-021-05751-y
Official URL: https://link.springer.com/article/10.1007/s00521-0...
Deposited by: iMarina Portal Científico
Deposited on: 05 May 2025 17:24
Last Modified: 05 May 2025 17:24