On split Dynamic Time Warping for robust Automatic Dialogue Replacement

Authors:
Pieter Soens; Werner Verhelst
Affiliations:
Vrije Universiteit Brussel, Department of ETRO-DSSP, Pleinlaan 2, B-1050 Brussels, Belgium; Vrije Universiteit Brussel, Department of ETRO-DSSP, Pleinlaan 2, B-1050 Brussels, Belgium and Interdisciplinary Institute for Broadband Technology, Gaston Crommenlaan 8 (bus 102), B-9050 Gent-Led ...
Venue:
Signal Processing
Year:
2012

Abstract

In this article, we present LipSynch, a software tool for the automatic replacement of speech dialogues in motion pictures, video, or television series. The system operates in two steps. In the first step, analysis, a split Dynamic Time Warping algorithm measures the timing relationships between the speech segments of the dialogues that serve as a timing reference and the corresponding speech segments in the replacement dialogues. In the second step, the resulting warping paths are processed and used to synthesize high-quality, natural-sounding speech dialogues that are precisely time-synchronized with the reference dialogues. Subjective audio-visual listening tests, performed in the context of a difficult Automatic Dialogue Replacement task, showed that LipSynch achieves a significant improvement over the industry-standard benchmark VocALign, both in lip-synchronization accuracy and in the overall speech quality of the synthesized dialogues.
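
To illustrate the kind of warping path the analysis step produces, the following is a minimal sketch of classic Dynamic Time Warping on two short one-dimensional feature sequences. It is not the split DTW variant used by LipSynch, whose details are not given in the abstract. The function name `dtw_path`, the absolute-difference local cost, and the three-way step pattern are all assumptions chosen for illustration.

```python
import math


def dtw_path(ref, rep):
    """Classic DTW between two 1-D sequences (illustrative, not split DTW).

    Returns the total alignment cost and the warping path as a list of
    (ref_index, rep_index) pairs from (0, 0) to (len(ref)-1, len(rep)-1).
    """
    n, m = len(ref), len(rep)
    inf = math.inf
    # acc[i][j] = minimal accumulated cost of aligning ref[:i+1] with rep[:j+1]
    acc = [[inf] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = abs(ref[i] - rep[j])  # assumed local cost: absolute difference
            if i == 0 and j == 0:
                acc[i][j] = d
                continue
            best = min(
                acc[i - 1][j] if i > 0 else inf,
                acc[i][j - 1] if j > 0 else inf,
                acc[i - 1][j - 1] if i > 0 and j > 0 else inf,
            )
            acc[i][j] = d + best

    # Backtrack from the end point to recover the warping path.
    i, j = n - 1, m - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((acc[i - 1][j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((acc[i - 1][j], i - 1, j))
        if j > 0:
            candidates.append((acc[i][j - 1], i, j - 1))
        _, i, j = min(candidates)  # ties resolve toward the diagonal step
        path.append((i, j))
    path.reverse()
    return acc[n - 1][m - 1], path


# Hypothetical toy data: the replacement lingers one frame longer on value 1.
reference = [0, 1, 2, 3]
replacement = [0, 1, 1, 2, 3]
cost, path = dtw_path(reference, replacement)
```

In this toy case the path maps reference frame 1 to replacement frames 1 and 2, which is the timing relationship a later time-scale modification stage would use to compress that stretch of the replacement signal.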