In this article, we present LipSynch, a software tool for the automatic replacement of speech dialogues in motion pictures, video, and television series. The system operates in two steps. During analysis, a split Dynamic Time Warping algorithm measures the timing relationships between the speech segments of the dialogues that serve as a timing reference and the corresponding speech segments in the replacement dialogues. The resulting warping paths are then processed and used to synthesize high-quality, natural-sounding speech dialogues that are precisely time-synchronized with the reference dialogues. Subjective audio-visual listening tests on a difficult Automatic Dialogue Replacement task showed that LipSynch achieves a significant improvement over the industry-standard benchmark VocALign, both in lip-synchronization accuracy and in the overall speech quality of the synthesized dialogues.
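To make the analysis step more concrete, the following is a minimal sketch of classic Dynamic Time Warping between a reference and a replacement sequence. It is not the split variant used by LipSynch, whose details are not given here. It assumes each dialogue has already been reduced to a one-dimensional feature sequence (for example, one energy value per frame) and uses absolute difference as the local distance. The function name `dtw_path` and its parameters are hypothetical.

```python
import math


def dtw_path(ref, rep):
    """Classic DTW between two 1-D sequences (illustrative, not split DTW).

    Returns the total alignment cost and the warping path as a list of
    (ref_index, rep_index) pairs running from (0, 0) to (len-1, len-1).
    """
    n, m = len(ref), len(rep)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")

    # cost[i][j] = minimal cost of aligning ref[:i] with rep[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(ref[i - 1] - rep[j - 1])  # assumed local distance
            cost[i][j] = local + min(
                cost[i - 1][j - 1],  # advance both sequences
                cost[i - 1][j],      # advance the reference only
                cost[i][j - 1],      # advance the replacement only
            )

    # Backtrack from the end to recover the warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path


if __name__ == "__main__":
    reference = [0, 1, 2, 3]
    replacement = [0, 0, 1, 2, 3]  # same shape, but starts one frame late
    total, path = dtw_path(reference, replacement)
    print(total, path)
```

Each pair in the returned path says which replacement frame corresponds to which reference frame. A synthesis stage could use such a mapping to decide where the replacement speech needs to be stretched or compressed in time.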