Automatic Synchronization between Lyrics and Music CD Recordings Based on Viterbi Alignment of Segregated Vocal Signals

Authors:
Hiromasa Fujihara;Masataka Goto;Jun Ogata;Kazunori Komatani;Tetsuya Ogata;Hiroshi G. Okuno
Affiliations:
Kyoto University, Japan;National Institute of Advanced Industrial Science and Technology (AIST), Japan;National Institute of Advanced Industrial Science and Technology (AIST), Japan;Kyoto University, Japan;Kyoto University, Japan;Kyoto University, Japan
Venue:
ISM '06 Proceedings of the Eighth IEEE International Symposium on Multimedia
Year:
2006

Citing 0
Cited 5

Refinement Strategies for Music Synchronization
Computer Music Modeling and Retrieval. Genesis of Meaning in Sound and Music

On the improvement of singing voice separation for monaural recordings using the MIR-1K dataset
IEEE Transactions on Audio, Speech, and Language Processing

Word level automatic alignment of music and lyrics using vocal synthesis
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP)

Automatic recognition of lyrics in singing
EURASIP Journal on Audio, Speech, and Music Processing - Special issue on atypical speech

Towards reliable partial music alignments using multiple synchronization strategies
AMR'09 Proceedings of the 7th international conference on Adaptive multimedia retrieval: understanding media and adapting to the user

Abstract

This paper describes a system that automatically synchronizes polyphonic musical audio signals with their corresponding lyrics. Although existing methods can synchronize monophonic speech signals with their text transcriptions by using Viterbi alignment techniques, they cannot be applied to vocals in CD recordings because accompaniment sounds often overlap with the vocals. To align lyrics with such vocals, we therefore developed three methods: a method for segregating vocals from polyphonic sound mixtures, a method for detecting vocal sections, and a method for adapting a speech-recognizer phone model to segregated vocal signals. Experimental results for 10 Japanese popular-music songs showed that our system synchronized music and lyrics with satisfactory accuracy for 8 of the songs.
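
As a rough illustration of the Viterbi alignment step mentioned in the abstract, the sketch below aligns a fixed sequence of phones to audio frames. It is not the authors' implementation: it assumes that per-frame log-likelihoods for each phone have already been computed (for example, by an adapted phone model applied to segregated vocals), that each phone is a single state, and that a frame may only stay in the current phone or advance to the next one. The names `forced_align`, `phone_onsets`, `loglik`, and `hop_seconds` are hypothetical and chosen only for this example.

```python
import math


def forced_align(loglik):
    """Viterbi forced alignment over a left-to-right phone sequence.

    loglik[t][s] is the (assumed, precomputed) log-likelihood of frame t
    under phone s, with phones listed in lyric order.
    Returns the phone index assigned to each frame.
    """
    num_frames = len(loglik)
    num_phones = len(loglik[0])
    if num_frames < num_phones:
        raise ValueError("need at least one frame per phone")

    neg_inf = -math.inf
    score = [[neg_inf] * num_phones for _ in range(num_frames)]
    back = [[0] * num_phones for _ in range(num_frames)]

    # The path must start in the first phone.
    score[0][0] = loglik[0][0]

    for t in range(1, num_frames):
        for s in range(num_phones):
            stay = score[t - 1][s]
            move = score[t - 1][s - 1] if s > 0 else neg_inf
            if move > stay:
                score[t][s] = move + loglik[t][s]
                back[t][s] = s - 1
            else:
                score[t][s] = stay + loglik[t][s]
                back[t][s] = s

    # The path must end in the last phone; trace back from there.
    path = [num_phones - 1]
    for t in range(num_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


def phone_onsets(path, hop_seconds):
    """Convert a frame-level path into the start time of each phone."""
    starts = {}
    for t, s in enumerate(path):
        starts.setdefault(s, t * hop_seconds)
    return [starts[s] for s in sorted(starts)]


# Toy input: 5 frames, 2 phones; the first two frames favor phone 0.
toy_loglik = [[0, -5], [0, -5], [-5, 0], [-5, 0], [-5, 0]]
toy_path = forced_align(toy_loglik)
toy_onsets = phone_onsets(toy_path, hop_seconds=0.5)
```

In a complete system along the lines the abstract describes, the likelihoods would come from vocal signals after segregation, and frames outside detected vocal sections would be handled separately; this sketch covers only the alignment itself.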