Word level automatic alignment of music and lyrics using vocal synthesis

Authors:
Namunu C. Maddage; Khe Chai Sim; Haizhou Li
Affiliations:
Royal Melbourne Institute of Technology (RMIT), Melbourne, Australia; Institute for Infocomm Research (I²R), Singapore; Institute for Infocomm Research (I²R), Singapore
Venue:
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP)
Year:
2010

Abstract

We propose a signal-based approach, instead of the commonly used model-based approach, to automatically align vocal music with text lyrics at the word level. In this approach, we use a text-to-speech system to synthesize the singing voice according to the lyrics, so that aligning the music signal with the corresponding text lyrics becomes the alignment of two audio signals. This study uses results from music information modeling and singing voice synthesis. In music information modeling, we study different music representation strategies for music segmentation, music region indexing, and region content description; in singing voice synthesis, we generate the singing voice by using music knowledge to approximate the target vocal line in terms of tempo. Experimental results on a 20-song database show word-level alignment error rates of 26.3% and 36.1% at eighth-note and sixteenth-note alignment tolerances, respectively. The proposed approach presents an alternative and effective solution to music-lyrics alignment that may require less training data.
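The idea of reducing lyrics alignment to the alignment of two audio signals can be illustrated with a minimal sketch. The abstract does not state which alignment algorithm or features the authors use, so the example below assumes plain dynamic time warping (DTW) over hypothetical one-dimensional frame features. It also assumes that a quarter note gets one beat when converting a note-value tolerance into seconds, and that a word counts as an error when its aligned onset deviates from the reference by more than that tolerance. All function names are illustrative and do not come from the paper.

```python
from math import inf


def dtw_path(synth, music):
    """Return the minimum-cost DTW path as (synth_frame, music_frame) pairs.

    Both inputs are sequences of scalar frame features (an assumption made
    for brevity; real systems would use feature vectors).
    """
    n, m = len(synth), len(music)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(synth[i - 1] - music[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])

    # Backtrack from the end of both sequences to the start.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return path


def map_onsets(path, synth_onsets):
    """Map word-onset frames in the synthesized signal to music frames.

    Each synthesized frame is mapped to the first music frame it is paired with.
    """
    first_match = {}
    for s, mu in path:
        first_match.setdefault(s, mu)
    return [first_match[s] for s in synth_onsets]


def note_tolerance_seconds(bpm, note_division):
    """Duration of a 1/note_division note, assuming a quarter note is one beat."""
    return (60.0 / bpm) * (4.0 / note_division)


def word_error_rate(predicted, reference, tolerance):
    """Fraction of words whose onset deviates from the reference by more than tolerance."""
    errors = sum(1 for p, r in zip(predicted, reference) if abs(p - r) > tolerance)
    return errors / len(reference)
```

Under these assumptions, word onsets known in the synthesized signal (where the text-to-speech system produced each word) are transferred to the music signal through the warping path, and the error rate is then evaluated at a tolerance such as `note_tolerance_seconds(bpm, 8)` for an eighth note or `note_tolerance_seconds(bpm, 16)` for a sixteenth note.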