Word level automatic alignment of music and lyrics using vocal synthesis

  • Authors:
  • Namunu C. Maddage; Khe Chai Sim; Haizhou Li

  • Affiliations:
  • Royal Melbourne Institute of Technology (RMIT), Melbourne, Australia; Institute for Infocomm Research (I2R), Singapore; Institute for Infocomm Research (I2R), Singapore

  • Venue:
  • ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP)
  • Year:
  • 2010

Abstract

We propose a signal-based approach, in place of the commonly used model-based approach, to automatically align vocal music with text lyrics at the word level. A text-to-speech system synthesizes the singing voice from the lyrics, so that aligning the music signal with its lyrics becomes a problem of aligning two audio signals. The study draws on results from music information modeling and singing voice synthesis. In music information modeling, we study different music representation strategies for music segmentation, music region indexing, and region content description; in singing voice synthesis, we use music knowledge to generate a singing voice that approximates the target vocal line in terms of tempo. Experiments on a 20-song database show word-level alignment error rates of 26.3% and 36.1% at eighth-note and sixteenth-note alignment tolerances, respectively. The proposed approach offers an alternative and effective solution to music-lyrics alignment that may require less training data.
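
As a rough illustration of what "aligning two audio signals" and a note-based alignment tolerance can look like in practice, the sketch below may help. The abstract does not say which alignment algorithm or error definition the authors use; dynamic time warping (DTW) is assumed here only because it is a common way to align two feature sequences, and the error rate is assumed to be the fraction of words whose onset deviates from the reference by more than one note duration. The function names `dtw_path` and `word_error_rate`, the quarter-note beat, and all input values are hypothetical.

```python
import numpy as np


def dtw_path(x, y):
    """Align two feature sequences (frames x dims) with classic DTW.

    Assumption: DTW stands in for whatever alignment the paper uses.
    Returns the optimal warping path as a list of (i, j) frame pairs.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a 1-D input as one feature per frame
    if y.ndim == 1:
        y = y[:, None]
    n, m = len(x), len(y)

    # Pairwise Euclidean distance between every frame of x and y.
    dist = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=2)

    # Accumulated cost with an extra border row/column of infinities.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = dist[i - 1, j - 1] + min(
                acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]
            )

    # Backtrack from the last cell to the first along the cheapest steps.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        _, i, j = min(
            (acc[i - 1, j - 1], i - 1, j - 1),
            (acc[i - 1, j], i - 1, j),
            (acc[i, j - 1], i, j - 1),
        )
        path.append((i - 1, j - 1))
    return path[::-1]


def word_error_rate(predicted_onsets, reference_onsets, tempo_bpm, note_fraction=8):
    """Fraction of words whose onset error exceeds one 1/note_fraction note.

    Assumptions: onsets are in seconds, the beat is a quarter note, and a
    word counts as an error when |predicted - reference| > tolerance.
    """
    quarter_note = 60.0 / tempo_bpm
    tolerance = quarter_note * 4.0 / note_fraction
    pred = np.asarray(predicted_onsets, dtype=float)
    ref = np.asarray(reference_onsets, dtype=float)
    return float(np.mean(np.abs(pred - ref) > tolerance))


# Toy usage with made-up numbers (not data from the paper).
path = dtw_path([0, 1, 2, 3], [0, 0, 1, 2, 3])
eighth = word_error_rate([0.0, 1.2, 2.0, 3.3], [0.0, 1.0, 2.1, 3.0], tempo_bpm=120, note_fraction=8)
sixteenth = word_error_rate([0.0, 1.2, 2.0, 3.3], [0.0, 1.0, 2.1, 3.0], tempo_bpm=120, note_fraction=16)
```

Under these assumptions, tightening the tolerance from an eighth note to a sixteenth note can only keep or raise the error rate, which matches the direction of the two figures reported in the abstract.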