LyricAlly: Automatic Synchronization of Textual Lyrics to Acoustic Music Signals

Authors:
Min-Yen Kan; Ye Wang; D. Iskandar; Tin Lay Nwe; A. Shenoy
Affiliations:
School of Computing, National University of Singapore, Singapore; -; -; -; -
Venue:
IEEE Transactions on Audio, Speech, and Language Processing
Year:
2008

Cited by (5):
Simultaneous Synchronization of Text and Speech for Broadcast News Subtitling
ISNN 2009 Proceedings of the 6th International Symposium on Neural Networks: Advances in Neural Networks - Part III

Automatic recognition of lyrics in singing
EURASIP Journal on Audio, Speech, and Music Processing - Special issue on atypical speech

Towards precise and robust automatic synchronization of live speech and its transcripts
Speech Communication

The need for music information retrieval with user-centered and multimodal strategies
MIRUM '11 Proceedings of the 1st international ACM workshop on Music information retrieval with user-centered and multimodal strategies

RRA: an audio format for single-source music and lyrics
Proceedings of the 50th Annual Southeast Regional Conference

Abstract

We present LyricAlly, a prototype that automatically aligns acoustic musical signals with their corresponding textual lyrics, in a manner similar to manually aligned karaoke. We tackle this problem with a multimodal approach, pairing audio and text processing. LyricAlly's acoustic signal processing uses standard audio features, constrained and informed by the musical nature of the signal. The detected hierarchical rhythm structure is used in singing voice detection and chorus detection, producing results of higher accuracy and lower computational cost than their respective baselines. Text processing is used to approximate the length of the sung passages from the lyrics. Results show an average error of less than one bar for per-line alignment of the lyrics on a test bed of 20 songs (sampled from CD audio and carefully selected for variety). We perform a comprehensive set of system-wide and per-component tests and discuss their results. We conclude by outlining steps for further development.
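
The abstract reports alignment error in bars, not seconds. As a rough illustration of that unit, the sketch below converts per-line timing errors from seconds to bars. It is not LyricAlly's evaluation code. The function names, the constant-tempo assumption, the 4/4 default, and the sample timestamps are all hypothetical and are not taken from the paper.

```python
def bar_duration_seconds(tempo_bpm, beats_per_bar=4):
    """Length of one bar in seconds, assuming a constant tempo (an assumption)."""
    if tempo_bpm <= 0 or beats_per_bar <= 0:
        raise ValueError("tempo_bpm and beats_per_bar must be positive")
    return beats_per_bar * 60.0 / tempo_bpm


def mean_error_in_bars(predicted_starts, reference_starts, tempo_bpm, beats_per_bar=4):
    """Mean absolute error of per-line start times, expressed in bars."""
    if len(predicted_starts) != len(reference_starts) or not predicted_starts:
        raise ValueError("need two non-empty lists of equal length")
    bar = bar_duration_seconds(tempo_bpm, beats_per_bar)
    total_seconds = sum(abs(p - r) for p, r in zip(predicted_starts, reference_starts))
    return total_seconds / len(predicted_starts) / bar


# Illustrative values only: three lyric lines in a 120 bpm song in 4/4,
# where one bar lasts 2.0 seconds.
predicted = [10.0, 18.5, 27.0]   # hypothetical system output (seconds)
reference = [10.5, 18.0, 29.0]   # hypothetical manual annotation (seconds)
print(mean_error_in_bars(predicted, reference, tempo_bpm=120))  # 0.5
```

With these made-up numbers the mean error is 1.0 second, or half a bar at 120 bpm. The same error in seconds would count as a larger fraction of a bar in a faster song, which is one reason a bar-based measure can be more musically meaningful than raw seconds.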