The paper considers the task of recognizing phonemes and words from a singing input using a phonetic hidden Markov model recognizer. The system targets both monophonic singing and singing in polyphonic music; a vocal separation algorithm is applied to separate the singing voice from the polyphonic accompaniment. Because annotated singing databases are scarce, the recognizer is trained on speech and linearly adapted to singing. Global adaptation to singing is found to improve recognition performance, and gender-specific adaptation yields a further improvement. We also study adaptation with multiple base classes defined by either phonetic or acoustic similarity. We test phoneme-level and word-level n-gram language models: the phoneme language models are trained on the text of the speech database, and the large-vocabulary word-level language model is trained on a database of textual lyrics. Two applications are presented. The recognizer is used to align textual lyrics to vocals in polyphonic music, obtaining an average error of 0.94 seconds for line-level alignment. A query-by-singing retrieval application based on the recognized words is also constructed; in 57% of the cases, the first retrieved song is the correct one.
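The global linear adaptation mentioned above can be illustrated with a minimal sketch. The snippet below is not the paper's implementation: it assumes a single MLLR-style base class with an affine transform on Gaussian means, estimated here by ordinary least squares from frames that have already been assigned to Gaussians (function names and the estimation shortcut are hypothetical; full MLLR weights the statistics by occupation probabilities and covariances).

```python
import numpy as np

def estimate_global_transform(means, frames, assign):
    """Estimate one affine transform (A, b) so that A @ mu_g + b
    approximates the singing frames assigned to Gaussian g.
    means:  (G, d) speech-trained Gaussian means
    frames: (T, d) adaptation (singing) feature frames
    assign: (T,)   index of the Gaussian each frame is assigned to
    """
    d = means.shape[1]
    # Extended regressors [mu_g, 1] for each frame's Gaussian.
    X = np.hstack([means[assign], np.ones((len(assign), 1))])  # (T, d+1)
    # Least squares: frames ~= X @ W, with W stacking [A.T; b].
    W, *_ = np.linalg.lstsq(X, frames, rcond=None)
    A, b = W[:d].T, W[d]
    return A, b

def adapt_means(means, A, b):
    """Apply the shared transform to every Gaussian mean."""
    return means @ A.T + b
```

With multiple base classes (phonetic or acoustic, as studied in the paper), one such transform would be estimated per class from the frames assigned to that class's Gaussians.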