The paper considers the task of recognizing phonemes and words from a singing input using a phonetic hidden Markov model recognizer. The system targets both monophonic singing and singing in polyphonic music; a vocal separation algorithm is applied to separate the singing voice from the polyphonic accompaniment. Because annotated singing databases are scarce, the recognizer is trained on speech and linearly adapted to singing. Global adaptation to singing is found to improve recognition performance, and gender-specific adaptation yields a further improvement. We also study adaptation with multiple base classes defined by either phonetic or acoustic similarity. We test phoneme-level and word-level n-gram language models: the phoneme language models are trained on the text of the speech database, and the large-vocabulary word-level language model is trained on a database of textual lyrics. Two applications are presented. The recognizer is used to align textual lyrics to vocals in polyphonic music, obtaining an average error of 0.94 seconds for line-level alignment. A query-by-singing retrieval application based on the recognized words is also constructed; in 57% of the cases, the first retrieved song is the correct one.
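The global linear adaptation mentioned above can be illustrated with a minimal sketch. The snippet below is not the paper's implementation: it assumes a single MLLR-style base class with an affine transform on Gaussian means, estimated here by ordinary least squares from frames that have already been assigned to Gaussians (function names and the estimation shortcut are hypothetical; full MLLR weights the statistics by occupation probabilities and covariances).

```python
import numpy as np

def estimate_global_transform(means, frames, assign):
    """Estimate one affine transform (A, b) so that A @ mu_g + b
    approximates the singing frames assigned to Gaussian g.
    means:  (G, d) speech-trained Gaussian means
    frames: (T, d) adaptation (singing) feature frames
    assign: (T,)   index of the Gaussian each frame is assigned to
    """
    d = means.shape[1]
    # Extended regressors [mu_g, 1] for each frame's Gaussian.
    X = np.hstack([means[assign], np.ones((len(assign), 1))])  # (T, d+1)
    # Least squares: frames ~= X @ W, with W stacking [A.T; b].
    W, *_ = np.linalg.lstsq(X, frames, rcond=None)
    A, b = W[:d].T, W[d]
    return A, b

def adapt_means(means, A, b):
    """Apply the shared transform to every Gaussian mean."""
    return means @ A.T + b
```

With multiple base classes (phonetic or acoustic, as studied in the paper), one such transform would be estimated per class from the frames assigned to that class's Gaussians.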