Multimodal content-based structure analysis of karaoke music

  • Authors:
  • Yongwei Zhu; Kai Chen; Qibin Sun

  • Affiliations:
  • Institute for Infocomm Research, Singapore (all authors)

  • Venue:
  • Proceedings of the 13th annual ACM international conference on Multimedia
  • Year:
  • 2005


Abstract

This paper presents a novel approach to content-based analysis of karaoke music that exploits multimodal content: synchronized lyrics text from the video channel, and the original singing audio and accompaniment audio carried in the two audio channels. We propose a novel video text extraction technique that accurately segments the bitmaps of the lyrics text from the video frames and tracks the timing of their color changes, which are synchronized to the music. We also propose a technique that characterizes the original singing voice by analyzing the volume balance between the two audio channels. Finally, we propose a music structure analysis method that uses both the lyrics text and the audio content to precisely identify the verses and choruses of a song and to segment the lyrics into singing phrases. Experimental results on 20 karaoke music titles in different languages show that the proposed video text extraction technique detects and segments the lyrics text with accuracy above 90%, and that the proposed multimodal music structure analysis outperforms previous methods based only on audio content analysis.
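The volume-balance idea can be illustrated with a minimal sketch. It assumes the common karaoke channel layout (accompaniment-only on one channel, original singing plus accompaniment on the other) and uses a simple frame-wise RMS energy ratio in decibels; the paper's actual features and thresholds are not reproduced here, and the function name is our own.

```python
import numpy as np

def channel_volume_balance(left, right, sr, frame_ms=100):
    """Frame-wise volume balance (in dB) between two audio channels.

    Illustrative sketch: karaoke discs often carry accompaniment-only
    audio on one channel and original singing plus accompaniment on the
    other, so frames where one channel is noticeably louder suggest the
    presence of the original singing voice.
    """
    hop = int(sr * frame_ms / 1000)          # samples per analysis frame
    n = min(len(left), len(right)) // hop    # number of full frames
    balance = np.empty(n)
    for i in range(n):
        l = np.asarray(left[i * hop:(i + 1) * hop], dtype=np.float64)
        r = np.asarray(right[i * hop:(i + 1) * hop], dtype=np.float64)
        rms_l = np.sqrt(np.mean(l ** 2)) + 1e-12   # epsilon avoids log(0)
        rms_r = np.sqrt(np.mean(r ** 2)) + 1e-12
        balance[i] = 20.0 * np.log10(rms_l / rms_r)
    return balance
```

Frames with a balance near 0 dB would correspond to accompaniment-only passages, while a sustained positive (or negative) offset would mark regions where the original vocal is present on one channel.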