A cross-modal approach for karaoke artifacts correction

Authors:
Wei-Qi Yan;Mohan S. Kankanhalli
Affiliations:
Department of Computer Science, University of California, Irvine, Irvine, USA 62979;National University of Singapore, Singapore, Singapore 117543
Venue:
Multimedia Tools and Applications
Year:
2008

Abstract

Karaoke singing is a popular form of entertainment in several parts of the world. Because this genre of performance attracts amateurs, the singing often has artifacts related to scale, tempo, and synchrony. We have developed an approach that corrects these artifacts using information from cross-modal multimedia streams. We first perform adaptive sampling on the user's rendition, and then use the original singer's rendition together with the video caption highlighting information to correct the pitch, tempo, and loudness. A method of analogies is employed to perform this correction: the basic idea is to manipulate the user's rendition so that it becomes as similar as possible to the original singing. A pre-processing step that removes noise due to feedback and huffing also helps improve the quality of the user's audio. The results described in the paper show the effectiveness of this multimedia approach.
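
As a rough illustration of the loudness part of this idea only, the sketch below scales each frame of a user's signal so that its RMS level matches the corresponding frame of a reference signal. It is not the paper's method of analogies, and it assumes the two signals are already time-aligned, which the abstract does not state. The names `rms`, `match_loudness`, `frame_len`, and `eps` are hypothetical and chosen here for illustration.

```python
import math


def rms(samples):
    """Root-mean-square level of a list of samples (0.0 for an empty list)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def match_loudness(user, reference, frame_len=4, eps=1e-9):
    """Scale each frame of `user` so its RMS matches the same frame of `reference`.

    Assumption: both signals are plain lists of float samples that are
    already aligned in time. Frames are non-overlapping, which is a
    simplification; a real system would likely smooth the gain between frames.
    """
    out = []
    for start in range(0, len(user), frame_len):
        u = user[start:start + frame_len]
        r = reference[start:start + frame_len]
        u_rms = rms(u)
        # Keep the frame unchanged when the user frame is silent or the
        # reference has run out, to avoid dividing by (nearly) zero.
        gain = rms(r) / u_rms if u_rms > eps and r else 1.0
        out.extend(s * gain for s in u)
    return out


# Hypothetical toy signals: the first user frame is too quiet, the second too loud.
user_take = [0.1, -0.1, 0.1, -0.1, 0.5, -0.5, 0.5, -0.5]
original = [0.2, -0.2, 0.2, -0.2, 0.25, -0.25, 0.25, -0.25]
corrected = match_loudness(user_take, original, frame_len=4)
```

In this toy case the first frame is amplified and the second attenuated, so each corrected frame ends up at the reference frame's level. Pitch and tempo correction would need separate processing that this sketch does not attempt.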