We describe how certain tasks in the audio domain can be effectively addressed using computer vision approaches. This paper focuses on the problem of music identification, where the goal is to reliably identify a song given a few seconds of noisy audio. Our approach treats the spectrogram of each music clip as a 2-D image and transforms music identification into a corrupted sub-image retrieval problem. By applying pairwise boosting to a large set of Viola-Jones features, our system learns compact, discriminative, local descriptors that are amenable to efficient indexing. During the query phase, we retrieve the set of song snippets that locally match the noisy sample and employ geometric verification in conjunction with an EM-based "occlusion" model to identify the song that is most consistent with the observed signal. We have implemented our algorithm in a practical system that can quickly and accurately recognize music from short audio samples in the presence of distortions such as poor recording quality and significant ambient noise. Our experiments demonstrate that this approach significantly outperforms the current state of the art in content-based music identification.
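To make the spectrogram-as-image idea concrete, the sketch below computes a magnitude spectrogram and evaluates a two-rectangle Haar-like (Viola-Jones style) feature on it via an integral image, the standard trick that makes such rectangle features cheap to evaluate densely. This is an illustrative reconstruction, not the paper's implementation: the window/hop sizes and all function names are assumptions, and the learned pairwise-boosted descriptor selection is omitted.

```python
import numpy as np

def spectrogram(signal, win=256, hop=128):
    """Magnitude spectrogram treated as a 2-D image (frequency x time).
    Hann-windowed STFT; parameters are illustrative, not from the paper."""
    window = np.hanning(win)
    frames = [signal[i:i + win] * window
              for i in range(0, len(signal) - win + 1, hop)]
    return np.abs(np.fft.rfft(np.stack(frames), axis=1)).T

def integral_image(img):
    """Summed-area table: enables O(1) rectangle sums."""
    return np.cumsum(np.cumsum(img, axis=0), axis=1)

def rect_sum(ii, r0, c0, r1, c1):
    """Sum of img[r0:r1, c0:c1] recovered from the integral image."""
    total = ii[r1 - 1, c1 - 1]
    if r0 > 0:
        total -= ii[r0 - 1, c1 - 1]
    if c0 > 0:
        total -= ii[r1 - 1, c0 - 1]
    if r0 > 0 and c0 > 0:
        total += ii[r0 - 1, c0 - 1]
    return total

def haar_feature(ii, r, c, h, w):
    """Two-rectangle (left minus right) Haar-like feature at (r, c),
    responding to local spectro-temporal contrast in the spectrogram."""
    left = rect_sum(ii, r, c, r + h, c + w // 2)
    right = rect_sum(ii, r, c + w // 2, r + h, c + w)
    return left - right
```

In the full system, boosting would select a small subset of such rectangle features (positions, sizes, orientations) whose thresholded responses form the compact local descriptor used for indexing.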