Fighting movie piracy requires copy detection followed by accurate frame alignment of the master and copy videos, in order to estimate the distortion model and the capture location in the theater. Existing research on pirate video registration uses only visual features to align pirate and master videos; no effort has been made to employ acoustic features. Further, most studies of illegal video registration concentrate on aligning watermarked videos, and few address the alignment of non-watermarked sequences. We address these issues by proposing a novel spatio-temporal registration framework that uses content-based multimodal features for frame alignment. The proposed scheme has three stages: first, a video sequence is compactly represented using Speeded-Up Robust Features (SURF) and audio spectral signatures; second, sliding-window-based dynamic time warping (DTW) computes temporal frame alignments; third, robust SURF descriptors are used to generate accurate geometric frame alignments. Experiments on three different datasets demonstrate the robustness and efficiency of the proposed method against various video transformations.
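To make the temporal alignment stage concrete, the following is a minimal sketch of dynamic time warping between two sequences of per-frame feature vectors. It is not the authors' implementation. The abstract does not specify how the sliding window is defined, so the sketch assumes a simple band constraint on the warping path (a Sakoe-Chiba style window). The function name `dtw_align`, the `window` parameter, and the use of Euclidean distance between frame signatures are all illustrative assumptions.

```python
import math


def dtw_align(master, copy, window=None):
    """Align two sequences of feature vectors with band-constrained DTW.

    Hypothetical helper: `master` and `copy` are lists of equal-length
    numeric tuples standing in for per-frame signatures. Returns the total
    alignment cost and the warping path as (master_idx, copy_idx) pairs.
    """
    n, m = len(master), len(copy)
    # Band half-width (assumption). It must be at least |n - m|,
    # otherwise the final cell cannot be reached.
    w = max(n, m) if window is None else max(window, abs(n - m))

    inf = math.inf
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    # Fill the accumulated-cost table inside the band only.
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = math.dist(master[i - 1], copy[j - 1])  # Euclidean distance
            cost[i][j] = d + min(cost[i - 1][j],      # skip a master frame
                                 cost[i][j - 1],      # skip a copy frame
                                 cost[i - 1][j - 1])  # match both frames

    # Backtrack from the end to recover the frame-to-frame alignment.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates)
    path.reverse()
    return cost[n][m], path


if __name__ == "__main__":
    # Toy one-dimensional signatures; the copy repeats one frame.
    master = [(0.0,), (1.0,), (2.0,), (3.0,)]
    copy = [(0.0,), (1.0,), (1.0,), (2.0,), (3.0,)]
    total, path = dtw_align(master, copy, window=1)
    print(total, path)
```

A narrower `window` reduces the number of table cells evaluated, at the risk of missing alignments with large temporal offsets. In a real pipeline the toy tuples would be replaced by whatever per-frame visual and audio signatures the system extracts.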