Multimodal fusion for video copy detection

Authors:
Xavier Anguera;Juan Manuel Barrios;Tomasz Adamek;Nuria Oliver
Affiliations:
Telefonica Research, Barcelona, Spain;University of Chile, Santiago de Chile, Chile;Telefonica Research, Barcelona, Spain;Telefonica Research, Barcelona, Spain
Venue:
MM '11 Proceedings of the 19th ACM international conference on Multimedia
Year:
2011


Abstract

Content-based video copy detection (CBCD) algorithms focus on detecting video segments that are identical to, or transformed versions of, segments in a known video. In recent years some systems have proposed combining orthogonal modalities (e.g., derived from audio and video) to improve detection performance, although they have not always achieved consistent results. In this paper we propose a fusion algorithm that is able to combine as many modalities as are available at the decision level. The algorithm is based on the weighted sum of the normalized scores, which are modified depending on how well they rank in each modality. This leads to a virtually parameter-free fusion algorithm. We performed several tests using the 2010 TRECVID VCD datasets and obtained up to 46% relative improvement in min-NDCR, while also improving the F1 metric on the fused results, in comparison to using only the best single modality.
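
The abstract only outlines the fusion rule, so the following is a minimal sketch of decision-level fusion in that spirit, not the authors' implementation. It assumes min-max normalization within each modality and a 1/rank weight as the rank-dependent modification; both choices, along with the names `min_max_normalize` and `fuse_scores`, are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def min_max_normalize(scores):
    """Map one modality's raw scores to [0, 1] (assumed normalization)."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All candidates tie: give them the same normalized score.
        return {cand: 1.0 for cand in scores}
    return {cand: (s - lo) / (hi - lo) for cand, s in scores.items()}


def fuse_scores(modalities):
    """Fuse any number of modalities at the decision level.

    `modalities` is a list of dicts mapping a candidate id to a raw
    detection score (higher is better). Each normalized score is divided
    by the candidate's rank within its modality (an assumed weighting),
    and the weighted scores are summed across modalities.
    """
    fused = defaultdict(float)
    for scores in modalities:
        if not scores:
            continue  # a modality with no detections contributes nothing
        norm = min_max_normalize(scores)
        ranked = sorted(norm, key=norm.get, reverse=True)
        for rank, cand in enumerate(ranked, start=1):
            fused[cand] += norm[cand] / rank
    # Best fused candidate first.
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


# Toy example with made-up scores for two modalities.
audio = {"a": 0.9, "b": 0.5, "c": 0.1}
video = {"b": 8.0, "a": 6.0, "d": 2.0}
result = fuse_scores([audio, video])
```

Because the weights in this sketch come only from the per-modality ranking, it has no tunable parameters, which mirrors the "virtually parameter-free" property claimed above. A candidate that ranks well in several modalities accumulates more fused score than one that ranks highly in a single modality.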