Bimodal fusion of low-level visual features and high-level semantic features for near-duplicate video clip detection

Authors:
Hyun-seok Min;Jae Young Choi;Wesley De Neve;Yong Man Ro
Affiliations:
Image and Video Systems Lab, Department of Electrical Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 305-732, Republic of Korea;Image and Video Systems Lab, Department of Electrical Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 305-732, Republic of Korea;Image and Video Systems Lab, Department of Electrical Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 305-732, Republic of Korea;Image and Video Systems Lab, Department of Electrical Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 305-732, Republic of Korea
Venue:
Image Communication
Year:
2011

Abstract

The detection of near-duplicate video clips (NDVCs) is an area of active research and development. Most NDVC detection methods represent video clips with a single set of low-level visual features, typically describing color or texture information. However, low-level visual features are sensitive to transformations of the video content. Given the observation that transformations tend to preserve the semantic information conveyed by the video content, we propose a novel approach for identifying NDVCs that makes use of both low-level visual features (that is, MPEG-7 visual features) and high-level semantic features (that is, 32 semantic concepts detected using trained classifiers). Experimental results obtained for the publicly available MUSCLE-VCD-2007 and TRECVID 2008 video sets show that bimodal fusion of visual and semantic features facilitates robust NDVC detection. In particular, the proposed method is able to identify NDVCs with a low missed detection rate (3% on average) and a low false alarm rate (2% on average). In addition, the combined use of visual and semantic features outperforms the separate use of either of them in terms of NDVC detection effectiveness. Further, we demonstrate that the effectiveness of the proposed method is on par with or better than the effectiveness of three state-of-the-art NDVC detection methods, which make use of temporal ordinal measurement, features computed using the Scale-Invariant Feature Transform (SIFT), and bag-of-visual-words (BoVW), respectively. We also show that the effectiveness of semantic concept detection has only a limited influence on the effectiveness of NDVC detection, as long as the mean average precision (MAP) of the semantic concept detectors used is higher than 0.3. Finally, we illustrate that the computational complexity of our NDVC detection method is competitive with the computational complexity of the three aforementioned NDVC detection methods.
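
The abstract does not state how the visual and semantic features are combined, so the following Python sketch is only an illustration of one common option, not the authors' method. It assumes score-level (late) fusion: a cosine similarity is computed separately on a low-level visual feature vector and on a vector of semantic concept scores, and the two are merged with a weighted sum. The function names, the weight, and the decision threshold are all hypothetical. The last helper shows how a missed detection rate and a false alarm rate, the two measures reported above, can be computed from binary decisions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def fused_similarity(visual_q, visual_r, semantic_q, semantic_r, weight=0.5):
    """Weighted-sum fusion of visual and semantic similarity.

    Assumption: `weight` (hypothetical) balances the two modalities;
    the source does not specify a fusion rule or any weights.
    """
    s_visual = cosine_similarity(visual_q, visual_r)
    s_semantic = cosine_similarity(semantic_q, semantic_r)
    return weight * s_visual + (1.0 - weight) * s_semantic


def is_near_duplicate(visual_q, visual_r, semantic_q, semantic_r,
                      weight=0.5, threshold=0.8):
    """Declare a near-duplicate when the fused score reaches a threshold.

    Assumption: `threshold` is an illustrative value, not taken from the paper.
    """
    score = fused_similarity(visual_q, visual_r, semantic_q, semantic_r, weight)
    return score >= threshold


def error_rates(predictions, ground_truth):
    """Return (missed detection rate, false alarm rate) for binary decisions."""
    pairs = list(zip(predictions, ground_truth))
    misses = sum(1 for pred, truth in pairs if truth and not pred)
    false_alarms = sum(1 for pred, truth in pairs if pred and not truth)
    positives = sum(1 for _, truth in pairs if truth)
    negatives = len(pairs) - positives
    miss_rate = misses / positives if positives else 0.0
    false_alarm_rate = false_alarms / negatives if negatives else 0.0
    return miss_rate, false_alarm_rate
```

In such a scheme, a transformed copy whose color or texture statistics have shifted can still reach the threshold if its semantic concept scores stay close to those of the reference clip, which is the intuition the abstract gives for combining the two modalities.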