Study on the combination of video concept detectors

Authors:
Meng Wang;Xian-Sheng Hua
Affiliations:
Microsoft Research Asia, Beijing, China;Microsoft Research Asia, Beijing, China
Venue:
MM '08 Proceedings of the 16th ACM international conference on Multimedia
Year:
2008

Citing 12
Cited 2

Introduction to algorithms

Introduction to algorithms
Discriminative model fusion for semantic concept detection and annotation in video

MULTIMEDIA '03 Proceedings of the eleventh ACM international conference on Multimedia
The combination limit in multimedia retrieval

MULTIMEDIA '03 Proceedings of the eleventh ACM international conference on Multimedia
Early versus late fusion in semantic video analysis

Proceedings of the 13th annual ACM international conference on Multimedia
The Semantic Pathfinder: Using an Authoring Metaphor for Generic Multimedia Indexing

IEEE Transactions on Pattern Analysis and Machine Intelligence
Automatic video annotation by semi-supervised learning with kernel density estimation

MULTIMEDIA '06 Proceedings of the 14th annual ACM international conference on Multimedia
Probabilistic model supported rank aggregation for the semantic concept detection in video

Proceedings of the 6th ACM international conference on Image and video retrieval
Video diver: generic video indexing with diverse features

Proceedings of the international workshop on Workshop on multimedia information retrieval
Video annotation by graph-based learning with neighborhood similarity

Proceedings of the 15th international conference on Multimedia
AP-based borda voting method for feature extraction in TRECVID-2004

ECIR'05 Proceedings of the 27th European conference on Advances in Information Retrieval Research
Can High-Level Concepts Fill the Semantic Gap in Video Retrieval? A Case Study With Broadcast News

IEEE Transactions on Multimedia

Metric learning with feature decomposition for image categorization

Neurocomputing
Social image annotation via cross-domain subspace learning

Multimedia Tools and Applications

Abstract

This paper studies the combination of video concept detectors with a labeled fusion set. We point out that the computational cost of a grid search for fusion weights increases exponentially with the number of detectors, which makes it infeasible when dealing with a large number of detectors. To avoid this difficulty, we adopt an incremental fusion approach, i.e., in each round two detectors are combined, and hence only a one-dimensional grid search is needed. We propose a Bottom-Up Incremental Fusion (BUIF) method, which keeps selecting the detectors with the lowest performance for combination. We conduct experiments on the TRECVID benchmark dataset for 39 concepts with 38 detection methods. Ten different fusion strategies are compared, and the empirical results demonstrate the superiority of the proposed incremental fusion approach.
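
The bottom-up incremental scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions that the abstract does not specify: detector performance is measured by non-interpolated average precision on the labeled fusion set, two detectors are combined as a convex combination w*a + (1-w)*b, and the grid has 11 evenly spaced weights. The function names (`average_precision`, `fuse_pair`, `bottom_up_incremental_fusion`) are hypothetical.

```python
from typing import List, Sequence, Tuple


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Non-interpolated average precision of the ranking induced by `scores`.

    Assumption: AP is the performance measure; the abstract does not name one.
    """
    # Rank samples by descending score (stable sort keeps ties in input order).
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, precision_sum = 0, 0.0
    for rank, idx in enumerate(order, start=1):
        if labels[idx]:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def fuse_pair(
    a: Sequence[float], b: Sequence[float], labels: Sequence[int], steps: int = 11
) -> Tuple[List[float], float]:
    """One-dimensional grid search for w in [0, 1] maximizing AP of w*a + (1-w)*b."""
    best_scores, best_ap = list(a), -1.0
    for k in range(steps):
        w = k / (steps - 1)  # assumed evenly spaced grid, endpoints included
        fused = [w * x + (1.0 - w) * y for x, y in zip(a, b)]
        ap = average_precision(fused, labels)
        if ap > best_ap:
            best_scores, best_ap = fused, ap
    return best_scores, best_ap


def bottom_up_incremental_fusion(
    detectors: Sequence[Sequence[float]], labels: Sequence[int], steps: int = 11
) -> Tuple[List[float], float]:
    """Repeatedly fuse the two lowest-AP detectors until a single one remains."""
    pool = [(average_precision(s, labels), list(s)) for s in detectors]
    while len(pool) > 1:
        pool.sort(key=lambda item: item[0])  # ascending AP: weakest first
        (_, a), (_, b) = pool[0], pool[1]
        fused, ap = fuse_pair(a, b, labels, steps)
        pool = pool[2:] + [(ap, fused)]  # the fused detector rejoins the pool
    return pool[0][1], pool[0][0]
```

Under these assumptions, fusing n detectors takes n-1 one-dimensional searches of g grid points each, whereas a joint grid search over all weights would evaluate on the order of g^(n-1) combinations. Because the grid includes w = 0 and w = 1, each fused detector scores at least as well on the fusion set as either of its two inputs.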