While low-level image features have proven to be effective representations for visual recognition tasks such as object recognition and scene classification, they are inadequate for capturing the complex semantic meaning required to solve high-level visual tasks such as multimedia event detection and recognition. Recognition or retrieval of events and activities can be improved if specific discriminative objects are detected in a video sequence. In this paper, we propose an image representation, called Detection Bank, based on the detection images from a large number of windowed object detectors, where an image is represented by different statistics derived from these detections. This representation is extended to video by aggregating the key-frame-level image representations through mean and max pooling. We empirically show that it captures information complementary to state-of-the-art representations such as Spatial Pyramid Matching and Object Bank. Combining these descriptors with our Detection Bank representation significantly outperforms any of the representations alone on the TRECVID MED 2011 data.
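The extension from key frames to video can be illustrated with a minimal sketch. The abstract does not say how the mean-pooled and max-pooled vectors are combined, so concatenating them is an assumption here, and the function name `pool_key_frames` and the sample scores are hypothetical.

```python
from typing import List, Sequence


def pool_key_frames(frame_features: Sequence[Sequence[float]]) -> List[float]:
    """Aggregate per-key-frame feature vectors into one video-level vector.

    Returns the mean-pooled vector followed by the max-pooled vector.
    Concatenation is an assumption; the abstract only names the two poolings.
    """
    if not frame_features:
        raise ValueError("need at least one key frame")
    dim = len(frame_features[0])
    if any(len(f) != dim for f in frame_features):
        raise ValueError("all key-frame vectors must have the same length")
    n = len(frame_features)
    # Mean pooling: average each dimension over all key frames.
    mean_pool = [sum(f[d] for f in frame_features) / n for d in range(dim)]
    # Max pooling: keep the strongest response per dimension.
    max_pool = [max(f[d] for f in frame_features) for d in range(dim)]
    return mean_pool + max_pool


# Hypothetical per-frame statistics, one value per object detector.
frames = [
    [0.5, 0.0, 1.0],
    [1.5, 0.0, 0.0],
]
video_vector = pool_key_frames(frames)
print(video_vector)
```

Mean pooling summarizes how consistently a detector responds across the video, while max pooling preserves a strong response that occurs in only a few key frames.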