Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope
International Journal of Computer Vision
Multiresolution Gray-Scale and Rotation Invariant Texture Classification with Local Binary Patterns
IEEE Transactions on Pattern Analysis and Machine Intelligence
Scale & Affine Invariant Interest Point Detectors
International Journal of Computer Vision
Distinctive Image Features from Scale-Invariant Keypoints
International Journal of Computer Vision
International Journal of Computer Vision
Scalable Recognition with a Vocabulary Tree
CVPR '06 Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Volume 2
Towards optimal bag-of-features for object categorization and semantic video retrieval
Proceedings of the 6th ACM international conference on Image and video retrieval
Speeded-Up Robust Features (SURF)
Computer Vision and Image Understanding
Randomized Clustering Forests for Image Classification
IEEE Transactions on Pattern Analysis and Machine Intelligence
80 Million Tiny Images: A Large Data Set for Nonparametric Object and Scene Recognition
IEEE Transactions on Pattern Analysis and Machine Intelligence
Audio-based semantic concept classification for consumer video
IEEE Transactions on Audio, Speech, and Language Processing
Improving the fisher kernel for large-scale image classification
ECCV'10 Proceedings of the 11th European conference on Computer vision: Part IV
Hough transform and 3D SURF for robust three dimensional classification
ECCV'10 Proceedings of the 11th European conference on Computer vision: Part VI
Proceedings of the 1st ACM International Conference on Multimedia Retrieval
Sampling strategies for bag-of-features image classification
ECCV'06 Proceedings of the 9th European conference on Computer Vision - Volume Part IV
Real-Time Visual Concept Classification
IEEE Transactions on Multimedia
A fast video event recognition system and its application to video search
Proceedings of the 20th ACM international conference on Multimedia
Recommendations for video event recognition using concept vocabularies
Proceedings of the 3rd ACM conference on International conference on multimedia retrieval
Evaluating multimedia features and fusion for example-based event detection
Machine Vision and Applications
Hi-index | 0.00 |
Event recognition in unconstrained Internet videos has great potential in many applications. State-of-the-art systems usually include modules that need extensive computation, such as the extraction of spatial-temporal interest points, which poses a big challenge for large-scale video processing. This paper presents SUPER, a Speeded UP Event Recognition framework for efficient Internet video analysis. We take a multimodal baseline that has produced strong performance on popular benchmarks, and systematically evaluate each component in terms of both computational cost and contribution to recognition accuracy. We show that, by choosing suitable features, classifiers, and fusion strategies, recognition speed can be greatly improved with minor performance degradation. In addition, we also evaluate how many visual and audio frames are needed for event recognition in Internet videos, a question left unanswered in the literature. Results on a rigorously designed dataset indicate that similar recognition accuracy can be attained using only 14 frames per video on average. We also observe that, different from the visual channel, the soundtracks contains little redundant information for video event recognition. Integrating all the findings, our suggested SUPER framework is 220-fold faster than the baseline approach with merely 3.8% drop in recognition accuracy. It classifies an 80-second video sequence using models of 20 classes in just 4.56 seconds.