This paper presents a novel approach that reorders video shots using the state-of-the-art bag-of-words (BoW) model. Shot reordering removes the temporal ambiguity that is likely to degrade conventional video event recognition algorithms based on support vector machine (SVM) classifiers with string kernels. A traditional BoW model represents each video frame as a histogram of visual words. This representation ignores how the words are arranged in the 2D image space and therefore carries no spatial-temporal information. To introduce spatial-temporal information into the BoW model, our approach first segments the input video clip into a set of shots and divides each shot into multiple three-dimensional (3D) video patches, or cubes. Space-time features are then extracted analytically from each 3D cube, and the system learns the BoW codebook from these cubes. Every shot in an input video sequence is represented as a BoW histogram, so the corresponding event is modelled as a sequence of BoW histograms, which the proposed normalization scheme further reorders. Finally, SVM classifiers with string kernels are trained on a set of training samples and used to recognize the event type of a test video clip. Our framework offers a simple and effective way to incorporate both the temporal and the spatial configuration of video events. Experiments on several publicly available datasets show that the proposed method performs well in terms of robustness and recognition rate.
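The cube-based BoW step (split a shot into 3D cubes, describe each cube, quantize against a learned codebook, count words) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not specify the space-time descriptor, the cube size, or how the codebook is clustered. The two-number descriptor (mean intensity plus mean absolute frame-to-frame difference), the helper names, and the use of plain k-means for the codebook are therefore all hypothetical choices.

```python
import random
from typing import List, Sequence

Frame = List[List[float]]   # one grayscale frame, indexed [row][col]
Vector = List[float]


def cube_descriptor(shot: Sequence[Frame], t0: int, y0: int, x0: int,
                    ct: int, ch: int, cw: int) -> Vector:
    """Hypothetical 2-D space-time descriptor for one cube:
    [mean intensity, mean absolute frame-to-frame difference]."""
    values, diffs = [], []
    for t in range(t0, t0 + ct):
        for y in range(y0, y0 + ch):
            for x in range(x0, x0 + cw):
                values.append(shot[t][y][x])
                if t > t0:  # temporal change inside the cube
                    diffs.append(abs(shot[t][y][x] - shot[t - 1][y][x]))
    motion = sum(diffs) / len(diffs) if diffs else 0.0
    return [sum(values) / len(values), motion]


def extract_cubes(shot: Sequence[Frame], ct: int, ch: int, cw: int) -> List[Vector]:
    """Split a T x H x W shot into non-overlapping ct x ch x cw cubes
    (leftover border voxels are dropped) and describe each cube."""
    T, H, W = len(shot), len(shot[0]), len(shot[0][0])
    return [cube_descriptor(shot, t, y, x, ct, ch, cw)
            for t in range(0, T - ct + 1, ct)
            for y in range(0, H - ch + 1, ch)
            for x in range(0, W - cw + 1, cw)]


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((p - q) ** 2 for p, q in zip(a, b))


def nearest(v: Vector, centroids: List[Vector]) -> int:
    """Index of the closest codeword (ties go to the lowest index)."""
    return min(range(len(centroids)), key=lambda k: sq_dist(v, centroids[k]))


def kmeans(points: List[Vector], k: int, iters: int = 20, seed: int = 0) -> List[Vector]:
    """Plain Lloyd's k-means, used here to learn a k-word codebook."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        for j, g in enumerate(groups):
            if g:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(c) / len(g) for c in zip(*g)]
    return centroids


def bow_histogram(shot: Sequence[Frame], codebook: List[Vector],
                  ct: int, ch: int, cw: int) -> Vector:
    """L1-normalised histogram of visual-word counts for one shot."""
    hist = [0.0] * len(codebook)
    for d in extract_cubes(shot, ct, ch, cw):
        hist[nearest(d, codebook)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


if __name__ == "__main__":
    # Two synthetic 4-frame, 4 x 4 shots: one static, one flickering.
    static = [[[0.0] * 4 for _ in range(4)] for _ in range(4)]
    moving = [[[float(t % 2)] * 4 for _ in range(4)] for t in range(4)]
    training = extract_cubes(static, 2, 2, 2) + extract_cubes(moving, 2, 2, 2)
    codebook = kmeans(training, k=2)
    for name, shot in [("static", static), ("moving", moving)]:
        print(name, bow_histogram(shot, codebook, 2, 2, 2))
```

Each shot then becomes one histogram, and a clip becomes the ordered list of its shots' histograms. Because the second descriptor component measures change within a cube, the static and flickering shots in the demo land in different visual words even though both have the same average intensity over time.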
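One way to compare two events, each represented as a sequence of shot histograms, inside an SVM is an edit-distance-style string kernel. The abstract does not define the kernel, so this sketch assumes one common construction rather than the paper's. It uses a weighted edit distance whose substitution cost is the chi-square distance between two histograms and whose insertion/deletion cost is a hypothetical constant `gap`. That distance is turned into a similarity by k(s, t) = exp(-d(s, t) / gamma). The shot-reordering normalization itself is not shown.

```python
import math
from typing import List, Sequence

Histogram = Sequence[float]


def chi2(a: Histogram, b: Histogram) -> float:
    """Chi-square distance between two histograms (0 for identical inputs)."""
    return 0.5 * sum((p - q) ** 2 / (p + q) for p, q in zip(a, b) if p + q > 0)


def edit_distance(s: Sequence[Histogram], t: Sequence[Histogram],
                  gap: float = 1.0) -> float:
    """Edit distance between two histogram sequences: substitution costs chi2,
    insertion/deletion costs `gap` (a hypothetical constant)."""
    n, m = len(s), len(t)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = min(dp[i - 1][j] + gap,          # delete s[i-1]
                           dp[i][j - 1] + gap,          # insert t[j-1]
                           dp[i - 1][j - 1] + chi2(s[i - 1], t[j - 1]))
    return dp[n][m]


def string_kernel(s: Sequence[Histogram], t: Sequence[Histogram],
                  gamma: float = 1.0, gap: float = 1.0) -> float:
    """Exponentiated edit-distance kernel k(s, t) = exp(-d(s, t) / gamma)."""
    return math.exp(-edit_distance(s, t, gap) / gamma)


def gram_matrix(seqs: List[Sequence[Histogram]], gamma: float = 1.0,
                gap: float = 1.0) -> List[List[float]]:
    """Pairwise kernel values for a list of histogram sequences."""
    return [[string_kernel(a, b, gamma, gap) for b in seqs] for a in seqs]


if __name__ == "__main__":
    a, b = [1.0, 0.0], [0.0, 1.0]   # two one-hot shot histograms
    for row in gram_matrix([[a, b], [b, a], [a]]):
        print([round(v, 3) for v in row])
```

Since the chi-square distance between two L1-normalised histograms lies in [0, 1], `gap = 1.0` makes a substitution never costlier than a deletion plus an insertion. Kernels of this exponentiated edit-distance form are not guaranteed to be positive semi-definite, so the Gram matrix should be checked before it is passed to an SVM that accepts precomputed kernels, such as LIBSVM's `-t 4` option or scikit-learn's `SVC(kernel="precomputed")`.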