In this paper we present an instant action recognition method that can recognize an action in real time from only two consecutive video frames. For the sake of instantaneity, we employ two computationally efficient yet perceptually important feature types, optical flow and edges, to capture the motion and shape characteristics of actions. Both feature types can be unreliable or ambiguous due to noise and degraded video quality. To endow them with strong discriminative power, we pursue combined features whose joint distributions differ between action classes. Because low-level visual features are usually densely distributed across video frames, we first group the learned discriminative joint features into feature groups according to their correlation, which reduces computational expense and induces a compact structural representation; we then adapt an efficient boosting method, which takes the grouped features as input, as the action recognition engine. Experimental results show that combining the two feature types differentiates actions better than using either feature type alone. The whole model is computationally efficient, and its recognition accuracy is comparable to state-of-the-art approaches.
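The idea of pairing a motion cue with a shape cue and examining their joint distribution can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses a frame difference as a crude stand-in for optical flow and gradient magnitude as a stand-in for edge detection, then histograms the two cues jointly over the frame. The function name and binning scheme are hypothetical.

```python
import numpy as np

def joint_flow_edge_histogram(frame_a, frame_b, n_bins=4):
    """Illustrative joint feature for two consecutive grayscale frames.

    Crosses a motion cue (temporal difference magnitude, a crude proxy
    for optical flow) with a shape cue (spatial gradient magnitude, a
    crude proxy for edge strength) into one joint histogram, so that
    distributions over (motion, edge) pairs can be compared per class.
    """
    motion = np.abs(frame_b - frame_a)        # motion cue
    gy, gx = np.gradient(frame_a)             # shape cue
    edges = np.hypot(gx, gy)

    def quantize(x):
        # Map values into n_bins integer levels over the cue's own range
        x = x - x.min()
        rng = float(x.max()) or 1.0
        return np.minimum((x / rng * n_bins).astype(int), n_bins - 1)

    m_bins, e_bins = quantize(motion), quantize(edges)

    # Accumulate the joint distribution over (motion bin, edge bin) pairs
    hist = np.zeros((n_bins, n_bins))
    np.add.at(hist, (m_bins.ravel(), e_bins.ravel()), 1.0)
    return hist / hist.sum()
```

In a full pipeline, per-class joint histograms like this would serve as weak-learner inputs; the paper additionally groups correlated joint features before boosting, which this sketch omits.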