Human motion recognition in real video data, whether of actions (HAR) or interactions (HIR), is a very challenging task. In the last few years, models of increasing complexity have been proposed to improve performance on this task. However, it remains unclear whether it is the features or the models that deserve the increase in complexity. In this paper we evaluate this question for the HIR task. To that end, we compare the results of our experiments, which use STIP-based features and BOW models combined with a standard classifier, with some of the most effective recent approaches that use alternative representation models. We perform a comprehensive experimental study on two state-of-the-art HIR datasets: TV Human interactions and UT-interactions. We compare our results with recent results published on these datasets. In addition, we run cross-dataset experiments on the Hollywood-2 dataset to study how well the trained models generalize across datasets. The most relevant result is that the model combining STIP and BOW is competitive in the HIR task with the most complex ones. We also show that the vocabulary learning subtask can be improved by applying compression algorithms to a large enough initial set of features. In contrast to other categorization tasks, context does not help: the results show that dense sampling of STIP is the best choice, but only when it is applied inside the region of interest of the interaction.
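As an illustration of the BOW representation mentioned above, the following is a minimal sketch and not the implementation evaluated in the paper. It assumes that each video yields a set of fixed-length local descriptors (for example, computed around space-time interest points), learns a vocabulary with a plain k-means loop, and encodes a video as a normalized histogram of visual words. The function names `build_vocabulary` and `bow_histogram`, the descriptor dimensionality, and the vocabulary size are all hypothetical choices made for this example; the resulting histograms would then be passed to whatever standard classifier is used.

```python
import numpy as np


def build_vocabulary(descriptors, k, n_iter=20, seed=0):
    """Learn k visual words with a basic k-means loop (illustrative only)."""
    rng = np.random.default_rng(seed)
    # Initialize the centers from k distinct descriptors chosen at random.
    idx = rng.choice(len(descriptors), size=k, replace=False)
    centers = descriptors[idx].astype(float)
    for _ in range(n_iter):
        # Squared Euclidean distance from every descriptor to every center.
        dist = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return centers


def bow_histogram(descriptors, vocabulary):
    """Encode one video as an L1-normalized histogram of visual words."""
    dist = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    words = dist.argmin(axis=1)  # nearest visual word for each descriptor
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / max(hist.sum(), 1.0)  # avoid division by zero when empty


if __name__ == "__main__":
    # Synthetic stand-in for local descriptors; real ones would come from video.
    rng = np.random.default_rng(1)
    training_descriptors = rng.normal(size=(500, 16))
    vocabulary = build_vocabulary(training_descriptors, k=10)
    video_descriptors = rng.normal(size=(80, 16))
    print(bow_histogram(video_descriptors, vocabulary))
```

Under these assumptions, restricting `video_descriptors` to those sampled inside the region of interest of the interaction would correspond to the setting the abstract reports as the best choice.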