Bag-of-Words representations built on local trajectory features, with spatio-temporal context captured through static segmentation grids, are currently the leading paradigm for action annotation. While such grids provide a coarse localization of low-level features, these approaches tend to be limited by the rigidity of the grid. In this work we propose two contributions to trajectory-based signatures. First, we extend a local trajectory feature to characterize acceleration in videos, yielding invariance to constant camera motion. Second, we introduce two new adaptive segmentation grids, the Adaptive Grid (AG) and the Deformable Adaptive Grid (DAG). AG is learnt from video data to fit a given dataset and overcome the rigidity of static grids. DAG is also learnt from video data and can additionally be adapted to a specific video through a deformation operation. Our adaptive grids are then exploited by a Bag-of-Words model at the aggregation step for action recognition. Our proposal is evaluated on four publicly available datasets.
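To illustrate the idea behind the acceleration feature and its invariance to constant camera motion, the following sketch (a minimal illustration, not the authors' implementation; the function name `acceleration_descriptor` and the normalization choice are assumptions) takes second-order differences of tracked point coordinates, in the spirit of dense-trajectory descriptors.

```python
import numpy as np

def acceleration_descriptor(trajectory):
    """Second-order differences of a tracked point trajectory.

    `trajectory` is an (L, 2) array of (x, y) positions over L frames.
    Adding a constant-velocity offset to every position (e.g. a steadily
    translating camera) leaves the second-order differences unchanged,
    which is the invariance property motivating the descriptor.
    """
    velocity = np.diff(trajectory, axis=0)        # (L-1, 2) frame-to-frame displacements
    acceleration = np.diff(velocity, axis=0)      # (L-2, 2) changes of displacement
    # Normalize by the total acceleration magnitude (assumed choice) so the
    # descriptor is robust to the overall scale of the motion.
    norm = np.sum(np.linalg.norm(acceleration, axis=1)) + 1e-8
    return (acceleration / norm).ravel()

# Usage: a synthetic 15-frame track with and without constant camera drift
# yields the same descriptor, illustrating the claimed invariance.
traj = np.cumsum(np.random.randn(15, 2), axis=0)
camera_drift = np.outer(np.arange(15), [1.5, -0.7])   # constant-velocity motion
assert np.allclose(acceleration_descriptor(traj),
                   acceleration_descriptor(traj + camera_drift))
```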