The Recognition of Human Movement Using Temporal Templates
IEEE Transactions on Pattern Analysis and Machine Intelligence
Unsupervised learning by probabilistic latent semantic analysis
Machine Learning
Estimating Human Body Configurations Using Shape Context Matching
ECCV '02 Proceedings of the 7th European Conference on Computer Vision-Part III
Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL
EMCL '01 Proceedings of the 12th European Conference on Machine Learning
Discovering word senses from text
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Laplacian Eigenmaps for dimensionality reduction and data representation
Neural Computation
Unsupervised Learning of Human Motion
IEEE Transactions on Pattern Analysis and Machine Intelligence
Human Motion Analysis: A Review
NAM '97 Proceedings of the 1997 IEEE Workshop on Motion of Non-Rigid and Articulated Objects (NAM '97)
The Journal of Machine Learning Research
Recognizing Human Actions: A Local SVM Approach
ICPR '04 Proceedings of the Pattern Recognition, 17th International Conference on (ICPR'04) Volume 3 - Volume 03
Actions Sketch: A Novel Action Representation
CVPR '05 Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) - Volume 1 - Volume 01
Hybrid Models for Human Motion Recognition
CVPR '05 Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) - Volume 1 - Volume 01
A Bayesian Hierarchical Model for Learning Natural Scene Categories
CVPR '05 Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) - Volume 2 - Volume 02
Thumbs up or thumbs down?: semantic orientation applied to unsupervised classification of reviews
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
International Journal of Computer Vision
Discovering Objects and their Localization in Images
ICCV '05 Proceedings of the Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1 - Volume 01
Modeling Scenes with Local Descriptors and Latent Aspects
ICCV '05 Proceedings of the Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1 - Volume 01
Object Categorization by Learned Universal Visual Dictionary
ICCV '05 Proceedings of the Tenth IEEE International Conference on Computer Vision - Volume 2
View Invariance for Human Action Recognition
International Journal of Computer Vision
Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories
CVPR '06 Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Volume 2
IEEE Transactions on Pattern Analysis and Machine Intelligence
Free viewpoint action recognition using motion history volumes
Computer Vision and Image Understanding - Special issue on modeling people: Vision-based understanding of a person's shape, appearance, movement, and behaviour
Behavior recognition via sparse spatio-temporal features
ICCCN '05 Proceedings of the 14th International Conference on Computer Communications and Networks
IEEE Transactions on Pattern Analysis and Machine Intelligence
Learning to Recognize Activities from the Wrong View Point
ECCV '08 Proceedings of the 10th European Conference on Computer Vision: Part I
Localizing Objects with Smart Dictionaries
ECCV '08 Proceedings of the 10th European Conference on Computer Vision: Part I
Supervised Learning of Quantizer Codebooks by Information Loss Minimization
IEEE Transactions on Pattern Analysis and Machine Intelligence
Recovering human body configurations: combining segmentation and recognition
CVPR'04 Proceedings of the 2004 IEEE computer society conference on Computer vision and pattern recognition
A survey of vision-based methods for action representation, segmentation and recognition
Computer Vision and Image Understanding
CVPR'03 Proceedings of the 2003 IEEE computer society conference on Computer vision and pattern recognition
ECCV'06 Proceedings of the 9th European conference on Computer Vision - Volume Part IV
Visual cue cluster construction via information bottleneck principle and kernel density estimation
CIVR'05 Proceedings of the 4th international conference on Image and Video Retrieval
Cross-view action recognition via view knowledge transfer
CVPR '11 Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition
Motion interchange patterns for action recognition in unconstrained videos
ECCV'12 Proceedings of the 12th European conference on Computer Vision - Volume Part VI
A survey of video datasets for human action and activity recognition
Computer Vision and Image Understanding
A review of motion analysis methods for human Nonverbal Communication Computing
Image and Vision Computing
Hi-index | 0.00 |
Efficient modeling of actions is critical for recognizing human actions. Recently, bag of video words (BoVW) representation, in which features computed around spatiotemporal interest points are quantized into video words based on their appearance similarity, has been widely and successfully explored. The performance of this representation however, is highly sensitive to two main factors: the granularity, and therefore, the size of vocabulary, and the space in which features and words are clustered, i.e., the distance measure between data points at different levels of the hierarchy. The goal of this paper is to propose a representation and learning framework that addresses both these limitations. We present a principled approach to learning a semantic vocabulary from a large amount of video words using Diffusion Maps embedding. As opposed to flat vocabularies used in traditional methods, we propose to exploit the hierarchical nature of feature vocabularies representative of human actions. Spatiotemporal features computed around interest points in videos form the lowest level of representation. Video words are then obtained by clustering those spatiotemporal features. Each video word is then represented by a vector of Pointwise Mutual Information (PMI) between that video word and training video clips, and is treated as a mid-level feature. At the highest level of the hierarchy, our goal is to further cluster the mid-level features, while exploiting semantically meaningful distance measures between them. We conjecture that the mid-level features produced by similar video sources (action classes) must lie on a certain manifold. To capture the relationship between these features, and retain it during clustering, we propose to use diffusion distance as a measure of similarity between them. The underlying idea is to embed the mid-level features into a lower-dimensional space, so as to construct a compact yet discriminative, high level vocabulary. Unlike some of the supervised vocabulary construction approaches and the unsupervised methods such as pLSA and LDA, Diffusion Maps can capture local relationship between the mid-level features on the manifold. We have tested our approach on diverse datasets and have obtained very promising results.