A human activity can be viewed as a space-time repetition of activity primitives. Both the instances of the primitives and their repetition are stochastic. They can be modeled by a generative model-graph, where nodes correspond to the primitives, and the graph's adjacency matrix encodes their affinities for probabilistic grouping into observable video features. When a video of the activity is represented by a graph capturing the space-time layout of video features, such a video graph can be viewed as probabilistically sampled from the activity's model-graph. This sampling is formulated as a successive Kronecker multiplication of the model's affinity matrix. The resulting Kronecker-power matrix is taken as a noisy permutation of the adjacency matrix of the video graph. The paper presents: 1) our model-graph; 2) a memory- and time-efficient, weakly supervised learning of activity primitives and their affinities; and 3) an inference aimed at finding the best expected correspondences between the primitives and observed video features. Our results demonstrate good scalability on UCF50, and superior performance to that of the state of the art on individual, structured, and collective activities of the UCF YouTube, Olympic, and Collective datasets.
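To illustrate the generative step described above, the following is a minimal sketch of sampling a graph from successive Kronecker multiplications of a small affinity matrix, in the spirit of stochastic Kronecker graph models. The initiator matrix `theta`, its size, the power `k`, and the Bernoulli edge-sampling step are all illustrative assumptions, not the paper's actual learning or inference procedure.

```python
import numpy as np

def kronecker_power(theta, k):
    """Return the k-th Kronecker power of the initiator affinity matrix theta."""
    p = theta
    for _ in range(k - 1):
        p = np.kron(p, theta)
    return p

def sample_video_graph(theta, k, seed=None):
    """Sample a binary adjacency matrix by treating each entry of the
    Kronecker-power matrix as an independent Bernoulli edge probability."""
    rng = np.random.default_rng(seed)
    p = kronecker_power(theta, k)
    return (rng.random(p.shape) < p).astype(int)

# Hypothetical 2x2 initiator: affinities between two activity primitives,
# entries in [0, 1] so they can act as edge probabilities.
theta = np.array([[0.9, 0.5],
                  [0.5, 0.3]])
adj = sample_video_graph(theta, 3, seed=0)
print(adj.shape)  # (8, 8): the 3rd Kronecker power of a 2x2 matrix
```

Each Kronecker multiplication expands the graph by a factor equal to the initiator's size, so a small set of primitives and their affinities can generate video graphs of varying size; in the paper, the observed video graph is matched to such a power matrix up to a noisy permutation.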