Complex human activities occurring in videos can be defined in terms of temporal configurations of primitive actions. Prior work typically hand-picks the primitives, their total number, and their temporal relations (e.g., allowing only followed-by), and then only estimates their relative significance for activity recognition. We advance prior work by learning which activity parts and which spatiotemporal relations should be captured to represent the activity, and how relevant they are for enabling efficient inference in realistic videos. We represent videos by spatiotemporal graphs, where nodes correspond to multiscale video segments, and edges capture their hierarchical, temporal, and spatial relationships. Access to video segments is provided by our new multiscale segmenter. Given a set of training spatiotemporal graphs, we learn their archetype graph, along with pdfs associated with model nodes and edges. The model adaptively learns, from data, the relevant video segments and their relations, addressing the "what" and "how." Inference and learning are formulated within the same framework - that of a robust, least-squares optimization - which is invariant to arbitrary permutations of nodes in spatiotemporal graphs. The model is used for parsing new videos in terms of detecting and localizing relevant activity parts. We outperform the state of the art on the benchmark Olympic and UT human-interaction datasets, under a favorable complexity-vs.-accuracy trade-off.
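To make the representation concrete, here is a minimal sketch of how a video could be encoded as a spatiotemporal graph of multiscale segments with typed edges. The class and field names are illustrative assumptions, not the paper's implementation, and the segment attributes are reduced to temporal extent and scale for brevity:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Segment:
    """A video segment produced by a multiscale segmenter (hypothetical fields)."""
    seg_id: int
    scale: int     # level in the multiscale segmentation hierarchy
    t_start: int   # first frame covered by the segment
    t_end: int     # last frame covered by the segment

class SpatioTemporalGraph:
    """Nodes are video segments; edges carry one of three relation types."""
    RELATIONS = {"hierarchical", "temporal", "spatial"}

    def __init__(self):
        self.nodes = {}   # seg_id -> Segment
        self.edges = []   # (src_id, dst_id, relation)

    def add_segment(self, seg):
        self.nodes[seg.seg_id] = seg

    def add_relation(self, src_id, dst_id, relation):
        if relation not in self.RELATIONS:
            raise ValueError(f"unknown relation: {relation}")
        self.edges.append((src_id, dst_id, relation))

    def relations(self, relation):
        """Return all edges of the given type."""
        return [e for e in self.edges if e[2] == relation]

# Example: a coarse segment spanning frames 0-30, refined into two
# finer segments that follow each other in time.
g = SpatioTemporalGraph()
coarse = Segment(0, scale=0, t_start=0, t_end=30)
fine_a = Segment(1, scale=1, t_start=0, t_end=15)
fine_b = Segment(2, scale=1, t_start=15, t_end=30)
for s in (coarse, fine_a, fine_b):
    g.add_segment(s)
g.add_relation(0, 1, "hierarchical")  # coarse contains fine_a
g.add_relation(0, 2, "hierarchical")  # coarse contains fine_b
g.add_relation(1, 2, "temporal")      # fine_a is followed by fine_b
```

Learning the archetype graph from a set of such training graphs, and the permutation-invariant least-squares matching used for inference, are beyond this sketch; the structure above only illustrates the input representation.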