Activity representation with motion hierarchies

Authors:
Adrien Gaidon;Zaid Harchaoui;Cordelia Schmid
Affiliations:
Xerox Research Center Europe, Meylan, France;LEAR Team, INRIA Grenoble Rhône-Alpes, Montbonnot, France 38330;LEAR Team, INRIA Grenoble Rhône-Alpes, Montbonnot, France 38330
Venue:
International Journal of Computer Vision
Year:
2014

Citing 39

Registration of Translated and Rotated Images Using Finite Fourier Transforms

IEEE Transactions on Pattern Analysis and Machine Intelligence
Nonlinear component analysis as a kernel eigenvalue problem

Neural Computation
A Bayesian Computer Vision System for Modeling Human Interactions

IEEE Transactions on Pattern Analysis and Machine Intelligence
Normalized Cuts and Image Segmentation

IEEE Transactions on Pattern Analysis and Machine Intelligence
Motion Segmentation and Tracking Using Normalized Cuts

ICCV '98 Proceedings of the Sixth International Conference on Computer Vision
Large-Scale Event Detection Using Semi-Hidden Markov Models

ICCV '03 Proceedings of the Ninth IEEE International Conference on Computer Vision - Volume 2
Pattern Classification (2nd Edition)

Pattern Classification (2nd Edition)
Spectral Grouping Using the Nyström Method

IEEE Transactions on Pattern Analysis and Machine Intelligence
Distinctive Image Features from Scale-Invariant Keypoints

International Journal of Computer Vision
On Space-Time Interest Points

International Journal of Computer Vision
Hierarchical Bag of Paths for Kernel Based Shape Classification

SSPR & SPR '08 Proceedings of the 2008 Joint IAPR International Workshop on Structural, Syntactic, and Statistical Pattern Recognition
Stable and Efficient Gaussian Process Calculations

The Journal of Machine Learning Research
Clustering Point Trajectories with Various Life-Spans

CVMP '09 Proceedings of the 2009 Conference for Visual Media Production
Two-frame motion estimation based on polynomial expansion

SCIA'03 Proceedings of the 13th Scandinavian conference on Image analysis
Web-scale k-means clustering

Proceedings of the 19th international conference on World wide web
Object Detection with Discriminatively Trained Part-Based Models

IEEE Transactions on Pattern Analysis and Machine Intelligence
Object, scene and actions: combining multiple features for human action recognition

ECCV'10 Proceedings of the 11th European conference on Computer vision: Part I
Representing pairwise spatial and temporal relations for action recognition

ECCV'10 Proceedings of the 11th European conference on Computer vision: Part I
Modeling temporal structure of decomposable motion segments for activity classification

ECCV'10 Proceedings of the 11th European conference on Computer vision: Part II
Object segmentation by long term analysis of point trajectories

ECCV'10 Proceedings of the 11th European conference on Computer vision: Part V
Computer Vision: Algorithms and Applications

Computer Vision: Algorithms and Applications
Action Recognition Using Mined Hierarchical Compound Features

IEEE Transactions on Pattern Analysis and Machine Intelligence
Contour Detection and Hierarchical Image Segmentation

IEEE Transactions on Pattern Analysis and Machine Intelligence
Hidden Part Models for Human Action Recognition: Probabilistic versus Max Margin

IEEE Transactions on Pattern Analysis and Machine Intelligence
Recognizing Human Actions by Learning and Matching Shape-Motion Prototype Trees

IEEE Transactions on Pattern Analysis and Machine Intelligence
Human detection using oriented histograms of flow and appearance

ECCV'06 Proceedings of the 9th European conference on Computer Vision - Volume Part II
Actom sequence models for efficient action detection

CVPR '11 Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition
Track to the future: Spatio-temporal video segmentation with long-range motion cues

CVPR '11 Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition
Action bank: A high-level representation of activity in video

CVPR '12 Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Discovering discriminative action parts from mid-level video representations

CVPR '12 Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Learning latent temporal structure for complex event detection

CVPR '12 Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
HMDB: A large video database for human motion recognition

ICCV '11 Proceedings of the 2011 International Conference on Computer Vision
Learning spatiotemporal graphs of human activities

ICCV '11 Proceedings of the 2011 International Conference on Computer Vision
Human activities as stochastic kronecker graphs

ECCV'12 Proceedings of the 12th European conference on Computer Vision - Volume Part II
Propagative hough voting for human activity recognition

ECCV'12 Proceedings of the 12th European conference on Computer Vision - Volume Part III
Trajectory-Based modeling of human actions with motion reference points

ECCV'12 Proceedings of the 12th European conference on Computer Vision - Volume Part V
Motion interchange patterns for action recognition in unconstrained videos

ECCV'12 Proceedings of the 12th European conference on Computer Vision - Volume Part VI
Space-variant descriptor sampling for action recognition based on saliency and eye movements

ECCV'12 Proceedings of the 12th European conference on Computer Vision - Volume Part VII
Explicit Modeling of Human-Object Interactions in Realistic Videos

IEEE Transactions on Pattern Analysis and Machine Intelligence

Abstract

Complex activities, e.g. pole vaulting, are composed of a variable number of sub-events connected by complex spatio-temporal relations, whereas simple actions can be represented as sequences of short temporal parts. In this paper, we learn hierarchical representations of activity videos in an unsupervised manner. These hierarchies of mid-level motion components are data-driven decompositions specific to each video. We introduce a spectral divisive clustering algorithm to efficiently extract a hierarchy over a large number of tracklets (i.e. local trajectories). We use this structure to represent a video as an unordered binary tree. We model this tree using nested histograms of local motion features. We provide an efficient positive definite kernel that computes the structural and visual similarity of two hierarchical decompositions by relying on models of their parent-child relations. We present experimental results on four recent challenging benchmarks: the High Five dataset (Patron-Perez et al., High five: recognising human interactions in TV shows, 2010), the Olympics Sports dataset (Niebles et al., Modeling temporal structure of decomposable motion segments for activity classification, 2010), the Hollywood 2 dataset (Marszalek et al., Actions in context, 2009), and the HMDB dataset (Kuehne et al., HMDB: A large video database for human motion recognition, 2011). We show that per-video hierarchies provide additional information for activity recognition. Our approach improves over unstructured activity models, baselines using other motion decomposition algorithms, and the state of the art.
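
The pipeline outlined in the abstract (divisive spectral clustering of tracklets, a binary tree with nested histograms, and a kernel over parent-child relations) can be illustrated with a minimal sketch. It is not the authors' implementation, and everything concrete in it is an assumption made for illustration: the precomputed tracklet affinity matrix `W`, the one-visual-word-per-tracklet array `words`, the sign threshold on the second eigenvector of the normalized Laplacian, the stopping parameters `min_size` and `max_depth`, the dot-product base kernel, and the synthetic data in `demo_video`.

```python
import numpy as np

def fiedler_split(W):
    """Bipartition items by the sign of the second-smallest eigenvector
    of the symmetric normalized Laplacian of affinity matrix W."""
    d = W.sum(axis=1)
    s = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    lap = np.eye(len(W)) - s[:, None] * W * s[None, :]
    _, vecs = np.linalg.eigh(lap)  # eigenvalues come back in ascending order
    return vecs[:, 1] >= 0.0

def build_tree(W, words, idx, vocab_size, min_size=3, max_depth=3, depth=0):
    """Recursively split the tracklets in `idx` into a binary tree.
    Each node stores an L1-normalized histogram over ALL tracklets in its
    subtree, so histograms are nested along every root-to-leaf path."""
    counts = np.bincount(words[idx], minlength=vocab_size).astype(float)
    node = {"members": idx, "hist": counts / counts.sum(), "children": []}
    if depth < max_depth and len(idx) >= 2 * min_size:
        mask = fiedler_split(W[np.ix_(idx, idx)])
        left, right = idx[mask], idx[~mask]
        # Assumed stopping rule: reject splits that create tiny groups.
        if len(left) >= min_size and len(right) >= min_size:
            node["children"] = [
                build_tree(W, words, part, vocab_size, min_size, max_depth, depth + 1)
                for part in (left, right)
            ]
    return node

def edges(node):
    """List the (parent histogram, child histogram) pairs of a tree."""
    out = []
    for child in node["children"]:
        out.append((node["hist"], child["hist"]))
        out.extend(edges(child))
    return out

def tree_kernel(t1, t2):
    """Sum, over all pairs of parent-child edges, of the product of a
    parent-parent and a child-child dot product. This equals an inner
    product of summed outer products, hence it is positive semi-definite."""
    return sum(float(p1 @ p2) * float(c1 @ c2)
               for p1, c1 in edges(t1) for p2, c2 in edges(t2))

def demo_video(seed, n_per_group=10, vocab_size=8):
    """Synthetic 'video': two spatially separated groups of tracklets,
    each tracklet reduced to a 2-D position and a random visual word."""
    rng = np.random.default_rng(seed)
    centers = np.array([[0.0, 0.0], [4.0, 4.0]])
    pos = np.vstack([c + 0.3 * rng.standard_normal((n_per_group, 2)) for c in centers])
    sq_dist = ((pos[:, None, :] - pos[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dist / 2.0)  # Gaussian affinity, assumed bandwidth 1
    words = rng.integers(0, vocab_size, size=len(pos))
    return build_tree(W, words, np.arange(len(pos)), vocab_size)

tree_a, tree_b = demo_video(0), demo_video(1)
similarity = tree_kernel(tree_a, tree_b)
```

Here `similarity` compares two videos through their hierarchies rather than through a single global histogram. Flattening each tree to its root histogram would recover an unstructured bag-of-features comparison, which is the kind of baseline the abstract contrasts against.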