Middle-level representation for human activities recognition: the role of spatio-temporal relationships

  • Authors: Fei Yuan, Véronique Prinet, Junsong Yuan
  • Affiliations: LIAMA & NLPR, CASIA, Chinese Academy of Sciences, Beijing, China (Fei Yuan, Véronique Prinet); School of EEE, Nanyang Technological University, Singapore (Junsong Yuan)
  • Venue: ECCV'10 Proceedings of the 11th European Conference on Trends and Topics in Computer Vision - Volume Part I
  • Year: 2010


Abstract

We tackle the challenging problem of human activity recognition in realistic video sequences. Unlike local feature-based methods or global template-based methods, we propose to represent a video sequence by a set of middle-level parts. A part, or component, has consistent spatial structure and consistent motion. We first segment the visual motion patterns and generate a set of middle-level components by clustering keypoint-based trajectories extracted from the video. To further exploit the interdependencies of the moving parts, we then define spatio-temporal relationships between pairwise components. The resulting descriptive middle-level components and pairwise components thereby capture the essential motion characteristics of human activities. They also give a very compact representation of the video. We apply our framework to two popular and challenging video datasets: the Weizmann dataset and the UT-Interaction dataset. We demonstrate experimentally that our middle-level representation, combined with a χ²-SVM classifier, matches or outperforms state-of-the-art results on these datasets.
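The final classification stage described in the abstract, an SVM with a χ² kernel over per-video feature histograms, can be sketched as follows. This is a minimal illustration, not the authors' code: the middle-level component extraction (trajectory clustering and pairwise spatio-temporal relations) is replaced by random stand-in histograms, and the sizes and labels are hypothetical.

```python
# Hedged sketch: χ²-kernel SVM classification of per-video histograms,
# assuming each video has already been encoded as a histogram over
# middle-level components and pairwise relations (stand-in: random data).
import numpy as np
from sklearn.metrics.pairwise import chi2_kernel
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_train, n_test, n_bins = 40, 10, 32  # hypothetical dataset sizes

# Stand-in for per-video feature histograms (rows sum to arbitrary mass)
X_train = rng.random((n_train, n_bins))
X_test = rng.random((n_test, n_bins))
y_train = rng.integers(0, 2, n_train)  # two hypothetical activity classes

# Precompute the χ² kernel matrices and train an SVM on them
K_train = chi2_kernel(X_train, X_train)
K_test = chi2_kernel(X_test, X_train)

clf = SVC(kernel="precomputed").fit(K_train, y_train)
pred = clf.predict(K_test)
print(pred.shape)  # one predicted activity label per test video
```

With real features, `X_train`/`X_test` would be the histograms produced by the middle-level representation; precomputing the kernel keeps the χ² similarity separate from the SVM solver.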