Action recognition based on learnt motion semantic vocabulary

  • Authors:
  • Qiong Zhao; Zhiwu Lu; Horace H. S. Ip

  • Affiliations:
  • Centre for Innovative Applications of Internet And Multimedia Technologies, Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong (all authors)

  • Venue:
  • PCM'10: Proceedings of the 11th Pacific Rim Conference on Advances in Multimedia Information Processing, Part I
  • Year:
  • 2010

Abstract

This paper presents a novel contextual spectral embedding (CSE) framework for human action recognition, which automatically learns high-level features (a motion semantic vocabulary) from a large vocabulary of mid-level features (i.e., visual words). Our novelty is to exploit the inter-video context between mid-level features for spectral embedding, where this context is captured by the Pearson product-moment correlation between mid-level features rather than by a Gaussian function computed over the point-wise information vectors used as the mid-level feature representation. Our goal is to embed the mid-level features into a low-dimensional semantic space and thereby learn a much more compact semantic vocabulary within the CSE framework. Experiments on two action datasets demonstrate that our approach achieves significantly improved results with respect to the state of the art.
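
The abstract gives no implementation details, but the core idea it describes (spectral embedding of mid-level visual words using a Pearson-correlation affinity, then grouping the embedded words into a compact semantic vocabulary) can be sketched roughly as below. This is an illustrative reconstruction, not the authors' code: the word-by-video occurrence-count input, the normalised-Laplacian embedding, the clipping of negative correlations, and the k-means grouping step are all assumptions made for the example.

```python
# Rough sketch (not the paper's implementation): build a compact "semantic
# vocabulary" by spectrally embedding visual words with a Pearson-correlation
# affinity and clustering the embedded words. All design choices here
# (inputs, normalisation, clustering) are assumptions for illustration.

import numpy as np
from scipy.linalg import eigh
from scipy.cluster.vq import kmeans2

def correlation_affinity(word_video_counts):
    """Affinity between visual words: Pearson correlation of their
    occurrence profiles across videos (rows = words, cols = videos)."""
    corr = np.corrcoef(word_video_counts)      # (n_words, n_words)
    corr = np.nan_to_num(corr)                 # guard against constant rows
    return np.clip(corr, 0.0, 1.0)             # assumption: drop negative weights

def spectral_embed(affinity, dim):
    """Embed words via the bottom eigenvectors of the normalised graph
    Laplacian (standard spectral-embedding recipe)."""
    d = affinity.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    lap = np.eye(len(d)) - d_inv_sqrt[:, None] * affinity * d_inv_sqrt[None, :]
    # smallest eigenvalues; skip the trivial first eigenvector
    _, vecs = eigh(lap, subset_by_index=[1, dim])
    return vecs                                # (n_words, dim)

def build_semantic_vocabulary(word_video_counts, dim=20, n_semantic_words=50):
    """Cluster embedded mid-level words into a compact semantic vocabulary."""
    affinity = correlation_affinity(word_video_counts)
    embedding = spectral_embed(affinity, dim)
    _, labels = kmeans2(embedding, n_semantic_words, minit="++", seed=0)
    return labels                              # visual word -> semantic-word id

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    counts = rng.poisson(2.0, size=(400, 120)).astype(float)  # 400 words, 120 videos
    print(build_semantic_vocabulary(counts)[:10])
```

With such a mapping in hand, each video's histogram over the large visual-word vocabulary could be pooled into a much shorter histogram over the learnt semantic words and passed to any standard classifier; whether the paper uses exactly this pooling step is not stated in the abstract.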