Unsupervised approximate-semantic vocabulary learning for human action and video classification

  • Authors:
  • Qiong Zhao; Horace H. S. Ip

  • Venue:
  • Pattern Recognition Letters
  • Year:
  • 2013

Abstract

The paper presents a novel unsupervised contextual spectral embedding (CSE) framework for human action and video classification. As with textual words, the visual-word (mid-level semantic) representation of an image or video contains combinations of synonymous words, which makes the representation ambiguous. To narrow the semantic gap between visual words (the mid-level semantic representation) and high-level semantics, we propose a high-level representation called the approximate-semantic descriptor. Learning these descriptors is formulated as a spectral clustering problem, such that semantically associated visual words are placed close together in a low-dimensional semantic space and then clustered into a single approximate-semantic descriptor. Specifically, the high-level representation of human action videos is learnt by capturing the inter-video context of mid-level semantics via a non-parametric correlation measure. Experimental results show that the proposed visual-word disambiguation improves subsequent classification performance, and experiments on four standard datasets demonstrate significant improvements over the state of the art, particularly in unconstrained environments.
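
As a rough illustration of this pipeline (a minimal sketch, not the authors' implementation), the snippet below builds a non-negative affinity between visual words from a Spearman rank correlation of their counts across videos, embeds the words via the eigenvectors of a symmetric normalized graph Laplacian, and groups them with k-means, so that each cluster plays the role of one approximate-semantic descriptor. All names and parameters here (approximate_semantic_descriptors, n_descriptors, embed_dim) are hypothetical, and Spearman correlation is only a stand-in for the paper's non-parametric inter-video correlation measure.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.cluster import KMeans

def approximate_semantic_descriptors(bow, n_descriptors=50, embed_dim=10):
    """Group visual words into approximate-semantic descriptors via
    spectral embedding of an inter-video word-correlation affinity.

    bow: (n_videos, n_words) bag-of-visual-words count matrix.
    Returns: cluster label per visual word, and the pooled
    (n_videos, n_descriptors) descriptor histograms.
    """
    # Non-parametric (Spearman rank) correlation between visual words,
    # measured across videos; clip to a non-negative affinity.
    rho, _ = spearmanr(bow)                       # (n_words, n_words)
    affinity = np.nan_to_num(np.clip(rho, 0.0, 1.0))
    np.fill_diagonal(affinity, 0.0)

    # Symmetric normalized graph Laplacian L = I - D^{-1/2} W D^{-1/2}.
    deg = affinity.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    lap = np.eye(affinity.shape[0]) - (
        d_inv_sqrt[:, None] * affinity * d_inv_sqrt[None, :]
    )

    # Spectral embedding: eigenvectors of the smallest eigenvalues place
    # semantically associated words close together in a low-dim space
    # (the trivial first eigenvector is kept here for simplicity).
    _, eigvecs = np.linalg.eigh(lap)
    embedding = eigvecs[:, :embed_dim]
    # Row-normalize, as in standard spectral clustering.
    embedding /= np.linalg.norm(embedding, axis=1, keepdims=True) + 1e-12

    # Cluster embedded words; each cluster is one approximate-semantic
    # descriptor.
    labels = KMeans(
        n_clusters=n_descriptors, n_init=10, random_state=0
    ).fit_predict(embedding)

    # Pool each video's word counts within clusters to get the
    # approximate-semantic descriptor histogram used for classification.
    pooled = np.zeros((bow.shape[0], n_descriptors))
    for word, cluster in enumerate(labels):
        pooled[:, cluster] += bow[:, word]
    return labels, pooled

# Toy usage: 200 videos over a 500-word visual vocabulary.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    bow = rng.poisson(1.0, size=(200, 500)).astype(float)
    labels, hist = approximate_semantic_descriptors(bow, n_descriptors=40)
    print(hist.shape)  # (200, 40)
```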