The Earth Mover's Distance as a Metric for Image Retrieval
International Journal of Computer Vision
A Metric for Distributions with Applications to Image Databases
ICCV '98 Proceedings of the Sixth International Conference on Computer Vision
Recognizing Human Actions: A Local SVM Approach
ICPR '04 Proceedings of the Pattern Recognition, 17th International Conference on (ICPR'04) Volume 3 - Volume 03
A Performance Evaluation of Local Descriptors
IEEE Transactions on Pattern Analysis and Machine Intelligence
Efficient Visual Event Detection Using Volumetric Features
ICCV '05 Proceedings of the Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1 - Volume 01
Behavior recognition via sparse spatio-temporal features
ICCCN '05 Proceedings of the 14th International Conference on Computer Communications and Networks
Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words
International Journal of Computer Vision
Universal and Adapted Vocabularies for Generic Visual Categorization
IEEE Transactions on Pattern Analysis and Machine Intelligence
Human Motion Characterization Using Spatio-temporal Features
IbPRIA '07 Proceedings of the 3rd Iberian conference on Pattern Recognition and Image Analysis, Part I
PCA-SIFT: a more distinctive representation for local image descriptors
CVPR'04 Proceedings of the 2004 IEEE computer society conference on Computer vision and pattern recognition
Region covariance: a fast descriptor for detection and classification
ECCV'06 Proceedings of the 9th European conference on Computer Vision - Volume Part II
ECML PKDD'11 Proceedings of the 2011 European conference on Machine learning and knowledge discovery in databases - Volume Part III
Sparse coding and dictionary learning for symmetric positive definite matrices: a kernel approach
ECCV'12 Proceedings of the 12th European conference on Computer Vision - Volume Part II
Hi-index | 0.00 |
This paper presents a new action recognition approach based on local spatio-temporal features. The main contributions of our approach are twofold. First, a new local spatio-temporal feature is proposed to represent the cuboids detected in video sequences. Specifically, the descriptor utilizes the covariance matrix to capture the self-correlation information of the low-level features within each cuboid. Since covariance matrices do not lie on Euclidean space, the Log-Euclidean Riemannian metric is used for distance measure between covariance matrices. Second, the Earth Mover’s Distance (EMD) is used for matching any pair of video sequences. In contrast to the widely used Euclidean distance, EMD achieves more robust performances in matching histograms/distributions with different sizes. Experimental results on two datasets demonstrate the effectiveness of the proposed approach.