This paper presents a space-time extension of the scale-invariant feature transform (SIFT), originally designed for 2-dimensional (2D) images. Most previous extensions deal with 3-dimensional (3D) spatial information, combining a 2D detector with a 3D descriptor for applications such as the analysis of volumetric medical images. In this work, aimed at processing video streams, we build a spatio-temporal difference-of-Gaussian (DoG) pyramid to detect local extrema. Interest points are extracted not only from the spatial plane (xy) but also from the planes along the time axis (xt and yt). The space-time extension was evaluated on the human action classification task. Experiments on the KTH and UCF Sports datasets show that the approach produces results comparable to the state of the art.
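As a rough illustration of the detection stage described above, the Python sketch below builds a spatio-temporal DoG stack over a video volume and locates local extrema. It is not the authors' implementation: for simplicity it searches for extrema jointly in the full (scale, t, y, x) neighbourhood rather than separately in the xy, xt, and yt planes, and all parameter values (sigma0, k, n_scales, threshold) are illustrative assumptions.

    # Minimal sketch of a spatio-temporal DoG detector, assuming the video
    # is a 3D NumPy array indexed (t, y, x). Not the paper's implementation;
    # parameters are placeholders.
    import numpy as np
    from scipy.ndimage import gaussian_filter, maximum_filter, minimum_filter

    def spatiotemporal_dog_extrema(video, sigma0=1.6, k=2 ** 0.5,
                                   n_scales=4, threshold=0.03):
        """Return (scale, t, y, x) indices of space-time DoG extrema."""
        video = video.astype(np.float32)
        # Blur jointly over (t, y, x) at geometrically increasing scales.
        blurred = [gaussian_filter(video, sigma0 * k ** i)
                   for i in range(n_scales)]
        # DoG levels: differences of adjacent Gaussian-blurred volumes.
        dog = np.stack([blurred[i + 1] - blurred[i]
                        for i in range(n_scales - 1)])
        # A voxel is a candidate if it is the maximum or minimum of its
        # 3x3x3x3 neighbourhood across (scale, t, y, x) and has a strong
        # enough response.
        is_max = dog == maximum_filter(dog, size=3)
        is_min = dog == minimum_filter(dog, size=3)
        strong = np.abs(dog) > threshold
        return np.argwhere((is_max | is_min) & strong)

With a grayscale clip loaded as a (frames, height, width) array, spatiotemporal_dog_extrema(clip) yields candidate keypoint coordinates; in the full pipeline, per-plane (xy, xt, yt) detection and descriptor computation would follow.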