Over the last decade, there has been great interest in evaluating local visual features for images. The aim of such evaluations is to guide researchers in selecting the best approaches for new applications and datasets. Most state-of-the-art features have been extended to the temporal domain, so that video retrieval and categorization can use techniques similar to those used for images. However, no comprehensive evaluation of these spatio-temporal features exists. We provide the first comparative evaluation based on isolated and well-defined alterations of video data. We select the three most promising detectors, namely Harris3D, Hessian3D, and Gabor, and the three most promising descriptors, namely HOG/HOF, SURF3D, and HOG3D. To evaluate the detectors, we measure their repeatability on the challenges, treating the videos as 3D volumes. To evaluate the robustness of the spatio-temporal descriptors, we propose a principled classification pipeline in which increasingly altered videos form a set of queries. This allows for an in-depth analysis of local detectors, descriptors, and their combinations.
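Measuring detector repeatability while treating videos as 3D volumes can be illustrated with a small sketch. It compares interest points found in an original clip with those found in an altered copy. This is not the evaluation code used in this work, and several choices below are illustrative assumptions:

- Each point's support region is modelled as an axis-aligned cuboid. Its half-size is `k * sigma` in space and `k * tau` in time.
- Two points correspond when the 3D overlap error (1 minus intersection-over-union) is below a threshold. The threshold `max_overlap_error = 0.4` is an assumed value.
- Matching is greedy and one-to-one.
- The alteration (for example noise or compression) is assumed not to move content geometrically, so no coordinate mapping is applied.

The names `Point3D`, `cuboid`, `iou3d` and `repeatability` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point3D:
    """A spatio-temporal interest point (hypothetical representation)."""
    x: float
    y: float
    t: float
    sigma: float  # spatial scale
    tau: float    # temporal scale


def cuboid(p, k=3.0):
    """Assumed support region: box of half-size k*sigma in x, y and k*tau in t."""
    return ((p.x - k * p.sigma, p.x + k * p.sigma),
            (p.y - k * p.sigma, p.y + k * p.sigma),
            (p.t - k * p.tau, p.t + k * p.tau))


def _volume(box):
    """Volume of an axis-aligned box given as ((x0, x1), (y0, y1), (t0, t1))."""
    vol = 1.0
    for lo, hi in box:
        vol *= hi - lo
    return vol


def iou3d(a, b):
    """Intersection-over-union of two axis-aligned 3D boxes."""
    inter = 1.0
    for (a0, a1), (b0, b1) in zip(a, b):
        side = min(a1, b1) - max(a0, b0)
        if side <= 0:
            return 0.0  # boxes do not overlap along this axis
        inter *= side
    return inter / (_volume(a) + _volume(b) - inter)


def repeatability(ref, alt, max_overlap_error=0.4, k=3.0):
    """Fraction of points re-detected in the altered video.

    Assumes the alteration leaves point positions unchanged (no geometric
    transform), so boxes are compared directly in the same coordinates.
    """
    if not ref or not alt:
        return 0.0
    # Collect all candidate pairs whose overlap error is below the threshold.
    candidates = []
    for i, p in enumerate(ref):
        for j, q in enumerate(alt):
            err = 1.0 - iou3d(cuboid(p, k), cuboid(q, k))
            if err < max_overlap_error:
                candidates.append((err, i, j))
    # Greedy one-to-one matching, best (lowest error) pairs first.
    candidates.sort()
    used_ref, used_alt = set(), set()
    matches = 0
    for _, i, j in candidates:
        if i not in used_ref and j not in used_alt:
            used_ref.add(i)
            used_alt.add(j)
            matches += 1
    return matches / min(len(ref), len(alt))
```

As a usage example, consider an altered clip that shifts one point by half a pixel and loses the other:

```python
ref = [Point3D(10, 10, 5, 2, 1.5), Point3D(50, 40, 20, 2, 1.5)]
alt = [Point3D(10.5, 10, 5, 2, 1.5), Point3D(90, 90, 60, 2, 1.5)]
print(repeatability(ref, alt))  # 0.5
```

One of the two points is re-detected, so `repeatability(ref, alt)` gives 0.5. For alterations that change geometry, such as spatial scaling, the altered points would first have to be mapped back into the reference frame.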