A selective spatio-temporal interest point detector for human action recognition in complex scenes

  • Authors:
  • Bhaskar Chakraborty;Michael B. Holte;Thomas B. Moeslund;Jordi Gonzalez;F. Xavier Roca

  • Affiliations:
  • Computer Vision Center, Universitat Autònoma de Barcelona, Catalonia (Spain);Computer Vision and Media Technology Laboratory, Aalborg University, Denmark;Computer Vision and Media Technology Laboratory, Aalborg University, Denmark;Computer Vision Center, Universitat Autònoma de Barcelona, Catalonia (Spain);Computer Vision Center, Universitat Autònoma de Barcelona, Catalonia (Spain)

  • Venue:
  • ICCV '11 Proceedings of the 2011 International Conference on Computer Vision
  • Year:
  • 2011


Abstract

Recent progress in the field of human action recognition points towards the use of Spatio-Temporal Interest Points (STIPs) for local descriptor-based recognition strategies. In this paper we present a new approach to STIP detection that applies surround suppression combined with local and temporal constraints. Our method differs significantly from existing STIP detectors and improves performance by detecting more repeatable, stable and distinctive STIPs for human actors, while suppressing unwanted background STIPs. For action representation we use a bag-of-visual-words (BoV) model of local N-jet features to build a vocabulary of visual words. To this end, we introduce a novel vocabulary-building strategy that combines spatial pyramid and vocabulary compression techniques, resulting in improved performance and efficiency. Action-class-specific Support Vector Machine (SVM) classifiers are trained for the categorization of human actions. A comprehensive set of experiments on existing benchmark datasets, and on more challenging datasets of complex scenes, validates our approach and shows state-of-the-art performance.
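
The pipeline outlined in the abstract (local STIP descriptors, quantization into a visual vocabulary, and per-class SVM classification) can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the selective surround-suppression STIP detector and N-jet description are abstracted behind a placeholder `extract_local_descriptors`, the spatial pyramid and vocabulary compression steps are omitted, and all parameter values (vocabulary size, kernel choice, C) are assumptions for illustration only.

```python
# Sketch of a BoV + per-class SVM action recognition pipeline (not the paper's code).
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.svm import SVC


def extract_local_descriptors(video):
    """Placeholder: detect STIPs and compute local (e.g. N-jet) descriptors.

    Should return an (n_points, descriptor_dim) array. In the paper, this is
    where the selective surround-suppression STIP detector would run.
    """
    raise NotImplementedError


def build_vocabulary(descriptor_list, n_words=1000, seed=0):
    """Cluster pooled local descriptors from training videos into visual words."""
    all_desc = np.vstack(descriptor_list)
    vocabulary = MiniBatchKMeans(n_clusters=n_words, random_state=seed)
    vocabulary.fit(all_desc)
    return vocabulary


def bov_histogram(descriptors, vocabulary):
    """Quantize descriptors against the vocabulary and L1-normalize the counts."""
    words = vocabulary.predict(descriptors)
    hist = np.bincount(words, minlength=vocabulary.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)


def train_per_class_svms(histograms, labels, C=1.0):
    """Train one binary (one-vs-rest) SVM per action class."""
    X = np.asarray(histograms)
    y = np.asarray(labels)
    classifiers = {}
    for cls in np.unique(y):
        clf = SVC(kernel="rbf", C=C)
        clf.fit(X, (y == cls).astype(int))
        classifiers[cls] = clf
    return classifiers


def predict(histogram, classifiers):
    """Assign the class whose SVM yields the largest decision value."""
    scores = {cls: clf.decision_function([histogram])[0]
              for cls, clf in classifiers.items()}
    return max(scores, key=scores.get)
```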