Selective spatio-temporal interest points

  • Authors:
  • Bhaskar Chakraborty;Michael B. Holte;Thomas B. Moeslund;Jordi Gonzàlez

  • Affiliations:
  • Computer Vision Center (CVC), Department of Computer Science (UAB), Edifici O, Campus UAB, 08193 Bellaterra, Spain;Department of Architecture, Design and Media Technology, Aalborg University, DK-9220 Aalborg, Denmark;Department of Architecture, Design and Media Technology, Aalborg University, DK-9220 Aalborg, Denmark;Computer Vision Center (CVC), Department of Computer Science (UAB), Edifici O, Campus UAB, 08193 Bellaterra, Spain

  • Venue:
  • Computer Vision and Image Understanding
  • Year:
  • 2012

Abstract

Recent progress in the field of human action recognition points towards the use of Spatio-Temporal Interest Points (STIPs) for local descriptor-based recognition strategies. In this paper, we present a novel approach for robust and selective STIP detection, by applying surround suppression combined with local and temporal constraints. This new method is significantly different from existing STIP detection techniques and improves performance by detecting more repeatable, stable and distinctive STIPs for human actors, while suppressing unwanted background STIPs. For action representation we use a bag-of-video-words (BoV) model of local N-jet features to build a vocabulary of visual words. To this end, we introduce a novel vocabulary building strategy by combining spatial pyramid and vocabulary compression techniques, resulting in improved performance and efficiency. Action-class-specific Support Vector Machine (SVM) classifiers are trained for categorization of human actions. A comprehensive set of experiments on popular benchmark datasets (KTH and Weizmann), more challenging datasets of complex scenes with background clutter and camera motion (CVC and CMU), movie and YouTube video clips (Hollywood 2 and YouTube), and complex scenes with multiple actors (MSR I and Multi-KTH) validates our approach and shows state-of-the-art performance. Due to the unavailability of ground-truth action annotation data for the Multi-KTH dataset, we introduce an actor-specific spatio-temporal clustering of STIPs to address the problem of automatic action annotation of multiple simultaneous actors. Additionally, we perform cross-data action recognition by training on source datasets (KTH and Weizmann) and testing on completely different and more challenging target datasets (CVC, CMU, MSR I and Multi-KTH). This demonstrates the robustness of our proposed approach in a realistic scenario with separate training and test datasets, which in general has been a shortcoming in the performance evaluation of human action recognition techniques.
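To give a rough intuition for the surround-suppression idea behind the selective STIP detection, the sketch below (a simplified, hypothetical illustration, not the authors' detector) reduces each candidate point's response by the mean response in a surrounding spatio-temporal annulus, so that points embedded in cluttered, textured background score lower than isolated points on an actor. The response volume, neighborhood radii, suppression weight `alpha` and threshold are illustrative assumptions, not values from the paper.

```python
import numpy as np
from scipy.ndimage import uniform_filter, maximum_filter

def surround_suppressed_stips(response, inner=3, outer=9, alpha=1.0, thresh=1e-3):
    """Select interest points from a 3D (t, y, x) response volume.

    Each candidate's response is reduced by alpha times the mean response in the
    surrounding annulus (outer cube minus inner cube), penalizing points that lie
    in densely textured background regions. All parameters are illustrative.
    """
    # Mean response over the small (inner) and large (outer) cubic neighborhoods.
    mean_inner = uniform_filter(response, size=2 * inner + 1)
    mean_outer = uniform_filter(response, size=2 * outer + 1)

    # Mean over the annulus only: (outer sum - inner sum) / (outer count - inner count).
    n_inner = (2 * inner + 1) ** 3
    n_outer = (2 * outer + 1) ** 3
    annulus_mean = (mean_outer * n_outer - mean_inner * n_inner) / (n_outer - n_inner)

    # Surround suppression: subtract the weighted surround activity.
    suppressed = response - alpha * annulus_mean

    # Keep local maxima of the suppressed response that exceed the threshold.
    is_peak = suppressed == maximum_filter(suppressed, size=2 * inner + 1)
    keep = is_peak & (suppressed > thresh)
    return np.argwhere(keep)  # array of (t, y, x) coordinates

# Toy usage on a random response volume (stands in for a real corner-response map).
points = surround_suppressed_stips(np.random.rand(20, 64, 64))
```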
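The recognition stage summarized in the abstract, building a visual vocabulary over local descriptors and training one SVM per action class, can likewise be sketched as a generic bag-of-video-words baseline with k-means quantization and one-vs-rest linear SVMs in scikit-learn. This is not the paper's spatial-pyramid or vocabulary-compression scheme; the descriptor dimensionality, vocabulary size and synthetic data are placeholders.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC
from sklearn.multiclass import OneVsRestClassifier

def build_vocabulary(descriptor_sets, n_words=200, seed=0):
    """Cluster all local descriptors (e.g. N-jet features at STIPs) into visual words."""
    stacked = np.vstack(descriptor_sets)
    return KMeans(n_clusters=n_words, random_state=seed, n_init=10).fit(stacked)

def bov_histogram(descriptors, vocabulary):
    """Quantize one video's descriptors and return an L1-normalized word histogram."""
    words = vocabulary.predict(descriptors)
    hist = np.bincount(words, minlength=vocabulary.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)

# --- toy usage with synthetic data (placeholders for real STIP descriptors) ---
rng = np.random.default_rng(0)
videos = [rng.normal(size=(rng.integers(50, 150), 90)) for _ in range(20)]  # 90-D descriptors (placeholder)
labels = rng.integers(0, 6, size=len(videos))                               # 6 action classes (placeholder)

vocab = build_vocabulary(videos, n_words=50)
X = np.array([bov_histogram(d, vocab) for d in videos])

# One binary (one-vs-rest) linear SVM per action class, mirroring class-specific SVM training.
clf = OneVsRestClassifier(LinearSVC()).fit(X, labels)
print(clf.predict(X[:5]))
```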