Selection of negative samples and two-stage combination of multiple features for action detection in thousands of videos

Authors:
G. J. Burghouts; K. Schutte; H. Bouma; R. J. Hollander
Affiliations:
Intelligent Imaging, TNO, The Hague, The Netherlands (all authors)
Venue:
Machine Vision and Applications
Year:
2014

Abstract

This paper presents a system that detects 48 human actions in realistic videos, ranging from simple actions such as 'walk' to complex actions such as 'exchange'. We propose a method that substantially improves detection performance. The improvement stems from a different approach to three themes: sample selection, two-stage classification, and the combination of multiple features. First, we show that sampling can be improved by a smart selection of the negatives. Second, we show that exploiting the posteriors of all 48 actions through two-stage classification greatly improves detection. Third, we show how low-level motion features and high-level object features should be combined. Together, these three contributions improve performance by a factor of 2.37 for human action detection on the visint.org test set of 1,294 realistic videos. In addition, we demonstrate that selective sampling and the two-stage setup improve on standard bag-of-features methods on the UT-Interaction dataset, and that our method outperforms the state of the art on the IXMAS dataset.
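
The two-stage idea mentioned in the abstract (a second classifier per action that takes the posteriors of all actions as its input) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not name the classifier, so plain logistic regression is assumed here; the data are synthetic, three toy actions stand in for the 48, and all function names (`fit_logreg`, `fit_two_stage`, `predict_two_stage`) are hypothetical.

```python
import numpy as np

def sigmoid(z):
    # Clip to keep exp() numerically safe.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def fit_logreg(X, y, lr=0.5, steps=500):
    """Gradient-descent logistic regression; returns (weights, bias)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        grad = sigmoid(X @ w + b) - y
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

def predict_proba(model, X):
    w, b = model
    return sigmoid(X @ w + b)

def fit_two_stage(X, Y):
    """X: (n_samples, n_features); Y: (n_samples, n_actions), binary."""
    n_actions = Y.shape[1]
    # Stage 1: one detector per action on the raw features.
    stage1 = [fit_logreg(X, Y[:, k]) for k in range(n_actions)]
    # Posteriors of ALL actions become the stage-2 feature vector.
    # (Assumption: for brevity they are computed on the training set;
    # held-out posteriors would be the safer choice in practice.)
    P = np.column_stack([predict_proba(m, X) for m in stage1])
    # Stage 2: one detector per action on the stacked posteriors.
    stage2 = [fit_logreg(P, Y[:, k]) for k in range(n_actions)]
    return stage1, stage2

def predict_two_stage(models, X):
    stage1, stage2 = models
    P = np.column_stack([predict_proba(m, X) for m in stage1])
    return np.column_stack([predict_proba(m, P) for m in stage2])

# Synthetic demo: the third "action" co-occurs with the first two,
# so its stage-2 detector can draw on their posteriors.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
a = X[:, 0] > 0
b = X[:, 1] > 0
Y = np.column_stack([a, b, a & b]).astype(float)

models = fit_two_stage(X, Y)
scores = predict_two_stage(models, X)
print(scores.shape)
```

The design point is that each stage-2 detector sees evidence for every action, so correlations between actions can inform the final score. How much this helps on real data is an empirical question that the sketch does not answer.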