Actions are spatiotemporal patterns. Analogous to sliding window-based object detection, action detection finds reoccurrences of such spatiotemporal patterns through pattern matching, while coping with cluttered, dynamic backgrounds and other action variations. We address two critical issues in pattern matching-based action detection: 1) intrapattern variations in actions, and 2) the computational efficiency of action pattern search in cluttered scenes. First, we propose a discriminative pattern matching criterion for action classification, called naive Bayes mutual information maximization (NBMIM). Each action is characterized by a collection of spatiotemporal invariant features, and it is matched to an action class by measuring the mutual information between them. Under this matching criterion, action detection reduces to localizing the subvolume in the volumetric video space that has maximum mutual information with a specific action class. A novel spatiotemporal branch-and-bound (STBB) search algorithm is designed to find the optimal solution efficiently. The proposed action detection method does not rely on human detection, tracking, or background subtraction. It handles action variations such as changes in performing speed, style, and scale, and it is insensitive to dynamic and cluttered backgrounds and even to partial occlusions. Cross-data set experiments on action detection, using the KTH and CMU action data sets as well as a new MSR action data set, demonstrate the effectiveness and efficiency of the proposed multiclass, multiple-instance action detection method.
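The detection step described above can be illustrated with a minimal sketch. Assume each voxel of the video has already been assigned an NBMIM contribution (roughly, a log-likelihood ratio of its local feature under the positive vs. negative class: positive values favor the action, negative values favor background). Detection then means finding the axis-aligned subvolume with maximum total score. The brute-force enumeration below, accelerated with an integral volume for O(1) box sums, is only a naive stand-in for the paper's STBB algorithm; the function name and score construction are illustrative assumptions, not the authors' code.

```python
import numpy as np
from itertools import product

def max_sum_subvolume(scores):
    """Find the axis-aligned subvolume with maximum total score.

    scores: 3-D array of per-voxel NBMIM-style contributions
            (positive = evidence for the action, negative = background).
    Returns (best_sum, (x0, x1, y0, y1, t0, t1)), bounds inclusive-exclusive.
    Exhaustive search stands in here for spatiotemporal branch-and-bound.
    """
    # Integral volume: I[x, y, t] = sum of scores[:x, :y, :t],
    # so any box sum costs O(1) via 3-D inclusion-exclusion.
    I = np.zeros(tuple(s + 1 for s in scores.shape))
    I[1:, 1:, 1:] = scores.cumsum(0).cumsum(1).cumsum(2)

    def box_sum(x0, x1, y0, y1, t0, t1):
        return (I[x1, y1, t1]
                - I[x0, y1, t1] - I[x1, y0, t1] - I[x1, y1, t0]
                + I[x0, y0, t1] + I[x0, y1, t0] + I[x1, y0, t0]
                - I[x0, y0, t0])

    X, Y, T = scores.shape
    best_sum, best_box = -np.inf, None
    for x0, y0, t0 in product(range(X), range(Y), range(T)):
        for x1, y1, t1 in product(range(x0 + 1, X + 1),
                                  range(y0 + 1, Y + 1),
                                  range(t0 + 1, T + 1)):
            s = box_sum(x0, x1, y0, y1, t0, t1)
            if s > best_sum:
                best_sum, best_box = s, (x0, x1, y0, y1, t0, t1)
    return best_sum, best_box
```

Because background voxels carry negative scores, the optimal subvolume tightens around the action rather than expanding to the whole video; this is the property that makes the max-subvolume formulation (and the branch-and-bound bounds in the paper) meaningful.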