Language-motivated approaches to action recognition

  • Authors:
  • Manavender R. Malgireddy, Ifeoma Nwogu, Venu Govindaraju

  • Affiliations:
  • Department of Computer Science and Engineering, University at Buffalo, SUNY, Buffalo, NY (all authors)

  • Venue:
  • The Journal of Machine Learning Research
  • Year:
  • 2013

Abstract

We present language-motivated approaches to detecting, localizing, and classifying activities and gestures in videos. To gain statistical insight into the underlying motion patterns of activities, we develop a dynamic, hierarchical Bayesian model that connects low-level visual features in videos with poses, motion patterns, and classes of activities. This process is somewhat analogous to detecting topics or categories in documents based on their word content, except that our documents are dynamic. The proposed generative model harnesses both the temporal ordering power of dynamic Bayesian networks such as hidden Markov models (HMMs) and the automatic clustering power of hierarchical Bayesian models such as the latent Dirichlet allocation (LDA) model. We also introduce a probabilistic framework for detecting and localizing pre-specified activities (or gestures) in a video sequence, analogous to the use of filler models for keyword detection in speech processing. We demonstrate the robustness of our classification model and our spotting framework by recognizing activities in unconstrained real-life video sequences and by spotting gestures via a one-shot learning approach.
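To make the filler-model analogy concrete, below is a minimal sketch (not the paper's actual system) of HMM-based gesture spotting: a sliding window is scored under a gesture HMM and a generic background ("filler") HMM, and spans where the log-likelihood ratio favors the gesture model are flagged. The function names, the emission interface, and the window/threshold values are illustrative assumptions.

```python
import numpy as np

def log_forward(log_pi, log_A, log_B):
    """HMM forward algorithm in log space; returns log p(observations | model).

    log_pi: (K,)   log initial-state probabilities
    log_A:  (K, K) log transition probabilities
    log_B:  (T, K) per-frame, per-state emission log-likelihoods
    """
    alpha = log_pi + log_B[0]
    for t in range(1, log_B.shape[0]):
        # alpha_t(j) = logsumexp_i(alpha_{t-1}(i) + log_A[i, j]) + log_B[t, j]
        m = alpha.max()
        alpha = m + np.log(np.exp(alpha - m) @ np.exp(log_A)) + log_B[t]
    m = alpha.max()
    return m + np.log(np.sum(np.exp(alpha - m)))

def spot_gesture(frames, gesture_hmm, filler_hmm, win=30, step=5, thresh=0.0):
    """Slide a window over the frame features and flag spans where the
    gesture HMM out-scores the filler HMM by more than `thresh` nats.

    Each model is a tuple (log_pi, log_A, emis), where emis(window) returns
    a (T, K) array of emission log-likelihoods (a hypothetical interface;
    in practice these would come from, e.g., Gaussian mixtures over
    per-frame motion descriptors).
    """
    detections = []
    for s in range(0, len(frames) - win + 1, step):
        window = frames[s:s + win]
        llr = (log_forward(gesture_hmm[0], gesture_hmm[1], gesture_hmm[2](window))
               - log_forward(filler_hmm[0], filler_hmm[1], filler_hmm[2](window)))
        if llr > thresh:  # gesture model explains this span better than "filler"
            detections.append((s, s + win, llr))
    return detections
```

A per-window log-likelihood ratio against a background model is the standard keyword-spotting recipe the abstract alludes to; the paper's framework builds its gesture and filler models within the proposed hierarchical Bayesian setting rather than from plain Gaussian HMMs as assumed here.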