High-Speed Action Recognition and Localization in Compressed Domain Videos

Authors:
Chuohao Yeo;P. Ahammad;K. Ramchandran;S. S. Sastry
Affiliations:
Dept. of Electr. Eng. & Comput. Sci., California Univ., Berkeley, CA;-;-;-
Venue:
IEEE Transactions on Circuits and Systems for Video Technology
Year:
2008

Citing 0
Cited 6

Fast Compressed Domain Motion Detection in H.264 Video Streams for Video Surveillance Applications
AVSS '09 Proceedings of the 2009 Sixth IEEE International Conference on Advanced Video and Signal Based Surveillance

Motion- and location-based online human daily activity recognition
Pervasive and Mobile Computing

Moving object segmentation in the H.264 compressed domain
ACCV'09 Proceedings of the 9th Asian conference on Computer Vision - Volume Part II

Human action recognition and localization in video at contextual level
ICDEM'10 Proceedings of the Second international conference on Data Engineering and Management

Modeling dominance effects on nonverbal behaviors using Granger causality
Proceedings of the 14th ACM international conference on Multimodal interaction

Exploratory search of long surveillance videos
Proceedings of the 20th ACM international conference on Multimedia

Abstract

We present a compressed domain scheme that is able to recognize and localize actions at high speeds. The recognition problem is posed as performing an action video query on a test video sequence. Our method is based on computing motion similarity using compressed domain features, which can be extracted with low complexity. We introduce a novel motion correlation measure that takes into account differences in motion directions and magnitudes. Our method is appearance-invariant, requires no prior segmentation, alignment or stabilization, and is able to localize actions in both space and time. We evaluated our method on a benchmark action video database consisting of six actions performed by 25 people under three different scenarios. Our proposed method achieved a classification accuracy of 90%, comparing favorably with existing methods in action classification accuracy. It is also able to localize a template video of 80 × 64 pixels with 23 frames in a test video of 368 × 184 pixels with 835 frames in just 11 s, easily outperforming other methods in localization speed. We also perform a systematic investigation of the effects of various encoding options on our proposed approach. In particular, we present results on the compression-classification tradeoff, which would provide valuable insight into jointly designing a system that performs video encoding at the camera front-end and action classification at the processing back-end.
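
The abstract does not give the form of the motion correlation measure, so the following is only an illustrative sketch of one way a similarity score could respond to both direction and magnitude differences between block motion vectors. The function names, the normalization by the larger squared magnitude, and the choice to skip blocks with no motion in either field are all assumptions made for this example, not the measure defined in the paper.

```python
def vector_similarity(u, v):
    """Similarity in [-1, 1] between two 2-D motion vectors (assumed form).

    Dividing the dot product by the larger squared magnitude gives
    cos(angle) * min(|u|, |v|) / max(|u|, |v|), so the score falls when
    the directions differ and also when the magnitudes differ.
    """
    dot = u[0] * v[0] + u[1] * v[1]
    denom = max(u[0] ** 2 + u[1] ** 2, v[0] ** 2 + v[1] ** 2)
    if denom == 0:
        return 0.0  # both vectors are zero: no motion to compare
    return dot / denom


def field_similarity(field_a, field_b):
    """Mean vector similarity over two equally sized motion-vector fields.

    Each field is a list of rows, each row a list of (dx, dy) tuples.
    Blocks where both fields are motionless are ignored (an assumption).
    """
    total, count = 0.0, 0
    for row_a, row_b in zip(field_a, field_b):
        for u, v in zip(row_a, row_b):
            if not any(u) and not any(v):
                continue
            total += vector_similarity(u, v)
            count += 1
    return total / count if count else 0.0
```

Under these assumptions, identical vectors score 1, opposite vectors score -1, and perpendicular vectors score 0. A query could then be matched by sliding the template's motion fields over the test video's fields and keeping the positions with the highest aggregate score. As a side calculation from the figures in the abstract, 835 frames searched in 11 s corresponds to roughly 76 frames per second for that template size.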