Speeding up spatio-temporal sliding-window search for efficient event detection in crowded videos

  • Authors:
  • Junsong Yuan; Zicheng Liu; Ying Wu; Zhengyou Zhang

  • Affiliations:
  • Northwestern University, Evanston, IL, USA; Microsoft Research, Redmond, WA, USA; Northwestern University, Evanston, IL, USA; Microsoft Research, Redmond, WA, USA

  • Venue:
  • EiMM '09: Proceedings of the 1st ACM International Workshop on Events in Multimedia
  • Year:
  • 2009


Abstract

Despite previous successes of sliding-window-based object detection in images, searching for desired events in the volumetric video space remains a challenging problem, partly because pattern search in the spatio-temporal video space is much more complicated than in the spatial image space. Without knowing the location, temporal duration, and spatial scale of an event, the search space for video events is prohibitively large for exhaustive search. To reduce the search complexity, we propose a heuristic branch-and-bound solution for event detection in videos. Unlike the existing branch-and-bound method, which searches for an optimal subvolume before comparing its detection score against the threshold, we aim to directly find subvolumes whose scores exceed the threshold. As a result, many unnecessary branches are terminated much earlier, and the search is much faster. To validate this approach, we select three human action classes from the KTH dataset for training, while testing on our own action dataset, which has cluttered and moving backgrounds as well as large variations in lighting, scale, and the speed at which actions are performed. The experimental results show that our technique dramatically reduces computational cost without significantly degrading the quality of the detection results.
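The early-termination idea in the abstract can be sketched in code. The following is a simplified, hypothetical 1D illustration (temporal windows over per-frame scores rather than full 3D subvolumes), not the authors' implementation: a best-first branch-and-bound over window endpoint intervals that returns *a* window whose score exceeds the threshold as soon as one is found, and prunes any branch whose optimistic upper bound (the sum of positive scores in the largest window the branch can contain) falls at or below the threshold. All function and variable names are illustrative.

```python
import heapq
import itertools


def detect_window(scores, threshold):
    """Return some window [i, j] with sum(scores[i..j]) > threshold,
    or None if no such window exists.  Sketch of threshold-driven
    branch-and-bound with early termination (1D, illustrative only)."""
    n = len(scores)
    # Prefix sums: exact window score, and positives-only for the bound.
    pre, pos = [0.0], [0.0]
    for s in scores:
        pre.append(pre[-1] + s)
        pos.append(pos[-1] + max(s, 0.0))

    def score(i, j):  # exact score of window [i, j]
        return pre[j + 1] - pre[i]

    def bound(I, J):  # upper bound over all windows with i in I, j in J
        return pos[J[1] + 1] - pos[I[0]]

    counter = itertools.count()  # tie-breaker so the heap never compares tuples
    heap = [(-bound((0, n - 1), (0, n - 1)), next(counter),
             (0, n - 1), (0, n - 1))]
    while heap:
        neg_ub, _, I, J = heapq.heappop(heap)
        if -neg_ub <= threshold:
            # Best remaining branch cannot beat the threshold: stop early.
            return None
        if I[0] == I[1] and J[0] == J[1]:
            i, j = I[0], J[0]
            if score(i, j) > threshold:
                return (i, j)  # good enough -- no need to find the optimum
            continue
        # Branch: bisect the wider endpoint interval.
        if I[1] - I[0] >= J[1] - J[0]:
            m = (I[0] + I[1]) // 2
            parts = [((I[0], m), J), ((m + 1, I[1]), J)]
        else:
            m = (J[0] + J[1]) // 2
            parts = [(I, (J[0], m)), (I, (m + 1, J[1]))]
        for I2, J2 in parts:
            if I2[0] <= J2[1]:  # at least one valid window i <= j remains
                heapq.heappush(heap, (-bound(I2, J2), next(counter), I2, J2))
    return None
```

The key departure from optimal branch-and-bound is in the singleton case: the search returns the first above-threshold window it encounters instead of continuing until the global maximum is certified, which is what allows many branches to be discarded much earlier. Extending this sketch to spatio-temporal subvolumes would replace the two endpoint intervals with six (start/end in x, y, and t) and the prefix sums with 3D integral volumes.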