We introduce a new large-scale video dataset designed to assess the performance of diverse visual event recognition algorithms, with a focus on continuous visual event recognition (CVER) in outdoor areas with wide coverage. Previous action recognition datasets are unrealistic for real-world surveillance because they consist of short clips showing a single action by a single individual [15, 8]. Datasets have been developed for movies [11] and sports [12], but those actions and scene conditions do not transfer well to surveillance video. Our dataset consists of many outdoor scenes with actions performed naturally by non-actors in continuously captured, real-world video. It includes large numbers of instances of 23 event types distributed throughout 29 hours of video. This data is accompanied by detailed annotations, including both moving object tracks and event examples, which provide a solid basis for large-scale evaluation. Additionally, we propose several evaluation modes and metrics for visual recognition tasks, along with our preliminary experimental results. We believe this dataset will stimulate diverse aspects of computer vision research and help advance CVER in the years ahead.
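To make the evaluation idea concrete, continuous event detections are typically scored by matching predicted temporal intervals against annotated ground-truth intervals. The sketch below is an illustrative assumption, not the paper's actual protocol: it uses temporal intersection-over-union with a greedy one-to-one matching and an assumed threshold of 0.5 to compute precision and recall.

```python
def interval_iou(a, b):
    """Temporal intersection-over-union of two (start, end) intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def match_events(predictions, ground_truth, iou_threshold=0.5):
    """Greedily match predicted event intervals to ground-truth intervals
    and return (precision, recall).

    Each event is a (start, end) pair in seconds; a prediction counts as a
    true positive when its temporal IoU with a still-unmatched ground-truth
    event meets the threshold. The interval format and the 0.5 threshold
    are illustrative assumptions.
    """
    matched_gt = set()
    true_positives = 0
    for pred in predictions:
        best_idx, best_iou = None, iou_threshold
        for i, gt in enumerate(ground_truth):
            if i in matched_gt:
                continue  # each ground-truth event may be matched once
            iou = interval_iou(pred, gt)
            if iou >= best_iou:
                best_idx, best_iou = i, iou
        if best_idx is not None:
            matched_gt.add(best_idx)
            true_positives += 1
    precision = true_positives / len(predictions) if predictions else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall
```

A greedy matcher of this kind is a common baseline for detection-style scoring; stricter protocols additionally require spatial overlap with the annotated object tracks.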