Common benchmark data sets, standardized performance metrics, and baseline algorithms have had considerable impact on research and development in a variety of application domains. These resources give both consumers and developers of technology a common framework for objectively comparing the performance of different algorithms and algorithmic improvements. In this paper, we present such a framework for evaluating object detection and tracking in video, specifically for face, text, and vehicle objects. The framework includes the source video data, ground-truth annotations (along with annotation guidelines), performance metrics, evaluation protocols, and tools, including scoring software and baseline algorithms. For each detection and tracking task and supported domain, we developed a 50-clip training set and a 50-clip test set. Each clip is approximately 2.5 minutes long and has been completely annotated, spatially and temporally, at the I-frame level. Each task/domain therefore has an associated annotated corpus of approximately 450,000 frames. Annotation at this scale is unprecedented; it was designed to begin to supply the quantities of data needed for robust machine learning approaches and for statistically significant comparison of algorithm performance. The goal of this work was to systematically address the challenges of object detection and tracking through a common evaluation framework that permits meaningful objective comparison of techniques, provides the research community with sufficient data for exploring automatic modeling techniques, encourages the incorporation of objective evaluation into the development process, and contributes lasting resources at a scale that should remain useful to the computer vision research community for years to come.
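The figure of approximately 450,000 frames per task/domain can be checked with simple arithmetic. The sketch below is illustrative only: the abstract does not state a frame rate, so the 30 frames-per-second value is an assumption, and the function and constant names are hypothetical.

```python
def corpus_frames(clips: int, minutes_per_clip: float, fps: float) -> int:
    """Return the total number of frames in a corpus of equal-length clips."""
    return round(clips * minutes_per_clip * 60 * fps)


# Values taken from the abstract.
TRAIN_CLIPS = 50
TEST_CLIPS = 50
MINUTES_PER_CLIP = 2.5

# ASSUMPTION: the frame rate is not given in the abstract; 30 fps is a guess
# that happens to reproduce the stated corpus size.
ASSUMED_FPS = 30.0

total = corpus_frames(TRAIN_CLIPS + TEST_CLIPS, MINUTES_PER_CLIP, ASSUMED_FPS)
print(total)
```

Under that assumed frame rate, 100 clips of 2.5 minutes each give 450,000 frames, which matches the stated corpus size for one task/domain.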