Introduction to a large-scale general purpose ground truth database: methodology, annotation tool and benchmarks

Authors:
Benjamin Yao;Xiong Yang;Song-Chun Zhu
Affiliations:
Lotus Hill Institute of Computer Vision and Information Sciences, EZhou City, HuBei Province, P.R. China;Lotus Hill Institute of Computer Vision and Information Sciences, EZhou City, HuBei Province, P.R. China;Lotus Hill Institute of Computer Vision and Information Sciences, EZhou City, HuBei Province, P.R. China
Venue:
EMMCVPR'07 Proceedings of the 6th international conference on Energy minimization methods in computer vision and pattern recognition
Year:
2007

Abstract

This paper presents a large-scale, general-purpose image database with human-annotated ground truth. First, an all-in-all labeling framework is proposed to group visual knowledge at three levels: scene level (global geometric description), object level (segmentation, sketch representation, hierarchical decomposition), and low-mid level (2.1D layered representation, object boundary attributes, curve completion, etc.). Much of this data has not appeared in previous databases. In addition, an And-Or Graph is used to organize visual elements and facilitate top-down labeling. An annotation tool was developed to implement and integrate all of these tasks. With this tool, we have created a database of more than 636,748 annotated images and video frames. Finally, the data is organized into 13 common subsets that serve as benchmarks for diverse evaluation tasks.
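
As a rough illustration of how an And-Or Graph can drive top-down labeling, the following minimal Python sketch models And-nodes (an element decomposes into all of its parts), Or-nodes (the annotator picks one alternative), and leaf nodes (elements that are actually annotated). It is not the authors' tool or data format: the `Node` class, the `expand` and `count_configurations` helpers, and the "clock" example with its part names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """One node of a toy And-Or graph (hypothetical structure)."""
    name: str
    kind: str = "leaf"  # one of "and", "or", "leaf"
    children: List["Node"] = field(default_factory=list)


def count_configurations(node: Node) -> int:
    """Count the distinct leaf configurations the graph can generate."""
    if node.kind == "leaf":
        return 1
    counts = [count_configurations(child) for child in node.children]
    if node.kind == "and":
        # An And-node needs every child, so the counts multiply.
        total = 1
        for c in counts:
            total *= c
        return total
    if node.kind == "or":
        # An Or-node selects exactly one child, so the counts add.
        return sum(counts)
    raise ValueError(f"unknown node kind: {node.kind}")


def expand(node: Node, choices: Dict[str, str]) -> List[str]:
    """Top-down expansion: return the leaves to annotate.

    `choices` maps each Or-node name to the name of the chosen child.
    """
    if node.kind == "leaf":
        return [node.name]
    if node.kind == "and":
        leaves: List[str] = []
        for child in node.children:
            leaves.extend(expand(child, choices))
        return leaves
    if node.kind == "or":
        picked = choices[node.name]
        for child in node.children:
            if child.name == picked:
                return expand(child, choices)
        raise KeyError(f"{picked!r} is not an alternative of {node.name!r}")
    raise ValueError(f"unknown node kind: {node.kind}")


# Hypothetical object category: a clock is a frame AND one style of hands.
clock = Node("clock", "and", [
    Node("frame"),
    Node("hands", "or", [Node("two_hands"), Node("three_hands")]),
])

print(count_configurations(clock))
print(expand(clock, {"hands": "three_hands"}))
```

In this sketch the annotator only resolves the Or-node choices; the And-node structure then determines which parts remain to be labeled, which is one plausible reading of what "top-down labeling" means here.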