Semantic object classes in video: A high-definition ground truth database

Authors:
Gabriel J. Brostow;Julien Fauqueur;Roberto Cipolla
Affiliations:
Computer Vision Group, University of Cambridge, United Kingdom and Computer Vision and Geometry Group, ETH Zurich;Computer Vision Group, University of Cambridge, United Kingdom;Computer Vision Group, University of Cambridge, United Kingdom
Venue:
Pattern Recognition Letters
Year:
2009

Abstract

Visual object analysis researchers are increasingly experimenting with video, because motion cues are expected to help with detection, recognition, and other analysis tasks. This paper presents the Cambridge-driving Labeled Video Database (CamVid) as the first collection of videos with object class semantic labels, complete with metadata. The database provides ground truth labels that associate each pixel with one of 32 semantic classes. The database addresses the need for experimental data to quantitatively evaluate emerging algorithms. While most videos are filmed with fixed-position CCTV-style cameras, our data was captured from the perspective of a driving automobile. The driving scenario increases the number and heterogeneity of the observed object classes. Over 10 min of high-quality 30 Hz footage is provided, with corresponding semantically labeled images at 1 Hz and, in part, 15 Hz. The CamVid Database offers four contributions that are relevant to object analysis researchers. First, the per-pixel semantic segmentation of over 700 images was specified manually, and was then inspected and confirmed by a second person for accuracy. Second, the high-quality, high-resolution color video images in the database represent valuable extended-duration digitized footage for those interested in driving scenarios or ego-motion. Third, we filmed calibration sequences for the camera color response and intrinsics, and computed a 3D camera pose for each frame in the sequences. Finally, in support of expanding this or other databases, we present custom-made labeling software for assisting users who wish to paint precise class labels for other images and videos. We evaluate the relevance of the database by measuring the performance of an algorithm from each of three distinct domains: multi-class object recognition, pedestrian detection, and label propagation.
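
The relationship between the 30 Hz footage and the 1 Hz and 15 Hz label rates quoted in the abstract can be illustrated with a small sketch. It is only an illustration under stated assumptions: the function name `labeled_frame_indices` is hypothetical, labels are assumed to be evenly spaced starting at frame 0, and the duration is taken as exactly 10 min, although the abstract says "over 10 min". None of this describes the database's actual file layout or naming.

```python
def labeled_frame_indices(duration_s, video_hz=30, label_hz=1):
    """Return the video frame indices that would carry a label image.

    Hypothetical helper: assumes evenly spaced labels starting at frame 0.
    """
    if video_hz % label_hz != 0:
        raise ValueError("label rate must divide the video rate evenly")
    step = video_hz // label_hz           # 30 Hz video, 1 Hz labels: every 30th frame
    total_frames = duration_s * video_hz  # number of video frames in the clip
    return list(range(0, total_frames, step))


ten_minutes = 10 * 60  # seconds; assumed exact for this sketch

at_1hz = labeled_frame_indices(ten_minutes)                # one label per second
at_15hz = labeled_frame_indices(ten_minutes, label_hz=15)  # every second frame

print(len(at_1hz), at_1hz[:3])    # 600 [0, 30, 60]
print(len(at_15hz), at_15hz[:3])  # 9000 [0, 2, 4]
```

Under these assumptions, 1 Hz labeling of exactly 10 min yields 600 labeled frames. The abstract reports over 700 labeled images, so the remainder presumably comes from footage beyond the 10 min mark and from the portion labeled at 15 Hz.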