Are we ready for autonomous driving? The KITTI vision benchmark suite

Authors:
Andreas Geiger
Affiliations:
Karlsruhe Institute of Technology
Venue:
CVPR '12 Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Year:
2012

Citing 0
Cited 22

Continuous Markov random fields for robust stereo estimation
ECCV'12 Proceedings of the 12th European conference on Computer Vision - Volume Part V

A naturalistic open source movie for optical flow evaluation
ECCV'12 Proceedings of the 12th European conference on Computer Vision - Volume Part VI

On performance analysis of optical flow algorithms
Proceedings of the 15th international conference on Theoretical Foundations of Computer Vision: outdoor and large-scale real-world scene analysis

Quality assessment of non-dense image correspondences
ECCV'12 Proceedings of the 12th international conference on Computer Vision - Volume 2

Analysis of KITTI data for stereo analysis with stereo confidence measures
ECCV'12 Proceedings of the 12th international conference on Computer Vision - Volume 2

Lessons and insights from creating a synthetic optical flow benchmark
ECCV'12 Proceedings of the 12th international conference on Computer Vision - Volume 2

Fast point-of-interest detection from real-time stereo
Proceedings of the 27th Conference on Image and Vision Computing New Zealand

Co-training on multi-view unlabelled data
Proceedings of the 27th Conference on Image and Vision Computing New Zealand

Comparing ICP variants on real-world data sets
Autonomous Robots

Iterative semi-global matching for robust driver assistance systems
ACCV'12 Proceedings of the 11th Asian conference on Computer Vision - Volume Part III

Hierarchical scan-line dynamic programming for optical flow using semi-global matching
ACCV'12 Proceedings of the 11th international conference on Computer Vision - Volume 2

Ground truth design principles: an overview
Proceedings of the International Workshop on Video and Image Ground Truth in Computer Vision Applications

Evidential grammars for image interpretation: application to multimodal traffic scene understanding
IUKM'13 Proceedings of the 2013 international conference on Integrated Uncertainty in Knowledge Modelling and Decision Making

Personal driving diary: Automated recognition of driving events from first-person videos
Computer Vision and Image Understanding

Vision meets robotics: The KITTI dataset
International Journal of Robotics Research

Is crowdsourcing for optical flow ground truth generation feasible?
ICVS'13 Proceedings of the 9th international conference on Computer Vision Systems

Fusion of 3D-LIDAR and camera data for scene parsing
Journal of Visual Communication and Image Representation

A robust cost function for stereo matching of road scenes
Pattern Recognition Letters

Fast and Accurate Stereo Vision System on FPGA
ACM Transactions on Reconfigurable Technology and Systems (TRETS)

A Quantitative Analysis of Current Practices in Optical Flow Estimation and the Principles Behind Them
International Journal of Computer Vision

Real-time advanced spinal surgery via visible patient model and augmented reality system
Computer Methods and Programs in Biomedicine

Variational Recursive Joint Estimation of Dense Scene Structure and Camera Motion from Monocular High Speed Traffic Sequences
International Journal of Computer Vision

Abstract

Today, visual recognition systems are still rarely employed in robotics applications. Perhaps one of the main reasons for this is the lack of demanding benchmarks that mimic such scenarios. In this paper, we take advantage of our autonomous driving platform to develop novel challenging benchmarks for the tasks of stereo, optical flow, visual odometry/SLAM and 3D object detection. Our recording platform is equipped with four high-resolution video cameras, a Velodyne laser scanner and a state-of-the-art localization system. Our benchmarks comprise 389 stereo and optical flow image pairs, stereo visual odometry sequences of 39.2 km length, and more than 200k 3D object annotations captured in cluttered scenarios (up to 15 cars and 30 pedestrians are visible per image). Results from state-of-the-art algorithms reveal that methods ranking high on established datasets such as Middlebury perform below average when moved outside the laboratory to the real world. Our goal is to reduce this bias by providing challenging benchmarks with novel difficulties to the computer vision community. Our benchmarks are available online at www.cvlibs.net/datasets/kitti