Editors Choice Article: Visual SLAM: Why filter?

  • Authors:
  • Hauke Strasdat;J. M. M. Montiel;Andrew J. Davison

  • Affiliations:
  • Department of Computing, Imperial College London, UK;Instituto de Investigacion en Ingeniera de Aragon (I3A), Universidad de Zaragoza, Spain;Department of Computing, Imperial College London, UK

  • Venue:
  • Image and Vision Computing
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

While the most accurate solution to off-line structure from motion (SFM) problems is undoubtedly to extract as much correspondence information as possible and perform batch optimisation, sequential methods suitable for live video streams must approximate this to fit within fixed computational bounds. Two quite different approaches to real-time SFM - also called visual SLAM (simultaneous localisation and mapping) - have proven successful, but they sparsify the problem in different ways. Filtering methods marginalise out past poses and summarise the information gained over time with a probability distribution. Keyframe methods retain the optimisation approach of global bundle adjustment, but computationally must select only a small number of past frames to process. In this paper we perform a rigorous analysis of the relative advantages of filtering and sparse bundle adjustment for sequential visual SLAM. In a series of Monte Carlo experiments we investigate the accuracy and cost of visual SLAM. We measure accuracy in terms of entropy reduction as well as root mean square error (RMSE), and analyse the efficiency of bundle adjustment versus filtering using combined cost/accuracy measures. In our analysis, we consider both SLAM using a stereo rig and monocular SLAM as well as various different scenes and motion patterns. For all these scenarios, we conclude that keyframe bundle adjustment outperforms filtering, since it gives the most accuracy per unit of computing time.