Appearance-Based Structure from Motion Using Linear Classes of 3-D Models

  • Authors:
  • Sing Bing Kang;Michael Jones

  • Affiliations:
  • Microsoft Research, Microsoft Corporation, One Microsoft Way, Redmond, WA 98052, USA. sbkang@microsoft.com;MERL Cambridge Systems, 201 Broadway, Cambridge, MA 02139, USA. mjones@merl.com

  • Venue:
  • International Journal of Computer Vision
  • Year:
  • 2002

Quantified Score

Hi-index 0.01

Abstract

In this paper, we address the problem of recovering 3-D models from sequences of partly calibrated images with unknown correspondence. To that end, we integrate tracking and structure from motion with geometric constraints (specifically, in the form of linear class models) in a single framework. The key to making the proposed approach work is appearance-based model matching and refinement, which updates the estimated correspondences on each iteration of the algorithm. Another key feature is that the 3-D model is matched directly with the input images, without the conventional two-step approach of stereo data recovery followed by 3-D model fitting. Initialization of the linear class model to one of the input images (the reference image) is currently partly manual. This synthesize-and-refine approach, or appearance-based constrained structure from motion (AbCSfm), is especially useful in recovering shapes of objects whose general structure is known but which may have little discernible texture in significant parts of their surfaces. We applied the proposed approach to 3-D face modeling from multiple images to create new 3-D faces for DECface, a synthetic talking head developed at Cambridge Research Laboratory, Digital Equipment Corporation. The DECface model comprises a collection of 3-D triangular and rectangular facets, with nodes as vertices. In recovering the DECface model, we assume that the sequence of images is taken with a camera of unknown focal length and pose. The geometric constraints take the form of a linear combination of prototypes of 3-D faces of real people. The results show good convergence properties and robustness against cluttered backgrounds.
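
The linear class constraint named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (combine_prototypes, project), the tuple-based shape representation, and the simple pinhole camera with points already in the camera frame are all assumptions made for illustration. The sketch only shows how a shape expressed as a weighted sum of prototype shapes is projected with a given focal length; pose estimation and the appearance-based matching and refinement loop are not shown.

```python
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]
Shape = List[Point3]  # one 3-D position per model node (hypothetical layout)


def combine_prototypes(prototypes: Sequence[Shape],
                       weights: Sequence[float]) -> Shape:
    """Return the node-wise weighted sum of the prototype shapes."""
    if not prototypes or len(prototypes) != len(weights):
        raise ValueError("need exactly one weight per prototype")
    n_nodes = len(prototypes[0])
    if any(len(p) != n_nodes for p in prototypes):
        raise ValueError("all prototypes must share the same node layout")
    shape: Shape = []
    for i in range(n_nodes):
        # Each coordinate of node i is a linear combination across prototypes.
        x = sum(w * p[i][0] for w, p in zip(weights, prototypes))
        y = sum(w * p[i][1] for w, p in zip(weights, prototypes))
        z = sum(w * p[i][2] for w, p in zip(weights, prototypes))
        shape.append((x, y, z))
    return shape


def project(shape: Shape, focal_length: float) -> List[Tuple[float, float]]:
    """Pinhole projection of camera-frame points: (f*x/z, f*y/z)."""
    return [(focal_length * x / z, focal_length * y / z) for x, y, z in shape]


# Two toy prototypes with two nodes each (made-up numbers, not face data).
proto_a: Shape = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
proto_b: Shape = [(0.0, 0.0, 4.0), (3.0, 2.0, 4.0)]

blended = combine_prototypes([proto_a, proto_b], [0.5, 0.5])
image_points = project(blended, focal_length=3.0)
```

In a setting like the one the abstract describes, the weights and the focal length would be among the unknowns adjusted until the projected model agrees with the input images; here they are simply fixed by hand.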