A Closed-Form Solution to Non-Rigid Shape and Motion Recovery

  • Authors:
  • Jing Xiao; Jinxiang Chai; Takeo Kanade

  • Affiliations:
  • Epson Palo Alto Laboratory, Palo Alto, USA 94304; Robotics Institute, CMU, Pittsburgh, USA 15213; Robotics Institute, CMU, Pittsburgh, USA 15213

  • Venue:
  • International Journal of Computer Vision
  • Year:
  • 2006

Abstract

Recovery of the three-dimensional (3D) shape and motion of non-static scenes from a monocular video sequence is important for applications such as robot navigation and human-computer interaction. If every point in the scene moves arbitrarily, recovering the non-rigid shapes is impossible. In practice, however, many non-rigid objects, e.g. the human face under various expressions, deform with a certain structure: their shapes can be regarded as a weighted combination of certain shape bases. Shape and motion recovery in such situations has attracted much interest. Previous work on this problem (Bregler, C., Hertzmann, A., and Biermann, H. 2000. In Proc. Int. Conf. Computer Vision and Pattern Recognition; Brand, M. 2001. In Proc. Int. Conf. Computer Vision and Pattern Recognition; Torresani, L., Yang, D., Alexander, G., and Bregler, C. 2001. In Proc. Int. Conf. Computer Vision and Pattern Recognition) utilized only orthonormality constraints on the camera rotations (rotation constraints). This paper proves that using only the rotation constraints results in ambiguous and invalid solutions. The ambiguity arises from the fact that the shape bases are not unique: an arbitrary linear transformation of the bases produces another eligible set of bases. To eliminate the ambiguity, we propose a set of novel constraints, the basis constraints, which uniquely determine the shape bases. We prove that, under the weak-perspective projection model, enforcing both the basis and the rotation constraints leads to a closed-form solution to the problem of non-rigid shape and motion recovery. The accuracy and robustness of the closed-form solution are evaluated quantitatively on synthetic data and qualitatively on real video sequences.
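
The shape-basis model and the ambiguity the abstract describes can be sketched in a few lines of NumPy. The toy example below is not the authors' code; the dimensions K, F, P and all variable names are illustrative assumptions. It builds the 2F x P measurement matrix W as the product of a scaled-rotation matrix M and a stacked basis matrix B under weak-perspective projection, then shows that any invertible linear transformation G yields a different factorization (M G)(G^{-1} B) reproducing W exactly, which is why constraints beyond the rotation constraints are needed to pin down the bases.

```python
import numpy as np

rng = np.random.default_rng(0)
K, F, P = 2, 5, 10        # shape bases, frames, feature points (toy sizes)

# Stacked shape bases: B is 3K x P; each 3 x P block is one basis shape.
B = rng.standard_normal((3 * K, P))

def weak_perspective_rotation(rng):
    """First two rows of a random 3D rotation (weak-perspective camera)."""
    q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
    if np.linalg.det(q) < 0:      # ensure a proper rotation, not a reflection
        q[:, 0] = -q[:, 0]
    return q[:2]

# Scaled-rotation matrix M (2F x 3K): frame f contributes the row block
# [c_f1 * R_f | ... | c_fK * R_f] -- combination weights times one rotation.
blocks = []
for _ in range(F):
    R = weak_perspective_rotation(rng)   # one rotation per frame
    c = rng.standard_normal(K)           # per-frame combination weights
    blocks.append(np.hstack([c[k] * R for k in range(K)]))
M = np.vstack(blocks)

W = M @ B   # 2F x P matrix of tracked 2D image points

# The ambiguity: any invertible 3K x 3K matrix G yields another
# factorization (M @ G) @ (inv(G) @ B) that reproduces W exactly.
G = rng.standard_normal((3 * K, 3 * K))  # invertible with probability 1
M_alt, B_alt = M @ G, np.linalg.inv(G) @ B
assert np.allclose(W, M_alt @ B_alt)
```

A generic G destroys the scaled-rotation block structure of M; the paper's key observation is that even among transformations consistent with the rotation constraints an ambiguity remains, and the proposed basis constraints are what resolve it.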