Graphical models for visual object recognition and tracking

Authors:
William T. Freeman;Alan S. Willsky;Erik B. Sudderth
Affiliations:
Massachusetts Institute of Technology;Massachusetts Institute of Technology;Massachusetts Institute of Technology
Venue:
Graphical models for visual object recognition and tracking
Year:
2006

Citing 0
Cited 9

Describing Visual Scenes Using Transformed Objects and Parts

International Journal of Computer Vision
Letters: Infinite-dimensional sampling inference algorithm for linear dynamic system

Neurocomputing
Regional category parsing in undirected graphical models

Pattern Recognition Letters
Hierarchical Bayesian language models for conversational speech recognition

IEEE Transactions on Audio, Speech, and Language Processing
Uncovering transcriptional regulatory networks by sparse Bayesian factor model

EURASIP Journal on Advances in Signal Processing - Special issue on genomic signal processing
Continuous surface-point distributions for 3D object pose estimation and recognition

ACCV'10 Proceedings of the 10th Asian conference on Computer vision - Volume Part III
Probabilistic object models for pose estimation in 2D images

DAGM'11 Proceedings of the 33rd international conference on Pattern recognition
Bayesian nonparametric inverse reinforcement learning

ECML PKDD'12 Proceedings of the 2012 European conference on Machine Learning and Knowledge Discovery in Databases - Volume Part II
Research on Mining Common Concern via Infinite Topic Modelling

WI-IAT '12 Proceedings of the The 2012 IEEE/WIC/ACM International Joint Conferences on Web Intelligence and Intelligent Agent Technology - Volume 03

Quantified Score

Hi-index	0.00

Visualization

Abstract

We develop statistical methods which allow effective visual detection, categorization, and tracking of objects in complex scenes. Such computer vision systems must be robust to wide variations in object appearance; the often small size of training databases, and ambiguities induced by articulated or partially occluded objects. Graphical models provide a powerful framework for encoding the statistical structure of visual scenes, and developing corresponding learning and inference algorithms. In this thesis, we describe several models which integrate graphical representations with nonparametric statistical methods. This approach leads to inference algorithms which tractably recover high-dimensional, continuous object pose variations, and learning procedures which transfer knowledge among related recognition tasks. Motivated by visual tracking problems, we first develop a nonparametric extension of the belief propagation (BP) algorithm. Using Monte Carlo methods, we provide general procedures for recursively updating particle-based approximations of continuous sufficient statistics. Efficient multiscale sampling methods then allow this nonparametric BP algorithm to be flexibly adapted to many different applications. As a particular example, we consider a graphical model describing the hand's three-dimensional (3D) structure, kinematics, and dynamics. This graph encodes global hand pose via the 3D position and orientation of several rigid components, and thus exposes local structure in a high-dimensional articulated model. Applying nonparametric BP, we recover a hand tracking algorithm which is robust to outliers and local visual ambiguities. Via a set of latent occupancy masks, we also extend our approach to consistently infer occlusion events in a distributed fashion. In the second half of this thesis, we develop methods for learning hierarchical models of objects, the parts composing them, and the scenes surrounding them. Our approach couples topic models originally developed for text analysis with spatial transformations, and thus consistently accounts for geometric constraints. By building integrated scene models, we may discover contextual relationships, and better exploit partially labeled training images. We first consider images of isolated objects, and show that sharing parts among object categories improves accuracy when learning from few examples. Turning to multiple object scenes, we propose nonparametric models which use Dirichlet processes to automatically learn the number of parts underlying each object category, and objects composing each scene. Adapting these transformed Dirichlet processes to images taken with a binocular stereo camera, we learn integrated, 3D models of object geometry and appearance. This leads to a Monte Carlo algorithm which automatically infers 3D scene structure from the predictable geometry of known object categories. (Copies available exclusively from MIT Libraries, Rm. 14-0551, Cambridge, MA 02139-4307. Ph. 617-253-5668; Fax 617-253-1690.)