A Numerical Study of the Bottom-Up and Top-Down Inference Processes in And-Or Graphs

  • Authors:
  • Tianfu Wu; Song-Chun Zhu

  • Affiliations:
  • Department of Statistics, University of California, Los Angeles, USA, and Lotus Hill Research Institute (LHI), Ezhou, China; Department of Statistics, University of California, Los Angeles, USA, and Department of Computer Science, University of California, Los Angeles, USA, and Lotus Hill Research Institute (LHI), Ezhou, ...

  • Venue:
  • International Journal of Computer Vision
  • Year:
  • 2011


Abstract

This paper presents a numerical study of the bottom-up and top-down inference processes in hierarchical models, using the And-Or graph as an example. Three inference processes are identified for each node A in a recursively defined And-Or graph in which a stochastic context-sensitive image grammar is embedded: the α(A) process detects node A directly from image features, the β(A) process computes node A bottom-up by binding its child node(s), and the γ(A) process predicts node A top-down from its parent node(s). All three processes contribute to computing node A from images in complementary ways. The objective of our numerical study is to explore how much information each process contributes and how these processes should be integrated to improve performance. We study them in the task of object parsing using an And-Or graph formulated under the Bayesian framework. First, we isolate and train the α(A), β(A), and γ(A) processes separately by blocking the other two processes. The information contribution of each process is then evaluated individually based on its discriminative power and compared with the corresponding human performance. Second, we integrate the three processes explicitly for robust inference to improve performance and propose a greedy pursuit algorithm for object parsing. In experiments, we choose two hierarchical case studies: junctions and rectangles in low-to-middle-level vision, and human faces in high-level vision. We observe that (i) the effectiveness of the α(A), β(A), and γ(A) processes depends on the scale and occlusion conditions, (ii) the α(face) process is stronger than the α processes of facial components, while β(junctions) and β(rectangle) work much better than their α processes, and (iii) the integration of the three processes improves performance in ROC comparisons.
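The abstract describes three complementary score channels per node and a greedy pursuit over candidates. A minimal sketch of that integration idea is below; all names (`Node`, `combine`, `greedy_pursuit`) and the additive combination with a zero acceptance threshold are illustrative assumptions, not the paper's Bayesian formulation.

```python
from dataclasses import dataclass

@dataclass
class Node:
    """A candidate node A with its three channel scores (illustrative)."""
    name: str
    alpha: float = 0.0   # alpha(A): direct detection score from image features
    beta: float = 0.0    # beta(A): bottom-up score from binding child nodes
    gamma: float = 0.0   # gamma(A): top-down prediction score from parent nodes

def combine(node: Node) -> float:
    # Assumed: a simple additive integration of the three channels;
    # the paper integrates them under a Bayesian framework instead.
    return node.alpha + node.beta + node.gamma

def greedy_pursuit(candidates: list[Node]) -> list[Node]:
    """Greedily accept candidates in order of combined score.

    Assumed acceptance rule: combined score must exceed zero.
    """
    ranked = sorted(candidates, key=combine, reverse=True)
    return [n for n in ranked if combine(n) > 0]
```

For example, a face candidate with a strong α score can be accepted even when its component β scores are weak, matching the observation that α(face) dominates at the whole-object level.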