A Numerical Study of the Bottom-Up and Top-Down Inference Processes in And-Or Graphs

  • Authors:
  • Tianfu Wu; Song-Chun Zhu

  • Affiliations:
  • Department of Statistics, University of California, Los Angeles, USA, and Lotus Hill Research Institute (LHI), Ezhou, China; Department of Statistics, University of California, Los Angeles, USA, and Department of Computer Science, University of California, Los Angeles, USA, and Lotus Hill Research Institute (LHI), Ezhou, ...

  • Venue:
  • International Journal of Computer Vision
  • Year:
  • 2011


Abstract

This paper presents a numerical study of the bottom-up and top-down inference processes in hierarchical models, using the And-Or graph as an example. Three inference processes are identified for each node A in a recursively defined And-Or graph in which a stochastic context-sensitive image grammar is embedded: the α(A) process detects node A directly from image features, the β(A) process computes node A bottom-up by binding its child node(s), and the γ(A) process predicts node A top-down from its parent node(s). All three processes contribute to computing node A from images in complementary ways. The objective of our numerical study is to explore how much information each process contributes and how these processes should be integrated to improve performance. We study them in the task of object parsing using an And-Or graph formulated under the Bayesian framework. First, we isolate and train the α(A), β(A), and γ(A) processes separately by blocking the other two processes. The information contribution of each process is then evaluated individually based on its discriminative power and compared with the corresponding human performance. Second, we integrate the three processes explicitly for robust inference to improve performance and propose a greedy pursuit algorithm for object parsing. In experiments, we choose two hierarchical case studies: junctions and rectangles in low-to-middle-level vision, and human faces in high-level vision. We observe that (i) the effectiveness of the α(A), β(A), and γ(A) processes depends on the scale and occlusion conditions, (ii) the α(face) process is stronger than the α processes of facial components, while β(junctions) and β(rectangle) work much better than their α processes, and (iii) the integration of the three processes improves performance in ROC comparisons.
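The abstract describes three complementary score channels per node and a greedy pursuit over candidates. A minimal sketch of that integration idea is below; all names (`Node`, `combine`, `greedy_pursuit`) and the additive combination with a zero acceptance threshold are illustrative assumptions, not the paper's Bayesian formulation.

```python
from dataclasses import dataclass

@dataclass
class Node:
    """A candidate node A with its three channel scores (illustrative)."""
    name: str
    alpha: float = 0.0   # alpha(A): direct detection score from image features
    beta: float = 0.0    # beta(A): bottom-up score from binding child nodes
    gamma: float = 0.0   # gamma(A): top-down prediction score from parent nodes

def combine(node: Node) -> float:
    # Assumed: a simple additive integration of the three channels;
    # the paper integrates them under a Bayesian framework instead.
    return node.alpha + node.beta + node.gamma

def greedy_pursuit(candidates: list[Node]) -> list[Node]:
    """Greedily accept candidates in order of combined score.

    Assumed acceptance rule: combined score must exceed zero.
    """
    ranked = sorted(candidates, key=combine, reverse=True)
    return [n for n in ranked if combine(n) > 0]
```

For example, a face candidate with a strong α score can be accepted even when its component β scores are weak, matching the observation that α(face) dominates at the whole-object level.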