A Hierarchical and Contextual Model for Aerial Image Parsing

Authors:
Jake Porway;Qiongchen Wang;Song Chun Zhu
Affiliations:
Department of Statistics, University of California, Los Angeles, USA;Department of Statistics, University of California, Los Angeles, USA and Lotus Hill Institute for Computer Vision and Information Science, Ezhou, China;Department of Statistics, University of California, Los Angeles, USA and Lotus Hill Institute for Computer Vision and Information Science, Ezhou, China
Venue:
International Journal of Computer Vision
Year:
2010

Citing 24
Cited 2

Knowledge-based interpretation of outdoor natural color scenes
A decision-theoretic generalization of on-line learning and an application to boosting

Journal of Computer and System Sciences - Special issue: 26th annual ACM symposium on the theory of computing & STOC'94, May 23–25, 1994, and second annual European conference on computational learning theory (EuroCOLT'95), March 13–15, 1995
Filters, Random Fields and Maximum Entropy (FRAME): Towards a Unified Theory for Texture Modeling

International Journal of Computer Vision
Shock Graphs and Shape Matching

International Journal of Computer Vision
Image Segmentation by Data-Driven Markov Chain Monte Carlo

IEEE Transactions on Pattern Analysis and Machine Intelligence
Improved Rooftop Detection in Aerial Images with Machine Learning

Machine Learning
Estimation of probabilistic context-free grammars

Computational Linguistics
Pictorial Structures for Object Recognition

International Journal of Computer Vision
Estimators for stochastic "Unification-Based" grammars

ACL '99 Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics
A Bayesian Hierarchical Model for Learning Natural Scene Categories

CVPR '05 Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) - Volume 2
Generic Model Abstraction from Examples

IEEE Transactions on Pattern Analysis and Machine Intelligence
Generalizing Swendsen-Wang to Sampling Arbitrary Posterior Probabilities

IEEE Transactions on Pattern Analysis and Machine Intelligence
Discovering Objects and their Localization in Images

ICCV '05 Proceedings of the Tenth IEEE International Conference on Computer Vision (ICCV'05) - Volume 1
Bottom-up/Top-Down Image Parsing by Attribute Graph Grammar

ICCV '05 Proceedings of the Tenth IEEE International Conference on Computer Vision - Volume 2
Extracting Subimages of an Unknown Category from a Set of Images

CVPR '06 Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Volume 1
Composite Templates for Cloth Modeling and Sketching

CVPR '06 Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Volume 1
Context and Hierarchy in a Probabilistic Image Model

CVPR '06 Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Volume 2
The Representation and Matching of Pictorial Structures

IEEE Transactions on Computers
A stochastic grammar of images

Foundations and Trends® in Computer Graphics and Vision
Unsupervised Structure Learning: Hierarchical Recursive Composition, Suspicious Coincidence and Competitive Exclusion

ECCV '08 Proceedings of the 10th European Conference on Computer Vision: Part II
Graphical Models, Exponential Families, and Variational Inference

Foundations and Trends® in Machine Learning
Introduction to a large-scale general purpose ground truth database: methodology, annotation tool and benchmarks

EMMCVPR'07 Proceedings of the 6th international conference on Energy minimization methods in computer vision and pattern recognition
Probabilistic spatial context models for scene content understanding

CVPR'03 Proceedings of the 2003 IEEE computer society conference on Computer vision and pattern recognition
TextonBoost: joint appearance, shape and context modeling for multi-class object recognition and segmentation

ECCV'06 Proceedings of the 9th European conference on Computer Vision - Volume Part I

Special Issue on Probabilistic Models for Image Understanding, Part II

International Journal of Computer Vision
Conditional random fields for land use/land cover classification and complex region detection

SSPR'12/SPR'12 Proceedings of the 2012 Joint IAPR international conference on Structural, Syntactic, and Statistical Pattern Recognition

Abstract

In this paper we present a hierarchical and contextual model for aerial image understanding. Our model organizes objects (cars, roofs, roads, trees, parking lots) in aerial scenes into hierarchical groups whose appearances and configurations are determined by statistical constraints (e.g., relative position and relative scale). Our hierarchy is a non-recursive grammar for objects in aerial images, composed of layers of nodes that can each decompose into a number of different configurations. This allows us to generate and recognize a vast number of scenes with relatively few rules. We present a minimax entropy framework for learning the statistical constraints between objects and show that this learned context allows us to rule out unlikely scene configurations and hallucinate undetected objects during inference. A similar algorithm was proposed for texture synthesis (Zhu et al. in Int. J. Comput. Vis. 2:107–126, 1998) but did not incorporate hierarchical information. We use a range of bottom-up detectors (AdaBoost, TextonBoost, and Compositional Boosting; Freund and Schapire in J. Comput. Syst. Sci. 55, 1997; Shotton et al. in Proceedings of the European Conference on Computer Vision, pp. 1–15, 2006; Wu et al. in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8, 2007) to propose locations of objects in new aerial images and employ a cluster sampling algorithm (C4; Porway and Zhu, 2009) to choose the subset of detections that best explains the image according to our learned prior model. The C4 algorithm can quickly and efficiently switch between alternate competing sub-solutions, for example whether an image patch is better explained by a parking lot with cars or by a building with vents. We also show that our model can predict the locations of objects our detectors missed.
We conclude by presenting parsed aerial images and experimental results showing that our cluster sampling and top-down prediction algorithms use the learned contextual cues from our model to improve detection results over traditional bottom-up detectors alone.
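
The idea of choosing the subset of detections that best explains the image can be illustrated with a toy sketch. This is not the C4 cluster sampling algorithm and not the learned minimax entropy prior: it is a brute-force search over a handful of candidates, and the detections, the `unary` and `pairwise` scoring terms, and all constants are hypothetical values chosen only for illustration.

```python
import math
from itertools import combinations

# Hypothetical bottom-up proposals: (label, x, y, detector_score in (0, 1)).
detections = [
    ("car", 10.0, 10.0, 0.90),
    ("car", 12.0, 10.0, 0.80),
    ("car", 14.0, 10.0, 0.45),   # weak detection, but in a row of cars
    ("roof", 11.0, 30.0, 0.70),
    ("car", 11.0, 30.0, 0.30),   # weak detection competing with the roof
]


def unary(det):
    """Log-odds of the detector score (an assumed stand-in for appearance)."""
    score = det[3]
    return math.log(score / (1.0 - score))


def pairwise(a, b):
    """Assumed contextual term: nearby cars support each other, and two
    different labels explaining the same location are penalized."""
    dist = math.hypot(a[1] - b[1], a[2] - b[2])
    if a[0] != b[0] and dist < 1.0:
        return -5.0
    if a[0] == "car" and b[0] == "car" and dist < 5.0:
        return 0.5
    return 0.0


def total_score(subset):
    """Sum of unary terms plus pairwise context over all pairs in the subset."""
    return (sum(unary(d) for d in subset)
            + sum(pairwise(a, b) for a, b in combinations(subset, 2)))


def best_subset(dets):
    """Exhaustively score every non-empty subset; feasible only for tiny inputs."""
    best, best_score = (), 0.0
    for k in range(1, len(dets) + 1):
        for idx in combinations(range(len(dets)), k):
            s = total_score([dets[i] for i in idx])
            if s > best_score:
                best, best_score = idx, s
    return best, best_score


chosen, score = best_subset(detections)
print(chosen, round(score, 3))
```

Under these assumed scores, the weak car in the row is kept because its neighbours support it, while the weak car that competes with the roof for the same location is dropped. Exhaustive search grows exponentially with the number of proposals, which is why a sampling scheme is needed for real images.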