A stochastic grammar of images

  • Authors:
  • Song-Chun Zhu; David Mumford

  • Affiliations:
  • University of California, Los Angeles; Brown University

  • Venue:
  • Foundations and Trends® in Computer Graphics and Vision
  • Year:
  • 2006

Abstract

This exploratory paper pursues a stochastic and context-sensitive grammar of images. The grammar should achieve the following four objectives and thus serve as a unified framework of representation, learning, and recognition for a large number of object categories. (i) The grammar represents both the hierarchical decompositions from scenes to objects, parts, primitives, and pixels by terminal and non-terminal nodes, and the contexts for spatial and functional relations by horizontal links between the nodes. It formulates each object category as the set of all valid configurations produced by the grammar. (ii) The grammar is embodied in a simple And-Or graph representation, where each Or-node points to alternative sub-configurations and each And-node is decomposed into a number of components. This representation supports recursive top-down/bottom-up procedures for image parsing under the Bayesian framework and makes it convenient to scale up in complexity. Given an input image, the image parsing task constructs a most probable parse graph on the fly as the output interpretation; this parse graph is a subgraph of the And-Or graph obtained by making choices at the Or-nodes. (iii) A probabilistic model is defined on this And-Or graph representation to account for the natural occurrence frequency of objects and parts as well as their relations. This model is learned from a relatively small training set per category and then sampled to synthesize a large number of configurations that cover novel object instances in the test set. This generalization capability is mostly missing in discriminative machine learning methods and can substantially improve recognition performance in experiments. (iv) To fill the well-known semantic gap between symbols and raw signals, the grammar includes a series of visual dictionaries and organizes them through graph composition. At the bottom level, the dictionary is a set of image primitives, each having a number of anchor points with open bonds to link with other primitives. These primitives can be combined to form larger and larger graph structures for parts and objects. The ambiguities in inferring local primitives are resolved through top-down computation using larger structures. Finally, these primitives form a primal sketch representation that generates the input image with every pixel explained. The proposed grammar integrates three prominent representations in the literature: stochastic grammars for composition, Markov (or graphical) models for contexts, and sparse coding with primitives (wavelets). It also combines the structure-based and appearance-based methods in the vision literature. Finally, the paper presents three case studies to illustrate the proposed grammar.
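
The And-Or graph and parse-graph construction described in points (ii) and (iii) can be made concrete with a small sketch. The Python below is illustrative only, not the authors' implementation: `AndNode`, `OrNode`, `Terminal`, and `sample_parse` are hypothetical names, and the branch probabilities stand in for the learned occurrence frequencies of the abstract's probabilistic model.

```python
import random
from dataclasses import dataclass


@dataclass
class Terminal:
    name: str          # an image primitive at the leaves


@dataclass
class AndNode:
    name: str
    children: list     # decomposition: all components are kept


@dataclass
class OrNode:
    name: str
    children: list     # alternative sub-configurations
    probs: list        # learned occurrence frequencies, summing to 1


def sample_parse(node):
    """Draw a parse graph: a subgraph of the And-Or graph obtained by
    choosing one alternative at every Or-node encountered."""
    if isinstance(node, Terminal):
        return node.name
    if isinstance(node, AndNode):
        return {node.name: [sample_parse(c) for c in node.children]}
    # Or-node: pick one child according to its occurrence frequency
    choice = random.choices(node.children, weights=node.probs, k=1)[0]
    return {node.name: sample_parse(choice)}


# A toy category: a "face" decomposes into eyes (with two alternative
# configurations) and a nose; sampling synthesizes valid configurations.
face = AndNode("face", [
    OrNode("eyes", [Terminal("eyes-open"), Terminal("eyes-closed")], [0.8, 0.2]),
    Terminal("nose"),
])
print(sample_parse(face))  # e.g. {'face': [{'eyes': 'eyes-open'}, 'nose']}
```

Because the model is generative, the same branch frequencies that score a parse can be sampled to synthesize configurations beyond the training set, which is the generalization behavior the abstract attributes to the grammar.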
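
The bottom-level dictionary of point (iv) can be sketched in the same spirit. Again this is a hedged illustration under assumed names (`Primitive`, `link`); in the paper the primitives are image elements such as edges and bars, each with anchor points carrying open bonds.

```python
from dataclasses import dataclass, field


@dataclass
class Primitive:
    kind: str                                      # e.g. "edge" or "bar"
    open_bonds: set = field(default_factory=set)   # anchor points awaiting links


def link(a, bond_a, b, bond_b):
    """Close one open bond on each primitive, recording the connection.
    Repeated linking composes larger and larger graph structures."""
    if bond_a not in a.open_bonds or bond_b not in b.open_bonds:
        raise ValueError("bond is not open")
    a.open_bonds.discard(bond_a)
    b.open_bonds.discard(bond_b)
    return (a, bond_a, b, bond_b)                  # one edge of the composite


# Two edge elements join end-to-end to begin forming a longer contour.
e1 = Primitive("edge", {"left", "right"})
e2 = Primitive("edge", {"left", "right"})
contour = [link(e1, "right", e2, "left")]
```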