3-D Shape Recovery Using Distributed Aspect Matching
IEEE Transactions on Pattern Analysis and Machine Intelligence - Special issue on interpretation of 3-D scenes—part II
Some advances in transformation-based part of speech tagging
AAAI '94 Proceedings of the twelfth national conference on Artificial intelligence (vol. 1)
Shock Graphs and Shape Matching
International Journal of Computer Vision
Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary
ECCV '02 Proceedings of the 7th European Conference on Computer Vision-Part IV
On the Representation and Matching of Qualitative Shape at Multiple Scales
ECCV '02 Proceedings of the 7th European Conference on Computer Vision-Part III
Viewpoint-Invariant Indexing for Content-Based Image Retrieval
CAIVD '98 Proceedings of the 1998 International Workshop on Content-Based Access of Image and Video Databases (CAIVD '98)
Combining Textual and Visual Cues for Content-Based Image Retrieval on the World Wide Web
CBAIVL '98 Proceedings of the IEEE Workshop on Content-Based Access of Image and Video Libraries
Robust analysis of feature spaces: color image segmentation
CVPR '97 Proceedings of the 1997 Conference on Computer Vision and Pattern Recognition (CVPR '97)
The mathematics of statistical machine translation: parameter estimation
Computational Linguistics - Special issue on using large corpora: II
Generic Model Abstraction from Examples
IEEE Transactions on Pattern Analysis and Machine Intelligence
Learning Visual Compound Models from Parallel Image-Text Datasets
Proceedings of the 30th DAGM symposium on Pattern Recognition
We present ongoing work on learning translation models between image data and text (English) captions. Most approaches to this problem assume a one-to-one or a flat, one-to-many mapping between a segmented image region and a word. However, this assumption is very restrictive from the computer vision standpoint, and fails to account for two important properties of image segmentation: 1) objects often consist of multiple parts, each captured by an individual region; and 2) individual regions are often oversegmented into multiple subregions. Moreover, this assumption also fails to capture the structural relations among words, e.g., part/whole relations. We outline a general framework that accommodates a many-to-many mapping between image regions and words, allowing for structured descriptions on both sides. In this paper, we describe our extensions to the probabilistic translation model of Brown et al. (1993) (as used by Duygulu et al. (2002)) that enable the creation of structured models of image objects. We demonstrate our work in progress, in which a set of annotated images is used to derive a set of labeled, structured descriptions in the presence of oversegmentation.
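To make the baseline concrete: the translation model of Brown et al. (1993) that Duygulu et al. (2002) adapted to region-word alignment is, in its simplest form (IBM Model 1), an EM procedure that estimates p(word | region) from images paired with caption words. The sketch below is illustrative only, not the authors' code; the toy corpus, the use of discrete region labels in place of clustered region feature vectors ("blobs"), and the function name are all assumptions made for the example.

```python
from collections import defaultdict

def train_translation_model(corpus, n_iters=10):
    """IBM Model 1-style EM for region-word translation probabilities.

    corpus: list of (regions, words) pairs, one per annotated image, where
            regions are discrete region labels standing in for blob clusters.
    Returns a dict p keyed by (word, region) with p(word | region).
    """
    # Uniform initialisation over all co-occurring (word, region) pairs.
    vocab = {w for _, words in corpus for w in words}
    p = defaultdict(float)
    for regions, words in corpus:
        for r in regions:
            for w in words:
                p[(w, r)] = 1.0 / len(vocab)

    for _ in range(n_iters):
        count = defaultdict(float)   # expected counts c(word, region)
        total = defaultdict(float)   # expected counts c(region)
        # E-step: fractionally align each caption word to the image's regions.
        for regions, words in corpus:
            for w in words:
                z = sum(p[(w, r)] for r in regions)
                for r in regions:
                    frac = p[(w, r)] / z
                    count[(w, r)] += frac
                    total[r] += frac
        # M-step: renormalise so probabilities sum to one per region.
        for (w, r) in count:
            p[(w, r)] = count[(w, r)] / total[r]
    return p
```

On a toy corpus such as `[(["sky", "grass"], ["sky", "grass"]), (["sky", "water"], ["sky", "water"])]`, EM exploits co-occurrence across images to disambiguate: because "sky" appears with the sky region in both images, p("sky" | sky) rises above p("grass" | sky) within a few iterations. The paper's extension replaces this flat one-region-one-word alignment with structured, many-to-many mappings.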