Learning Visual Compound Models from Parallel Image-Text Datasets

Authors:
Jan Moringen;Sven Wachsmuth;Sven Dickinson;Suzanne Stevenson
Affiliations:
Bielefeld University;Bielefeld University;University of Toronto;University of Toronto
Venue:
Proceedings of the 30th DAGM symposium on Pattern Recognition
Year:
2008

Abstract

In this paper, we propose a new approach to learning structured visual compound models from shape-based feature descriptions. We use caption text to drive the grouping of boundary fragments detected in an image. In the learning framework, we transfer several techniques from computational linguistics to the visual domain and build on previous work in image annotation. A statistical translation model establishes links between caption words and image elements. Compounds are then built up iteratively using a mutual information measure. Relations between compound elements are extracted automatically and increase the discriminability of the visual models. We validate the approach on several synthetic and realistic datasets.
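
As an illustration of how a statistical translation model can link caption words to image elements, the following is a minimal sketch. It assumes an IBM Model 1 style EM procedure over discrete image tokens; the abstract does not say which translation model or feature quantization the authors use, so the function name `train_translation_table`, the token labels, and the omission of a null word are all assumptions made for illustration.

```python
from collections import defaultdict


def train_translation_table(pairs, iterations=10):
    """Estimate t(token | word) with IBM Model 1 style EM (illustrative only).

    pairs: list of (caption_words, image_tokens), where image_tokens are
    hypothetical discrete labels for image elements such as boundary fragments.
    Returns a dict-like table keyed by (token, word).
    """
    tokens = {v for _, vs in pairs for v in vs}
    # Uniform initialisation: every token is equally likely for every word.
    t = defaultdict(lambda: 1.0 / len(tokens))
    for _ in range(iterations):
        count = defaultdict(float)  # expected (token, word) link counts
        total = defaultdict(float)  # expected number of links per word
        for words, toks in pairs:
            for v in toks:
                # E-step: share one unit of mass for token v among caption words.
                z = sum(t[(v, w)] for w in words)
                for w in words:
                    c = t[(v, w)] / z
                    count[(v, w)] += c
                    total[w] += c
        # M-step: renormalise expected counts per word.
        t = defaultdict(float, {(v, w): c / total[w] for (v, w), c in count.items()})
    return t
```

Given an image captioned "circle square" that contains tokens A and B, plus an image captioned "circle" that contains only A, the table shifts probability toward linking "circle" with A and "square" with B. A mutual information measure over co-occurring elements, as mentioned in the abstract, would be a separate step and is not shown here.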