Cross-caption coreference resolution for automatic image understanding

Authors:
Micah Hodosh;Peter Young;Cyrus Rashtchian;Julia Hockenmaier
Affiliations:
University of Illinois at Urbana-Champaign;University of Illinois at Urbana-Champaign;University of Illinois at Urbana-Champaign;University of Illinois at Urbana-Champaign
Venue:
CoNLL '10 Proceedings of the Fourteenth Conference on Computational Natural Language Learning
Year:
2010

Citing 9
Cited 1

Modeling annotated data

Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval
Matching words and pictures

The Journal of Machine Learning Research
Extracting paraphrases from a parallel corpus

ACL '01 Proceedings of the 39th Annual Meeting on Association for Computational Linguistics
Building a visual ontology for video retrieval

Proceedings of the 13th annual ACM international conference on Multimedia
Unsupervised construction of large paraphrase corpora: exploiting massively parallel news sources

COLING '04 Proceedings of the 20th international conference on Computational Linguistics
Wide-coverage efficient statistical parsing with ccg and log-linear models

Computational Linguistics
Latent Dirichlet Allocation with topic-in-set knowledge

SemiSupLearn '09 Proceedings of the NAACL HLT 2009 Workshop on Semi-Supervised Learning for Natural Language Processing
Simple coreference resolution with rich syntactic and semantic features

EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 3 - Volume 3
Collecting image annotations using Amazon's Mechanical Turk

CSLDAMT '10 Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk

Learning bilingual lexicons using the visual similarity of labeled web images

IJCAI'11 Proceedings of the Twenty-Second international joint conference on Artificial Intelligence - Volume Volume Three

Quantified Score

Hi-index	0.00

Visualization

Abstract

Recent work in computer vision has aimed to associate image regions with keywords describing the depicted entities, but actual image 'understanding' would also require identifying their attributes, relations and activities. Since this information cannot be conveyed by simple keywords, we have collected a corpus of "action" photos each associated with five descriptive captions. In order to obtain a consistent semantic representation for each image, we need to first identify which NPs refer to the same entities. We present three hierarchical Bayesian models for cross-caption coreference resolution. We have also created a simple ontology of entity classes that appear in images and evaluate how well these can be recovered.