A text-to-picture synthesis system for augmenting communication

Authors:
Xiaojin Zhu;Andrew B. Goldberg;Mohamed Eldawy;Charles R. Dyer;Bradley Strock
Affiliations:
Department of Computer Sciences, University of Wisconsin, Madison, WI;Department of Computer Sciences, University of Wisconsin, Madison, WI;Department of Computer Sciences, University of Wisconsin, Madison, WI;Department of Computer Sciences, University of Wisconsin, Madison, WI;Department of Computer Sciences, University of Wisconsin, Madison, WI
Venue:
AAAI'07 Proceedings of the 22nd national conference on Artificial intelligence - Volume 2
Year:
2007

Abstract

We present a novel Text-to-Picture system that synthesizes a picture from general, unrestricted natural language text. The process is analogous to Text-to-Speech synthesis, but with pictorial output that conveys the gist of the text. Our system integrates multiple AI components, including natural language processing, computer vision, computer graphics, and machine learning. We present an integration framework that combines these components by first identifying informative and 'picturable' text units, then searching for the most likely image parts conditioned on the text, and finally optimizing the picture layout conditioned on both the text and image parts. The effectiveness of our system is assessed in two user studies using children's books and news articles. Experiments show that the synthesized pictures convey as much information about children's stories as the original artists' illustrations, and much more information about news articles than their original photos alone. These results suggest that Text-to-Picture synthesis has great potential in augmenting human-computer and human-human communication modalities, with applications in education and health care, among others.
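
The three stages named in the abstract (selecting picturable text units, finding image parts, laying out the picture) can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not describe the models used, so the picturability scores, the image index, the threshold, and every function name below are hypothetical stand-ins chosen only to show how the stages chain together.

```python
from dataclasses import dataclass

# Hypothetical picturability scores (assumption; not from the paper).
PICTURABILITY = {"dog": 0.9, "ball": 0.8, "park": 0.7, "idea": 0.1}

# Hypothetical image index standing in for an image search step (assumption).
IMAGE_INDEX = {"dog": "dog.png", "ball": "ball.png", "park": "park.png"}


@dataclass
class Placed:
    word: str
    image: str
    x: float  # horizontal position in [0, 1]


def select_units(text, k=3, threshold=0.5):
    """Stage 1: keep up to k distinct words whose toy score exceeds threshold."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    seen, units = set(), []
    for w in words:
        if w not in seen and PICTURABILITY.get(w, 0.0) > threshold:
            seen.add(w)
            units.append(w)
    return units[:k]


def find_images(units):
    """Stage 2: look up one image per unit, skipping units with no match."""
    return [(u, IMAGE_INDEX[u]) for u in units if u in IMAGE_INDEX]


def layout(pairs):
    """Stage 3: spread the images evenly from left to right in text order."""
    n = len(pairs)
    return [Placed(w, img, (i + 0.5) / n) for i, (w, img) in enumerate(pairs)]


def text_to_picture(text):
    """Chain the three toy stages into one call."""
    return layout(find_images(select_units(text)))
```

Calling `text_to_picture("The dog chased a ball in the park.")` returns three placed images in text order. A real system would replace each stage with a learned or optimized component; the sketch fixes only the data flow between them.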