Our research aims at building computational models of word meaning that are perceptually grounded. Using computer vision techniques, we build visual and multimodal distributional models and compare them to standard text-based models. Our results show that, while visual models built with state-of-the-art computer vision techniques perform worse than textual models on general tasks (such as modeling semantic relatedness), they are as good as or better than textual models at capturing the meaning of words with visual correlates, such as color terms, even in a nontrivial task involving nonliteral uses of such words. Moreover, we show that visual and textual information tap into different aspects of meaning, and indeed combining them in multimodal models often improves performance.
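To make the combination step concrete, the following is a minimal sketch of one common way to fuse a textual and a visual distributional vector for a word: L2-normalize each channel, concatenate them with a mixing weight, and compare words by cosine similarity. The fusion scheme, the weight `alpha`, and the toy vectors are illustrative assumptions, not the specific method of the paper.

```python
import math

def normalize(v):
    """L2-normalize a vector; a zero vector is returned unchanged."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)

def multimodal_vector(text_vec, visual_vec, alpha=0.5):
    """Fuse textual and visual channels by weighted concatenation of
    their normalized forms. alpha weighs the textual channel; this
    simple scheme is one of several possible fusion strategies."""
    return ([alpha * x for x in normalize(text_vec)] +
            [(1 - alpha) * x for x in normalize(visual_vec)])

def cosine(u, v):
    """Cosine similarity, the standard relatedness measure for
    distributional models."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Toy example: hypothetical text and visual vectors for two color terms.
red = multimodal_vector([0.9, 0.1, 0.0], [1.0, 0.2, 0.1])
green = multimodal_vector([0.1, 0.8, 0.2], [0.1, 1.0, 0.2])
print(cosine(red, red))    # identical words score 1.0 (up to rounding)
print(cosine(red, green))  # different words score lower
```

In practice the textual vectors would come from corpus co-occurrence counts and the visual vectors from bag-of-visual-words histograms over images labeled with the word; the fusion above treats both channels symmetrically, with `alpha` controlling their relative contribution.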