Reading between the tags to predict real-world size-class for visually depicted objects in images

Authors:
Martha Larson;Christoph Kofler;Alan Hanjalic
Affiliations:
Delft University of Technology, Delft, Netherlands;Delft University of Technology, Delft, Netherlands;Delft University of Technology, Delft, Netherlands
Venue:
MM '11 Proceedings of the 19th ACM international conference on Multimedia
Year:
2011

Abstract

Multimedia information retrieval stands to benefit from the availability of additional information about tags and how they relate to the content visually depicted in images. We propose a generic approach that contributes to improving the informativeness of image tags by combining generalizations about the distributional tendencies of physical objects in the real world with statistics of natural language use patterns that have been mined from the Web. The approach, which we refer to as 'Reading between the Tags,' provides two predictions for each tag associated with an image: first, a prediction of corporeality, i.e., whether or not the tag denotes a physical entity, and then a prediction of the real-world size of that entity, i.e., large, medium, or small. Mining takes place using a set of Language Use Frames (LUFs) that are composed of natural language neighborhoods characteristic of tag classes. We validate our approach with a series of experiments on a set of images from the MIRFLICKR data set using ground truth created with standard crowdsourcing techniques. The main experiments demonstrate the effectiveness of our approach for size-class prediction. A further experiment shows that size-class prediction can be improved and made image-specific using general and relatively small sets of visual concepts. A final experiment confirms that the set of LUFs can also be chosen automatically via statistical feature selection.
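
The two-stage prediction described in the abstract (corporeality first, then size class) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the frame patterns, the toy corpus standing in for Web counts, the function names, and the simple "highest count wins" decision rule. The abstract does not state the actual LUFs or how the mined statistics are turned into a prediction, so this should not be read as the authors' method.

```python
# Hypothetical Language Use Frames (LUFs): text patterns with a slot for the tag.
# These patterns are illustrative assumptions, not the frames used in the paper.
CORPOREAL_LUFS = ["touched the {tag}", "a piece of the {tag}"]
SIZE_LUFS = {
    "small": ["a {tag} in my pocket", "picked up the {tag}"],
    "medium": ["sat on the {tag}", "carried the {tag}"],
    "large": ["walked into the {tag}", "on top of the {tag}"],
}

# Toy sentences standing in for Web-scale text (assumption: real use would
# query a search engine or a large corpus for phrase counts).
TOY_CORPUS = [
    "she touched the coin and put a coin in my pocket",
    "he picked up the coin",
    "we touched the cathedral wall",
    "they walked into the cathedral",
    "a view from on top of the cathedral",
    "happiness is hard to define",
]


def count_phrase(corpus, phrase):
    """Count occurrences of a phrase across all sentences in the corpus."""
    return sum(sentence.count(phrase) for sentence in corpus)


def luf_counts(corpus, tag, frames):
    """Fill the tag into each frame and count how often the result occurs."""
    return [count_phrase(corpus, frame.format(tag=tag)) for frame in frames]


def predict(corpus, tag):
    """Return (corporeality, size_class) for a tag; size_class may be None."""
    # Stage 1: a tag with no evidence in any corporeal frame is treated
    # as non-corporeal, and no size class is predicted for it.
    if sum(luf_counts(corpus, tag, CORPOREAL_LUFS)) == 0:
        return ("non-corporeal", None)
    # Stage 2: sum the frame counts per size class and pick the largest.
    scores = {
        size: sum(luf_counts(corpus, tag, frames))
        for size, frames in SIZE_LUFS.items()
    }
    best = max(scores, key=scores.get)
    if scores[best] == 0:
        return ("corporeal", None)  # no size evidence found
    return ("corporeal", best)


if __name__ == "__main__":
    for tag in ["coin", "cathedral", "happiness"]:
        print(tag, predict(TOY_CORPUS, tag))
```

In a fuller setting, the per-frame counts would more plausibly serve as a feature vector for a trained classifier, and the frame set itself could be pruned with feature selection, as the final experiment in the abstract suggests.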