Web-based information content and its application to concept-based video retrieval

Authors:
Alexander Haubold;Apostol Natsev
Affiliations:
Columbia University, New York, NY, USA;IBM Thomas J. Watson, Hawthorne, NY, USA
Venue:
CIVR '08 Proceedings of the 2008 international conference on Content-based image and video retrieval
Year:
2008

Abstract

Semantic similarity between words or phrases is frequently used to match search queries to documents when direct term matching fails. This is particularly important for searching visual databases, where pictures or video clips have been automatically tagged with a small set of semantic concepts based on analysis and classification of the visual content. Here, the textual description of documents is very limited, and semantic similarity based on WordNet's cognitive synonym structure, along with information content derived from term frequencies, can help bridge the gap between an arbitrary textual query and a limited vocabulary of visual concepts. This approach, termed concept-based retrieval, has received significant attention over the last few years, and its success depends heavily on the quality of the similarity measure used to map textual query terms to visual concepts. In this paper, we consider some issues with semantic similarity measures based on Information Content (IC) and propose a way to improve them. In particular, we note that most IC-based similarity measures are derived from a small and relatively outdated corpus (the Brown corpus), which does not adequately capture the usage patterns of many contemporary terms: for example, out of more than 150,000 WordNet terms, only about 36,000 are represented. This shortcoming severely limits the coverage of typical search query terms. We therefore suggest using alternative IC corpora that are larger and better aligned with modern vocabulary usage. We experimentally derive two such corpora using the Google web search engine, and show that they provide better vocabulary coverage while yielding comparable frequencies for Brown corpus terms. Finally, we evaluate the two proposed IC corpora in a concept-based video retrieval application using the TRECVID 2005, 2006, and 2007 datasets, and show that they increase average precision by up to 200%.
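
To make the idea of frequency-derived information content concrete, the following minimal sketch computes IC(c) = -log p(c) over a small taxonomy and uses a Resnik-style similarity (the IC of the most informative common ancestor) to map a query term to the closest concept in a limited vocabulary. The taxonomy, the term counts, and all function names are hypothetical illustrations, not WordNet data and not values from the paper. The abstract does not state which IC-based measures the authors evaluate, so this should be read as one common instance of the technique rather than their method.

```python
import math

# Hypothetical toy taxonomy (child -> parent); NOT taken from WordNet.
PARENT = {
    "entity": None,
    "vehicle": "entity",
    "animal": "entity",
    "car": "vehicle",
    "boat": "vehicle",
    "dog": "animal",
}

# Hypothetical raw term counts, standing in for corpus frequencies
# or search-engine hit counts.
TERM_COUNTS = {
    "entity": 0, "vehicle": 20, "animal": 30,
    "car": 100, "boat": 10, "dog": 40,
}


def ancestors(concept):
    """Return the concept followed by all of its ancestors up to the root."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT[concept]
    return chain


def cumulative_counts(term_counts):
    """Credit each term's count to the term itself and to every ancestor."""
    totals = {c: 0 for c in PARENT}
    for term, count in term_counts.items():
        for node in ancestors(term):
            totals[node] += count
    return totals


def information_content(totals, root="entity"):
    """IC(c) = -log p(c), where p(c) = cumulative count of c / count at root."""
    n = totals[root]
    return {c: -math.log(t / n) for c, t in totals.items() if t > 0}


def resnik_similarity(a, b, ic):
    """Similarity = IC of the most informative common ancestor of a and b."""
    common = set(ancestors(a)) & set(ancestors(b))
    return max(ic.get(c, 0.0) for c in common)


def best_concept(query_term, vocabulary, ic):
    """Pick the vocabulary concept most similar to the query term."""
    return max(vocabulary, key=lambda c: resnik_similarity(query_term, c, ic))


ic = information_content(cumulative_counts(TERM_COUNTS))
print(best_concept("car", ["boat", "dog"], ic))  # prints "boat"
```

In this sketch, swapping TERM_COUNTS for counts from a larger corpus changes only the IC table, which is why corpus coverage matters: a term with no recorded count gets no IC entry and cannot contribute to the similarity.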