Wikipedia-assisted concept thesaurus for better web media understanding

Authors:
Huan Wang;Liang-Tien Chia;Shenghua Gao
Affiliations:
Nanyang Technological University, Singapore, Singapore;Nanyang Technological University, Singapore, Singapore;Nanyang Technological University, Singapore, Singapore
Venue:
Proceedings of the international conference on Multimedia information retrieval
Year:
2010

Abstract

Concept ontologies have been used in artificial intelligence, biomedical informatics, and library science, where they have proved an effective way to better understand data in the respective domains. One main difficulty that hinders the development of ontology approaches is the extra work required for ontology construction and annotation. With the emergence of lexical dictionaries and encyclopedias such as WordNet and Wikipedia, innovations from different directions have been proposed to extract concept ontologies automatically. Unfortunately, many of the proposed ontologies do not fully exploit general human knowledge. We study the various knowledge sources and aim to construct, from Wikipedia, a scalable concept thesaurus suited to better understanding of media on the World Wide Web. With its wide concept coverage, finely organized categories, diverse concept relations, and up-to-date information, the collaborative encyclopedia Wikipedia has almost all the attributes required to contribute to a well-defined concept ontology. Besides explicit concept relations such as disambiguation and synonymy, Wikipedia also provides implicit concept relations through cross-references between articles. In our previous work, we built an ontology with explicit relations from Wikipedia page contents. Although the method works, mining explicit semantic relations from the content of every Wikipedia concept page raises unresolved scalability issues as more concepts are involved. This paper describes our attempt to automatically build a concept thesaurus that encodes both explicit and implicit semantic relations for a large number of concepts from Wikipedia. Our proposed thesaurus construction takes advantage of both the structure and the content features of the downloaded Wikipedia database, and defines each concept entry with its related concepts and relations.
This thesaurus is further used to exploit semantics from web page context to build a more semantically meaningful space. We then go a step further and combine it with the similarity distance from the image feature space to boost performance. We evaluate our approach by applying the constructed concept thesaurus to web image retrieval. The results show that implicit semantic relations can be used to improve retrieval performance.
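
The abstract does not give the data structures or formulas, so the following is only a minimal illustrative sketch, not the authors' method. It assumes a thesaurus entry stores explicit relations (such as synonymy) separately from weighted implicit relations (from cross-references), and that semantic and visual similarity are combined by a simple linear fusion. The names ConceptEntry, semantic_similarity, fused_score, the weight alpha, and all numeric values are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ConceptEntry:
    """One thesaurus entry: a concept plus its related concepts (hypothetical layout)."""
    name: str
    # Explicit relations, keyed by relation type (e.g. "synonym").
    explicit: dict[str, set[str]] = field(default_factory=dict)
    # Implicit relations from article cross-references, with an assumed strength in [0, 1].
    implicit: dict[str, float] = field(default_factory=dict)


def semantic_similarity(entry: ConceptEntry, context_terms: set[str]) -> float:
    """Toy score for how strongly a web page's context terms relate to a concept."""
    if not context_terms:
        return 0.0
    synonyms = entry.explicit.get("synonym", set())
    score = 0.0
    for term in context_terms:
        if term == entry.name or term in synonyms:
            score += 1.0  # explicit match counts fully
        else:
            score += entry.implicit.get(term, 0.0)  # implicit match counts partially
    return score / len(context_terms)


def fused_score(semantic: float, visual: float, alpha: float = 0.5) -> float:
    """Linear fusion of semantic and visual similarity; alpha is an assumed weight."""
    return alpha * semantic + (1.0 - alpha) * visual


# Usage with made-up values: the page context mentions the concept and one implicit neighbor.
jaguar = ConceptEntry(
    name="jaguar",
    explicit={"synonym": {"panthera onca"}},
    implicit={"rainforest": 0.5, "predator": 0.25},
)
context = {"jaguar", "rainforest", "river", "photo"}
semantic = semantic_similarity(jaguar, context)  # (1.0 + 0.5 + 0 + 0) / 4 = 0.375
print(fused_score(semantic, visual=0.625))  # prints 0.5
```

In this sketch, a page that never names the concept can still receive a nonzero semantic score through implicit relations, which is the kind of effect the abstract attributes to cross-reference information.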