Association thesaurus construction methods based on link co-occurrence analysis for Wikipedia

Authors:
Masahiro Ito;Kotaro Nakayama;Takahiro Hara;Shojiro Nishio
Affiliations:
Osaka University, Osaka, Japan;The University of Tokyo, Tokyo, Japan;Osaka University, Osaka, Japan;Osaka University, Osaka, Japan
Venue:
Proceedings of the 17th ACM conference on Information and knowledge management
Year:
2008

Abstract

Wikipedia, a large-scale Web-based encyclopedia, has attracted great attention as an invaluable corpus for knowledge extraction because of characteristics such as a huge number of articles, live updates, a dense link structure, brief anchor texts, and URL identification for concepts. We have already shown that Wikipedia can be used to construct a large-scale, accurate association thesaurus. The association thesaurus we constructed covers almost 1.3 million concepts, and its accuracy was confirmed in detailed experiments. However, scalable methods are still needed to analyze the huge number of Web pages and the hyperlinks among articles in this Web-based encyclopedia. In this paper, we propose a scalable method for constructing an association thesaurus from Wikipedia based on link co-occurrences. Link co-occurrence analysis is more scalable than link structure analysis because it is a one-pass process. We also propose a method that integrates tf-idf with link co-occurrence analysis. Experimental results show that both proposed methods are more accurate and scalable than conventional methods. Furthermore, integrating tf-idf achieved higher accuracy than using link co-occurrences alone.
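
To make the "one-pass" property concrete, the following is a minimal sketch of link co-occurrence counting, not the authors' implementation. It assumes that two links co-occur when they appear in the same article; the abstract does not state the actual co-occurrence window or weighting. The function name `count_link_cooccurrences`, the `(title, links)` input shape, and the toy data are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def count_link_cooccurrences(articles):
    """Count, in a single pass, how often two link targets share an article.

    `articles` is a hypothetical iterable of (title, [link targets]) pairs.
    Assumption: co-occurrence means "both links appear in the same article".
    """
    pair_counts = Counter()
    for _title, links in articles:
        # Deduplicate and sort so each unordered pair has one canonical key.
        unique_links = sorted(set(links))
        for a, b in combinations(unique_links, 2):
            pair_counts[(a, b)] += 1
    return pair_counts


# Toy data, invented for illustration only.
toy_articles = [
    ("Article A", ["Apple", "Fruit", "Tree"]),
    ("Article B", ["Apple", "Fruit"]),
    ("Article C", ["Tree", "Forest"]),
]
counts = count_link_cooccurrences(toy_articles)
```

Each article is read once and never revisited, which is the sense in which such counting is a single pass. The raw counts would still need a weighting step, such as the tf-idf integration the abstract mentions, before they could serve as association strengths.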