Wikipedia, a very large Web-based encyclopedia, has attracted great attention as a corpus for knowledge extraction because of several useful characteristics: a huge number of articles, live updates, a dense link structure, brief anchor texts, and URL identification for concepts. We have already shown that Wikipedia can be used to construct a large and accurate association thesaurus. The association thesaurus we constructed covers almost 1.3 million concepts, and its accuracy was confirmed in detailed experiments. However, scalable methods are still needed to analyze the huge number of Web pages and the hyperlinks among articles in a Web-based encyclopedia. In this paper, we propose a scalable method for constructing an association thesaurus from Wikipedia based on link co-occurrences. Link co-occurrence analysis is more scalable than link structure analysis because it is a one-pass process. We also propose a method that integrates tf-idf with link co-occurrence analysis. Experimental results show that both proposed methods are more accurate and more scalable than conventional methods. Furthermore, integrating tf-idf achieved higher accuracy than using link co-occurrences alone.
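The idea of link co-occurrence analysis, and of weighting it with tf-idf, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the scoring formula, so the input layout (a dict from article title to the list of concepts it links to), the function names, and the product-of-weights scoring are all assumptions made for illustration. Note that the plain co-occurrence count needs only a single pass over the articles, while the tf-idf variant in this sketch first needs document frequencies, which it gathers in an extra counting pass.

```python
from collections import Counter, defaultdict
from itertools import combinations
import math


def link_cooccurrence(articles):
    """Count how often two link targets appear in the same article.

    `articles` maps an article title to the list of concepts it links to
    (a hypothetical input format). Each article is visited exactly once.
    """
    counts = Counter()
    for links in articles.values():
        # Sorting gives every unordered pair a single canonical key.
        for a, b in combinations(sorted(set(links)), 2):
            counts[(a, b)] += 1
    return counts


def tfidf_cooccurrence(articles):
    """Co-occurrence scores weighted by tf-idf (an assumed combination).

    Each pair's score is the sum, over articles containing both links,
    of the product of the two links' tf-idf weights in that article.
    """
    n_articles = len(articles)
    # Extra pass: number of articles that link to each concept.
    doc_freq = Counter()
    for links in articles.values():
        doc_freq.update(set(links))

    scores = defaultdict(float)
    for links in articles.values():
        if not links:
            continue  # an article without links contributes nothing
        tf = Counter(links)
        total = len(links)
        weight = {
            c: (tf[c] / total) * math.log(n_articles / doc_freq[c])
            for c in tf
        }
        for a, b in combinations(sorted(tf), 2):
            scores[(a, b)] += weight[a] * weight[b]
    return dict(scores)
```

Under this scoring, a concept linked from every article receives an idf of zero and so adds nothing to any pair, which is one plausible reason a tf-idf weighting could improve on raw co-occurrence counts.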