Mining Domain-Specific Thesauri from Wikipedia: A Case Study

Authors: David Milne; Olena Medelyan; Ian H. Witten
Affiliation: University of Waikato, New Zealand (all three authors)
Venue: WI '06: Proceedings of the 2006 IEEE/WIC/ACM International Conference on Web Intelligence
Year: 2006

Abstract

Domain-specific thesauri are high-cost, high-maintenance, high-value knowledge structures. We show how the classic thesaurus structure of terms and links can be mined automatically from Wikipedia. In a comparison with a professional thesaurus for agriculture, we find that Wikipedia contains a substantial proportion of its concepts and semantic relations; furthermore, it has impressive coverage of contemporary documents in the domain. Thesauri derived using our techniques capitalize on existing public efforts and tend to reflect contemporary language usage better than their costly, painstakingly constructed manual counterparts.