An open-source toolkit for mining Wikipedia

Authors:
David Milne;Ian H. Witten
Affiliations:
Computer Science Department, The University of Waikato, Private Bag 3105, Hamilton, New Zealand;Computer Science Department, The University of Waikato, Private Bag 3105, Hamilton, New Zealand
Venue:
Artificial Intelligence
Year:
2013

Abstract

The online encyclopedia Wikipedia is a vast, constantly evolving tapestry of interlinked articles. For developers and researchers it represents a giant multilingual database of concepts and semantic relations, a potential resource for natural language processing and many other research areas. This paper introduces the Wikipedia Miner toolkit, an open-source software system that allows researchers and developers to integrate Wikipedia's rich semantics into their own applications. The toolkit creates databases that contain summarized versions of Wikipedia's content and structure, and includes a Java API to provide access to them. Wikipedia's articles, categories and redirects are represented as classes, and can be efficiently searched, browsed, and iterated over. Advanced features include parallelized processing of Wikipedia dumps, machine-learned semantic relatedness measures and annotation features, and XML-based web services. Wikipedia Miner is intended to be a platform for sharing data mining techniques.