Automatic acquisition of taxonomies in different languages from multiple Wikipedia versions

Authors:
Renato Domínguez García;Christoph Rensing;Ralf Steinmetz
Affiliations:
Multimedia Communications Lab TU Darmstadt, Darmstadt, Germany;Multimedia Communications Lab TU Darmstadt, Darmstadt, Germany;Multimedia Communications Lab TU Darmstadt, Darmstadt, Germany
Venue:
i-KNOW '11 Proceedings of the 11th International Conference on Knowledge Management and Knowledge Technologies
Year:
2011

References:
WordNet: a lexical database for English. Communications of the ACM.
Yago: a core of semantic knowledge. Proceedings of the 16th international conference on World Wide Web.
Mining meaning from Wikipedia. International Journal of Human-Computer Studies.
Deriving a large scale taxonomy from Wikipedia. AAAI'07 Proceedings of the 22nd national conference on Artificial intelligence - Volume 2.
Hypernym discovery based on distributional similarity and hierarchical structures. EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2.
Open information extraction using Wikipedia. ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics.
BabelNet: building a very large multilingual semantic network. ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics.
Learning word-class lattices for definition and hypernym extraction. ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics.
MENTA: inducing multilingual taxonomies from Wikipedia. CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management.

Abstract

In recent years, the vision of the Semantic Web has led to many approaches that aim to automatically derive knowledge bases from Wikipedia. These approaches rely mostly on the English Wikipedia, as it is the largest Wikipedia version, and have led to valuable knowledge bases. However, each Wikipedia version contains socio-cultural knowledge, i.e., knowledge with specific relevance for a culture or language. One difficulty in applying existing approaches to multiple Wikipedia versions is their reliance on additional corpora. In this paper, we describe the adaptation of existing heuristics so that large sets of hyponymy relations can be extracted from multiple Wikipedia versions with little information about each language. Further, we evaluate our approach on Wikipedia versions in four different languages and compare the results with GermaNet for German and WordNet for English.