Worth its weight in gold or yet another resource - a comparative study of Wiktionary, OpenThesaurus and GermaNet

Authors:
Christian M. Meyer;Iryna Gurevych
Affiliations:
Ubiquitous Knowledge Processing Lab, Technische Universität Darmstadt, Darmstadt, Germany;Ubiquitous Knowledge Processing Lab, Technische Universität Darmstadt, Darmstadt, Germany
Venue:
CICLing'10 Proceedings of the 11th international conference on Computational Linguistics and Intelligent Text Processing
Year:
2010

Citing 7
Cited 3

Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition

Introduction to Modern Information Retrieval

Statistical mechanics of complex networks

Inter-coder agreement for computational linguistics

Computational Linguistics
Using Wiktionary for computing semantic relatedness

AAAI'08 Proceedings of the 23rd national conference on Artificial intelligence - Volume 2
Computing semantic relatedness using Wikipedia-based explicit semantic analysis

IJCAI'07 Proceedings of the 20th international joint conference on Artificial intelligence
Wiktionary and NLP: improving synonymy networks

People's Web '09 Proceedings of the 2009 Workshop on The People's Web Meets NLP: Collaboratively Constructed Semantic Resources

Semi-automatic endogenous enrichment of collaboratively constructed lexical resources: piggybacking onto Wiktionary

IceTAL'10 Proceedings of the 7th international conference on Advances in natural language processing
Uby: a large-scale unified lexical-semantic resource based on LMF

EACL '12 Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics
Semi-automatic enrichment of crowdsourced synonymy networks: the WISIGOTH system applied to Wiktionary

Language Resources and Evaluation


Abstract

In this paper, we analyze the topology and content of three lexical semantic resources for the German language, constructed in a controlled (GermaNet), semi-controlled (OpenThesaurus), or collaborative, i.e. community-based, manner (Wiktionary). For the first time, these resources are compared at the word sense level. For this purpose, the word senses of terms are automatically disambiguated in Wiktionary, and the content of all resources is converted to a uniform representation. We show that the resources' topologies are comparable: they share the small-world property and contain a similar number of entries, although their connectivity differs. Our study of content-related properties reveals that the German Wiktionary has a different distribution of word senses and contains more polysemous entries than the other two resources. We find that each resource contains the highest number of one particular type of semantic relation. Finally, we increase the number of relations in Wiktionary by adding symmetric and inverse relations, which have been found to be usually absent from this resource.
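
The final step of the abstract, adding symmetric and inverse relations, can be illustrated with a minimal sketch. It is not the authors' implementation. The triple format, the sense identifiers such as `Hund#1`, the relation inventory, and the function name `close_relations` are all assumptions made for illustration. A triple `(a, "hypernym", b)` is read here as "a has the hypernym b".

```python
# Hypothetical relation inventory, assumed for illustration only.
SYMMETRIC = {"synonym", "antonym"}
INVERSE = {"hypernym": "hyponym", "hyponym": "hypernym"}


def close_relations(relations):
    """Return the input triples plus any missing symmetric or inverse counterparts."""
    closed = set(relations)
    for source, rel_type, target in relations:
        if rel_type in SYMMETRIC:
            # A symmetric relation holds in both directions.
            closed.add((target, rel_type, source))
        elif rel_type in INVERSE:
            # An inverse relation flips the direction and swaps the type.
            closed.add((target, INVERSE[rel_type], source))
    return closed


# Toy word-sense-level relations (invented examples, not data from the paper).
relations = {
    ("Hund#1", "hypernym", "Tier#1"),
    ("Auto#1", "synonym", "Wagen#1"),
}
closed = close_relations(relations)
added = closed - relations  # the relations that were absent before
```

Applying the function a second time adds nothing, so a single pass is enough under these assumptions.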