Wisdom of crowds versus wisdom of linguists – measuring the semantic relatedness of words

Authors:
Torsten Zesch;Iryna Gurevych
Affiliations:
Ubiquitous Knowledge Processing Lab, Computer Science Department, Technische Universität Darmstadt, Hochschulstr. 10, 64289 Darmstadt, Germany e-mail: zesch@tk.informatik.tu-darmstadt.de, gur ...;Ubiquitous Knowledge Processing Lab, Computer Science Department, Technische Universität Darmstadt, Hochschulstr. 10, 64289 Darmstadt, Germany e-mail: zesch@tk.informatik.tu-darmstadt.de, gur ...
Venue:
Natural Language Engineering
Year:
2010

Citing 27
Cited 20

Concept based query expansion

SIGIR '93 Proceedings of the 16th annual international ACM SIGIR conference on Research and development in information retrieval
Automatic sense disambiguation using machine readable dictionaries: how to tell a pine cone from an ice cream cone

SIGDOC '86 Proceedings of the 5th annual international conference on Systems documentation
Contextual correlates of synonymy

Communications of the ACM
Placing search in context: the concept revisited

ACM Transactions on Information Systems (TOIS)
Introduction to Modern Information Retrieval

Introduction to Modern Information Retrieval
Efficiently computed lexical chains as an intermediate representation for automatic text summarization

Computational Linguistics - Summarization
An Information-Theoretic Definition of Similarity

ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning
An Adapted Lesk Algorithm for Word Sense Disambiguation Using WordNet

CICLing '02 Proceedings of the Third International Conference on Computational Linguistics and Intelligent Text Processing
An Approach for Measuring Semantic Similarity between Words Using Multiple Information Sources

IEEE Transactions on Knowledge and Data Engineering
Lexical cohesion computed by thesaural relations as an indicator of the structure of text

Computational Linguistics
Similarity between words computed by spreading activation on an English dictionary

EACL '93 Proceedings of the sixth conference on European chapter of the Association for Computational Linguistics
Verbs semantics and lexical selection

ACL '94 Proceedings of the 32nd annual meeting on Association for Computational Linguistics
A method for word sense disambiguation of unrestricted text

ACL '99 Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics
Evaluating WordNet-based Measures of Lexical Semantic Relatedness

Computational Linguistics
A semantic approach to IE pattern induction

ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Expressing implicit semantic relations without supervision

ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Semantic similarity applied to spoken dialogue summarization

COLING '04 Proceedings of the 20th international conference on Computational Linguistics
Non-classical lexical semantic relations

CLS '04 Proceedings of the HLT-NAACL Workshop on Computational Lexical Semantics
WikiRelate! computing semantic relatedness using wikipedia

AAAI'06 Proceedings of the 21st national conference on Artificial intelligence - Volume 2
Comparing Wikipedia and German wordnet by evaluating semantic relatedness on multiple datasets

NAACL-Short '07 Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers
Using wiktionary for computing semantic relatedness

AAAI'08 Proceedings of the 23rd national conference on Artificial intelligence - Volume 2
Computing semantic relatedness using Wikipedia-based explicit semantic analysis

IJCAI'07 Proceedings of the 20th international joint conference on Artificial intelligence
Using information content to evaluate semantic similarity in a taxonomy

IJCAI'95 Proceedings of the 14th international joint conference on Artificial intelligence - Volume 1
Improving word sense disambiguation in lexical chaining

IJCAI'03 Proceedings of the 18th international joint conference on Artificial intelligence
Automatically creating datasets for measures of semantic relatedness

LD '06 Proceedings of the Workshop on Linguistic Distances
Using measures of semantic relatedness for word sense disambiguation

CICLing'03 Proceedings of the 4th international conference on Computational linguistics and intelligent text processing
Using the structure of a conceptual network in computing semantic relatedness

IJCNLP'05 Proceedings of the Second international joint conference on Natural Language Processing

Semi-automatic endogenous enrichment of collaboratively constructed lexical resources: piggybacking onto wiktionary

IceTAL'10 Proceedings of the 7th international conference on Advances in natural language processing
A word at a time: computing word relatedness using temporal semantic analysis

Proceedings of the 20th international conference on World wide web
The smart, the intelligent and the wise: roles and values of interactive technologies

Proceedings of the First International Conference on Intelligent Interactive Technologies and Multimedia
Using properties to compare both words and clauses

KES-AMSTA'11 Proceedings of the 5th KES international conference on Agent and multi-agent systems: technologies and applications
Gauging the internet doctor: ranking medical claims based on community knowledge

Proceedings of the 2011 workshop on Data mining for medicine and healthcare
Harnessing different knowledge sources to measure semantic relatedness under a uniform model

EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Semantic processing of database textual attributes using wikipedia

FQAS'11 Proceedings of the 9th international conference on Flexible Query Answering Systems
Evaluating PageRank methods for structural sense ranking in labeled tree data

Proceedings of the 2nd International Conference on Web Intelligence, Mining and Semantics
Large-scale learning of word relatedness with constraints

Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
Explanatory semantic relatedness and explicit spatialization for exploratory search

SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
LODDO: using linked open data description overlap to measure semantic relatedness between named entities

JIST'11 Proceedings of the 2011 joint international conference on The Semantic Web
Measuring contextual fitness using error contexts extracted from the Wikipedia revision history

EACL '12 Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics
Exploring dictionary-based semantic relatedness in labeled tree data

Information Sciences: an International Journal
On the connections between explicit semantic analysis and latent semantic analysis

Proceedings of the 21st ACM international conference on Information and knowledge management
Semantic-preserving word clouds by seam carving

EuroVis'11 Proceedings of the 13th Eurographics / IEEE - VGTC conference on Visualization
Combining language sources and robust semantic relatedness for attribute-based knowledge transfer

ECCV'10 Proceedings of the 11th European conference on Trends and Topics in Computer Vision - Volume Part I
Semi-automatic enrichment of crowdsourced synonymy networks: the WISIGOTH system applied to Wiktionary

Language Resources and Evaluation
Fusing distributional and experiential information for measuring semantic relatedness

Information Fusion
Evaluating the results of methods for computing semantic relatedness

CICLing'13 Proceedings of the 14th international conference on Computational Linguistics and Intelligent Text Processing - Volume Part I
Extracting semantic knowledge from Wikipedia category names

Proceedings of the 2013 workshop on Automated knowledge base construction

Abstract

In this article, we present a comprehensive study of computing the semantic relatedness of word pairs. We analyze the performance of a large number of semantic relatedness measures proposed in the literature with respect to different experimental conditions, such as (i) the datasets employed, (ii) the language (English or German), (iii) the underlying knowledge source, and (iv) the evaluation task (computing scores of semantic relatedness, ranking word pairs, solving word choice problems). To our knowledge, this study is the first to systematically analyze semantic relatedness on a large number of datasets with different properties, while emphasizing the role of the knowledge source compiled either by the ‘wisdom of linguists’ (i.e., classical wordnets) or by the ‘wisdom of crowds’ (i.e., collaboratively constructed knowledge sources like Wikipedia). The article discusses benefits and drawbacks of different approaches to evaluating semantic relatedness. We show that results should be interpreted carefully to evaluate particular aspects of semantic relatedness. For the first time, we apply a vector-based measure of semantic relatedness, which relies on a concept space built from documents, to the first paragraphs of Wikipedia articles, to English WordNet glosses, and to GermaNet-based pseudo glosses. Contrary to previous research (Strube and Ponzetto 2006; Gabrilovich and Markovitch 2007; Zesch et al. 2007), we find that ‘wisdom of crowds’ based resources are not superior to ‘wisdom of linguists’ based resources. We also find that using the first paragraph of a Wikipedia article, as opposed to the whole article, leads to better precision but decreases recall.
Finally, we present two systems that were developed to aid the experiments presented herein and are freely available for research purposes: (i) DEXTRACT, a tool for semi-automatically constructing corpus-driven semantic relatedness datasets, and (ii) JWPL, a Java-based high-performance Wikipedia Application Programming Interface (API) for building natural language processing (NLP) applications.
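
The vector-based measure mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a toy concept space of three hypothetical documents (standing in for first paragraphs or glosses), tf-idf term weighting, and cosine similarity between the resulting concept vectors. The names `CONCEPTS`, `build_concept_space`, `cosine`, and `relatedness` are introduced here purely for illustration.

```python
import math
from collections import Counter

# Hypothetical toy concept documents; a real concept space would be built
# from, e.g., Wikipedia first paragraphs or wordnet glosses.
CONCEPTS = {
    "Car": "a car is a motor vehicle with wheels used for transport on roads",
    "Bicycle": "a bicycle is a vehicle with two wheels driven by pedals",
    "Banana": "a banana is an edible fruit produced by a tropical plant",
}


def build_concept_space(concepts):
    """Map each term to a vector of tf-idf weights, one entry per concept."""
    names = sorted(concepts)
    term_counts = {n: Counter(concepts[n].lower().split()) for n in names}
    # Document frequency: in how many concept documents each term occurs.
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())
    n_docs = len(names)
    space = {}
    for term, df in doc_freq.items():
        idf = math.log(n_docs / df)  # zero for terms found in every document
        space[term] = [term_counts[n][term] * idf for n in names]
    return space


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def relatedness(space, word1, word2):
    """Relatedness of two words as the cosine of their concept vectors."""
    zero = [0.0] * len(next(iter(space.values())))
    return cosine(space.get(word1.lower(), zero), space.get(word2.lower(), zero))


space = build_concept_space(CONCEPTS)
print(relatedness(space, "wheels", "vehicle"))  # both occur in the same two concepts
print(relatedness(space, "wheels", "fruit"))    # no shared concept
```

In this sketch, words that occur in the same concept documents receive similar vectors and therefore a high score, while words with no shared concept score zero. A word that is missing from the concept space gets an all-zero vector, which loosely mirrors the coverage issue behind the precision and recall trade-off noted in the abstract.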