Automatically creating datasets for measures of semantic relatedness

Authors:
Torsten Zesch; Iryna Gurevych
Affiliations:
Darmstadt University of Technology, Darmstadt, Germany; Darmstadt University of Technology, Darmstadt, Germany
Venue:
LD '06 Proceedings of the Workshop on Linguistic Distances
Year:
2006

Abstract

Semantic relatedness is a special form of linguistic distance between words. Semantic relatedness measures are usually evaluated by comparison with human judgments. Previous test datasets were created analytically and were limited in size. We propose a corpus-based system for automatically creating test datasets. Experiments with human subjects show that the resulting datasets cover all degrees of relatedness. Because of the corpus-based approach, the test datasets cover all types of lexical-semantic relations and contain domain-specific words that occur naturally in texts.
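
As an illustration of what "comparison with human judgments" can look like in practice, the sketch below computes a Spearman rank correlation between a measure's scores and averaged human ratings for the same word pairs. This is a minimal sketch, not the procedure used in the paper: the abstract does not name a correlation coefficient, so the choice of Spearman is an assumption, and the function names and all numbers are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to the end of the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equally long, non-constant sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical data: one entry per word pair in a test dataset.
human_ratings = [3.9, 0.4, 2.1, 3.0]   # averaged human judgments (made up)
measure_scores = [0.8, 0.1, 0.5, 0.6]  # scores from some relatedness measure (made up)

print(spearman(human_ratings, measure_scores))
```

A rank-based coefficient is used here because it only requires the measure to order word pairs the way the human subjects do, not to reproduce their rating scale.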