A comparison of co-occurrence and similarity measures as simulations of context

Authors:
Stefan Bordag
Affiliations:
Natural Language Processing Department, University of Leipzig
Venue:
CICLing'08 Proceedings of the 9th international conference on Computational linguistics and intelligent text processing
Year:
2008

Abstract

Observations of word co-occurrences and similarity computations are often used as a straightforward way to represent the global contexts of words. They simulate semantic word similarity for applications such as word or document clustering and collocation extraction. Despite the simplicity of the underlying model, one still has to select a suitable significance measure, a similarity measure and an algorithm for computing the similarities. However, it is often unclear how these measures relate to one another. In addition, dimensionality reduction is frequently applied to make the computation of word similarity efficient. This work presents an approximate algorithm with linear time complexity for computing word similarity without any dimensionality reduction. It then introduces a large-scale evaluation based on two languages and two knowledge sources, and it discusses the underlying reasons for the relative performance of each measure.
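The abstract does not spell out which measures are compared or how the linear-time algorithm works. The minimal Python sketch below illustrates only the general pipeline the abstract refers to: co-occurrence counting, then a significance measure, then a similarity measure over the resulting context vectors. It rests on several assumptions. Co-occurrence is counted at the sentence level. The significance measure is Dunning's log-likelihood ratio, and similarity is cosine. All function names are hypothetical. The brute-force neighbour search is quadratic and is not the paper's linear-time approximation.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sentences):
    """Count sentence frequencies of words and of unordered word pairs."""
    word_freq = Counter()
    pair_freq = Counter()
    for sentence in sentences:
        words = sorted(set(sentence))  # each word counted once per sentence
        word_freq.update(words)
        for a, b in combinations(words, 2):  # a < b, so each pair has one key
            pair_freq[(a, b)] += 1
    return word_freq, pair_freq, len(sentences)


def log_likelihood(k11, freq_a, freq_b, n):
    """Log-likelihood ratio (G^2) for a 2x2 table of sentence counts.

    Returns 0.0 unless the pair co-occurs more often than expected under
    independence (a hypothetical filtering choice made for this sketch).
    """
    if k11 * n <= freq_a * freq_b:
        return 0.0
    k12 = freq_a - k11          # sentences with a but not b
    k21 = freq_b - k11          # sentences with b but not a
    k22 = n - k11 - k12 - k21   # sentences with neither
    cells = [
        (k11, freq_a, freq_b),
        (k12, freq_a, n - freq_b),
        (k21, n - freq_a, freq_b),
        (k22, n - freq_a, n - freq_b),
    ]
    # Sum of observed * log(observed / expected), expected = row * col / n.
    g2 = sum(k * math.log(k * n / (row * col)) for k, row, col in cells if k > 0)
    return 2.0 * g2


def significance_vectors(sentences):
    """Map each word to a sparse vector {co-occurring word: significance}."""
    word_freq, pair_freq, n = cooccurrence_counts(sentences)
    vectors = {word: {} for word in word_freq}
    for (a, b), k11 in pair_freq.items():
        score = log_likelihood(k11, word_freq[a], word_freq[b], n)
        if score > 0:
            vectors[a][b] = score
            vectors[b][a] = score
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * v[w] for w, weight in u.items() if w in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(word, vectors, top_n=3):
    """Brute-force (quadratic) nearest neighbours by cosine similarity."""
    scores = [(other, cosine(vectors[word], vec))
              for other, vec in vectors.items() if other != word]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]


corpus = [
    ["cat", "drinks", "milk"],
    ["dog", "drinks", "milk"],
    ["cat", "eats", "fish"],
    ["dog", "eats", "fish"],
    ["car", "needs", "fuel"],
    ["truck", "needs", "fuel"],
]
vectors = significance_vectors(corpus)
print(most_similar("cat", vectors, top_n=1))  # 'dog' ranks first
```

In this toy corpus, `cat` and `dog` have identical significant co-occurrents, so their cosine similarity is 1.0 up to floating-point error. `cat` and `car` share no co-occurrents and score 0.0. Computing every pairwise cosine like this is exactly the cost that motivates either dimensionality reduction or a faster approximate algorithm such as the one the abstract describes.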