Co-occurrence Retrieval: A Flexible Framework for Lexical Distributional Similarity

Authors:
Julie Weeds; David Weir
Venue:
Computational Linguistics
Year:
2005


Abstract

Techniques that exploit knowledge of distributional similarity between words have been proposed in many areas of Natural Language Processing. For example, in language modeling, the sparse data problem can be alleviated by estimating the probabilities of unseen co-occurrences of events from the probabilities of seen co-occurrences of similar events. In other applications, distributional similarity is taken to be an approximation to semantic similarity. However, due to the wide range of potential applications and the lack of a strict definition of the concept of distributional similarity, many methods of calculating distributional similarity have been proposed or adopted.

In this work, a flexible, parameterized framework for calculating distributional similarity is proposed. Within this framework, the problem of finding distributionally similar words is cast as one of co-occurrence retrieval (CR) for which precision and recall can be measured by analogy with the way they are measured in document retrieval. As will be shown, a number of popular existing measures of distributional similarity are simulated with parameter settings within the CR framework. In this article, the CR framework is then used to systematically investigate three fundamental questions concerning distributional similarity. First, is the relationship of lexical similarity necessarily symmetric, or are there advantages to be gained from considering it as an asymmetric relationship? Second, are some co-occurrences inherently more salient than others in the calculation of distributional similarity? Third, is it necessary to consider the difference in the extent to which each word occurs in each co-occurrence type?

Two application-based tasks are used for evaluation: automatic thesaurus generation and pseudo-disambiguation. It is possible to achieve significantly better results on both these tasks by varying the parameters within the CR framework rather than using other existing distributional similarity measures; it will also be shown that any single unparameterized measure is unlikely to be able to do better on both tasks. This is due to an inherent asymmetry in lexical substitutability and therefore also in lexical distributional similarity.
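
The retrieval analogy described in the abstract can be illustrated with a minimal sketch. It assumes each word is represented as a mapping from co-occurrence types to non-negative weights. The function names (`precision_recall`, `combine`), the parameters `beta` and `gamma`, and the toy feature sets are all hypothetical choices made for illustration; the article's own definitions of precision, recall, and their combination may differ.

```python
from typing import Dict, Tuple

# Hypothetical representation: co-occurrence type -> non-negative weight.
Features = Dict[str, float]


def precision_recall(retrieved: Features, target: Features) -> Tuple[float, float]:
    """Treat `retrieved`'s co-occurrences as a retrieval of `target`'s.

    Precision: share of the retrieved word's weight that falls on
    co-occurrence types also seen with the target word.
    Recall: share of the target word's weight that is covered by
    co-occurrence types seen with the retrieved word.
    """
    shared = retrieved.keys() & target.keys()
    total_retrieved = sum(retrieved.values())
    total_target = sum(target.values())
    precision = (
        sum(retrieved[f] for f in shared) / total_retrieved if total_retrieved else 0.0
    )
    recall = sum(target[f] for f in shared) / total_target if total_target else 0.0
    return precision, recall


def combine(precision: float, recall: float, beta: float = 0.5, gamma: float = 1.0) -> float:
    """One possible parameterized combination (an assumption, not the article's formula).

    gamma = 1 gives the harmonic mean, which is symmetric in its arguments;
    gamma = 0 gives a weighted average, which is asymmetric unless beta = 0.5.
    """
    total = precision + recall
    harmonic = 2 * precision * recall / total if total else 0.0
    weighted = beta * precision + (1 - beta) * recall
    return gamma * harmonic + (1 - gamma) * weighted


# Toy data (invented): "fruit" occurs in a subset of the contexts of "apple".
apple = {"eat": 1.0, "red": 1.0, "grow": 1.0, "peel": 1.0}
fruit = {"eat": 1.0, "grow": 1.0}

p, r = precision_recall(retrieved=fruit, target=apple)
p_rev, r_rev = precision_recall(retrieved=apple, target=fruit)
```

Swapping the two words exchanges precision and recall, so any combination that weights them unequally yields an asymmetric similarity, which is the kind of asymmetry the first research question concerns.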