Fast computation of lexical affinity models

Authors:
Egidio Terra;Charles L. A. Clarke
Affiliations:
University of Waterloo, Canada;University of Waterloo, Canada
Venue:
COLING '04 Proceedings of the 20th international conference on Computational Linguistics
Year:
2004

Abstract

We present a framework for the fast computation of lexical affinity models. The framework comprises a novel algorithm that efficiently computes the co-occurrence distribution between pairs of terms, an independence model, and a parametric affinity model. Previous models either use arbitrary windows to compute similarity between words or use lexical affinity to create sequential models; in this paper we focus instead on models intended to capture the co-occurrence patterns of any pair of words or phrases at any distance in the corpus. The framework is flexible, allowing fast adaptation to applications, and it is scalable. We apply it in combination with a terabyte corpus to answer natural language tests, achieving encouraging results.
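
To make the notion of a co-occurrence distribution "at any distance" concrete, the sketch below counts how often two terms appear at each distance from one another in a tokenized text. It is a naive quadratic illustration only, not the fast algorithm the abstract refers to; the function name `cooccurrence_distances`, the `max_distance` parameter, and the toy sentence are all assumptions made for this example.

```python
from collections import Counter


def cooccurrence_distances(tokens, a, b, max_distance=None):
    """Histogram of absolute token distances between occurrences of a and b.

    Hypothetical helper for illustration; naive O(|a| * |b|) pairing,
    not the efficient algorithm described in the paper.
    """
    pos_a = [i for i, t in enumerate(tokens) if t == a]
    pos_b = [i for i, t in enumerate(tokens) if t == b]
    hist = Counter()
    for i in pos_a:
        for j in pos_b:
            d = abs(i - j)
            if d == 0:
                continue  # same position (only possible when a == b)
            if max_distance is None or d <= max_distance:
                hist[d] += 1
    return hist


# Toy corpus (assumed for the example).
tokens = "the cat sat on the mat near the cat".split()
hist = cooccurrence_distances(tokens, "the", "cat")
hist_near = cooccurrence_distances(tokens, "the", "cat", max_distance=4)
```

Here `hist` maps each observed distance to the number of term pairs found at that distance, with no window imposed; passing `max_distance` recovers the conventional fixed-window count as a special case. An affinity model would then compare such an empirical distribution against what an independence model predicts.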