Fast computation of lexical affinity models

  • Authors:
  • Egidio Terra;Charles L. A. Clarke

  • Affiliations:
  • University of Waterloo, Canada;University of Waterloo, Canada

  • Venue:
  • COLING '04 Proceedings of the 20th international conference on Computational Linguistics
  • Year:
  • 2004

Quantified Score

Hi-index 0.00

Visualization

Abstract

We present a framework for the fast computation of lexical affinity models. The framework is composed of a novel algorithm to efficiently compute the co-occurrence distribution between pairs of terms, an independence model, and a parametric affinity model. In comparison with previous models, which either use arbitrary windows to compute similarity between words or use lexical affinity to create sequential models, in this paper we focus on models intended to capture the co-occurrence patterns of any pair of words or phrases at any distance in the corpus. The framework is flexible, allowing fast adaptation to applications and it is scalable. We apply it in combination with a terabyte corpus to answer natural language tests, achieving encouraging results.