Approximate searching for distributional similarity

  • Authors:
  • James Gorman; James R. Curran

  • Affiliations:
  • University of Sydney, NSW, Australia; University of Sydney, NSW, Australia

  • Venue:
  • DeepLA '05 Proceedings of the ACL-SIGLEX Workshop on Deep Lexical Acquisition
  • Year:
  • 2005

Abstract

Distributional similarity requires large volumes of data to accurately represent infrequent words. However, the nearest-neighbour approach to finding synonyms suffers from poor scalability. The Spatial Approximation Sample Hierarchy (SASH), proposed by Houle (2003b), is a data structure for approximate nearest-neighbour queries that balances the efficiency/approximation trade-off. We have integrated this into an existing distributional similarity system, tripling efficiency with a minor accuracy penalty.