Class-based n-gram models of natural language
Computational Linguistics
ACM Computing Surveys (CSUR)
Explorations in Automatic Thesaurus Discovery
Navigating massive data sets via local clustering
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Class-based probability estimation using a semantic hierarchy
NAACL '01 Proceedings of the second meeting of the North American Chapter of the Association for Computational Linguistics on Language technologies
Robust, applied morphological generation
INLG '00 Proceedings of the first international conference on Natural language generation - Volume 14
Improvements in automatic thesaurus extraction
ULA '02 Proceedings of the ACL-02 workshop on Unsupervised lexical acquisition - Volume 9
Scaling distributional similarity to large corpora
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Automatically extracting nominal mentions of events with a bootstrapped probabilistic classifier
COLING-ACL '06 Proceedings of the COLING/ACL on Main conference poster sessions
Distributional similarity requires large volumes of data to accurately represent infrequent words. However, the nearest-neighbour approach to finding synonyms suffers from poor scalability. The Spatial Approximation Sample Hierarchy (SASH), proposed by Houle (2003b), is a data structure for approximate nearest-neighbour queries that balances the efficiency/approximation trade-off. We have integrated this into an existing distributional similarity system, tripling efficiency with a minor accuracy penalty.
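
To illustrate why the exhaustive nearest-neighbour approach scales poorly, the following is a minimal sketch of a brute-force synonym search over context vectors. It is not the system described above and it does not implement SASH. The cosine measure, the function names, and the toy vectors are all assumptions chosen for illustration. Each query compares the target word against every other word in the vocabulary, which is the per-query cost that an approximate index such as SASH aims to reduce.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse context vectors (assumed measure)."""
    dot = sum(weight * b[feat] for feat, weight in a.items() if feat in b)
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_neighbours(target, vectors, k=2):
    """Return the k words most similar to `target` by exhaustive comparison.

    Every query scans the whole vocabulary, so cost grows linearly with
    vocabulary size per query and quadratically for all-pairs search.
    """
    scored = [
        (cosine(vectors[target], vec), word)
        for word, vec in vectors.items()
        if word != target
    ]
    # Highest similarity first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [word for _, word in scored[:k]]


# Hypothetical toy data: word -> counts of contexts it was observed in.
VECTORS = {
    "car": Counter({"drive": 4, "road": 3, "engine": 2}),
    "truck": Counter({"drive": 3, "road": 2, "engine": 1, "cargo": 2}),
    "banana": Counter({"eat": 4, "peel": 2, "yellow": 3}),
    "apple": Counter({"eat": 3, "peel": 1, "red": 2}),
}

if __name__ == "__main__":
    print(nearest_neighbours("car", VECTORS, k=1))
```

With a realistic vocabulary of many thousands of words and high-dimensional context vectors, this linear scan per query is the bottleneck that motivates trading a small amount of accuracy for an approximate search structure.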