Sparse distributed memory and related models
Associative neural memories
Approximate nearest neighbors: towards removing the curse of dimensionality
STOC '98 Proceedings of the thirtieth annual ACM symposium on Theory of computing
Data structures and algorithms for nearest neighbor search in general metric spaces
SODA '93 Proceedings of the fourth annual ACM-SIAM Symposium on Discrete algorithms
Some approaches to best-match file searching
Communications of the ACM
Similarity estimation techniques from rounding algorithms
STOC '02 Proceedings of the thirty-fourth annual ACM symposium on Theory of computing
Explorations in Automatic Thesaurus Discovery
Discovering word senses from text
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
On the Resemblance and Containment of Documents
SEQUENCES '97 Proceedings of the Compression and Complexity of Sequences 1997
Navigating massive data sets via local clustering
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Fast Approximate Similarity Search in Extremely High-Dimensional Data Sets
ICDE '05 Proceedings of the 21st International Conference on Data Engineering
Automatic bilingual lexicon acquisition using random indexing of parallel corpora
Natural Language Engineering
Co-occurrence Retrieval: A Flexible Framework for Lexical Distributional Similarity
Computational Linguistics
Improvements in automatic thesaurus extraction
ULA '02 Proceedings of the ACL-02 workshop on Unsupervised lexical acquisition - Volume 9
Randomized algorithms and NLP: using locality sensitive hash function for high speed noun clustering
ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Approximate searching for distributional similarity
DeepLA '05 Proceedings of the ACL-SIGLEX Workshop on Deep Lexical Acquisition
ACL '07 Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions
CoNLL '09 Proceedings of the Thirteenth Conference on Computational Natural Language Learning
Representing words as regions in vector space
CoNLL '09 Proceedings of the Thirteenth Conference on Computational Natural Language Learning
Translation and extension of concepts across languages
EACL '09 Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics
Random indexing using statistical weight functions
EMNLP '06 Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing
Knowledge derived from wikipedia for computing semantic relatedness
Journal of Artificial Intelligence Research
The topology of synonymy and homonymy networks
CACLA '07 Proceedings of the Workshop on Cognitive Aspects of Computational Language Acquisition
Comparing Different Properties Involved in Word Similarity Extraction
EPIA '09 Proceedings of the 14th Portuguese Conference on Artificial Intelligence: Progress in Artificial Intelligence
Geo-mining: discovery of road and transport networks using directional patterns
EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1
Enhancement of lexical concepts using cross-lingual web mining
EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2
Web-scale distributional similarity and entity set expansion
EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2
Exemplar-based models for word meaning in context
ACLShort '10 Proceedings of the ACL 2010 Conference Short Papers
Distributional similarity vs. PU learning for entity set expansion
ACLShort '10 Proceedings of the ACL 2010 Conference Short Papers
From frequency to meaning: vector space models of semantics
Journal of Artificial Intelligence Research
A mixture model with sharing for lexical semantics
EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Automatically acquiring a semantic network of related concepts
CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
CoNLL '11 Proceedings of the Fifteenth Conference on Computational Natural Language Learning
Cross-cutting models of lexical semantics
EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
SemEval-2012 task 4: evaluating Chinese word similarity
SemEval '12 Proceedings of the First Joint Conference on Lexical and Computational Semantics - Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation
Accurately representing synonymy using distributional similarity requires large volumes of data to reliably represent infrequent words. However, the naïve nearest-neighbour approach to comparing context vectors extracted from large corpora scales poorly (O(n^2) in the vocabulary size). In this paper, we compare several existing approaches to approximating the nearest-neighbour search for distributional similarity. We investigate the trade-off between efficiency and accuracy, and find that SASH (Houle and Sakuma, 2005) provides the best balance.
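
To make the quadratic cost concrete, the following is a minimal sketch of the naïve nearest-neighbour search over context vectors. It is an illustration under stated assumptions, not the method evaluated in the paper: the cosine measure, the dict-based sparse vectors, the function names, and the toy counts are all hypothetical choices made for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts.

    Assumption: cosine is used only for illustration; the abstract does not
    say which similarity measure is applied.
    """
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def naive_nearest_neighbours(vectors, k=1):
    """Compare every word with every other word: O(n^2) vector comparisons."""
    result = {}
    for word, vec in vectors.items():
        scored = [
            (cosine(vec, other), w)
            for w, other in vectors.items()
            if w != word
        ]
        # Highest similarity first; ties broken alphabetically.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        result[word] = [w for _, w in scored[:k]]
    return result


# Toy, made-up context counts (hypothetical data for illustration only).
vectors = {
    "cat": {"purr": 3, "pet": 2},
    "dog": {"bark": 3, "pet": 2},
    "kitten": {"purr": 2, "pet": 1},
}
neighbours = naive_nearest_neighbours(vectors, k=1)
```

With a vocabulary of n words the outer loop runs n times and each iteration scores n - 1 candidates, which is the quadratic behaviour that approximate methods aim to avoid.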