Approximate nearest neighbors: towards removing the curse of dimensionality
STOC '98 Proceedings of the thirtieth annual ACM symposium on Theory of computing
Similarity estimation techniques from rounding algorithms
STOC '02 Proceedings of the thiry-fourth annual ACM symposium on Theory of computing
Randomized algorithms and NLP: using locality sensitive hash function for high speed noun clustering
ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
MapReduce: simplified data processing on large clusters
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Asymmetric distance estimation with sketches for similarity search in high-dimensional spaces
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Streaming for large scale NLP: language modeling
NAACL '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Succinct approximate counting of skewed data
IJCAI'09 Proceedings of the 21st international jont conference on Artifical intelligence
Probabilistic counting with randomized storage
IJCAI'09 Proceedings of the 21st international jont conference on Artifical intelligence
Stream-based randomised language models for SMT
EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2 - Volume 2
Streaming first story detection with application to Twitter
HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Efficient online locality sensitive hashing via reservoir counting
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2
Paraphrastic sentence compression with a character-based metric: tightening without deletion
MTTG '11 Proceedings of the Workshop on Monolingual Text-To-Text Generation
Reranking bilingually extracted paraphrases using monolingual distributional similarity
GEMS '11 Proceedings of the GEMS 2011 Workshop on GEometrical Models of Natural Language Semantics
Approximate scalable bounded space sketch for large data NLP
EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Learning bilingual lexicons using the visual similarity of labeled web images
IJCAI'11 Proceedings of the Twenty-Second international joint conference on Artificial Intelligence - Volume Volume Three
Using paraphrases for improving first story detection in news and Twitter
NAACL HLT '12 Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Space efficiencies in discourse modeling via conditional random sampling
NAACL HLT '12 Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Monolingual distributional similarity for text-to-text generation
SemEval '12 Proceedings of the First Joint Conference on Lexical and Computational Semantics - Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation
Streaming analysis of discourse participants
EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Fast large-scale approximate graph construction for NLP
EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Sketch algorithms for estimating point queries in NLP
EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Hi-index | 0.00 |
Motivated by the recent interest in streaming algorithms for processing large text collections, we revisit the work of Ravichandran et al. (2005) on using the Locality Sensitive Hash (LSH) method of Charikar (2002) to enable fast, approximate comparisons of vector cosine similarity. For the common case of feature updates being additive over a data stream, we show that LSH signatures can be maintained online, without additional approximation error, and with lower memory requirements than when using the standard offline technique.