Contextual correlates of synonymy
Communications of the ACM
Placing search in context: the concept revisited
ACM Transactions on Information Systems (TOIS)
New directions in traffic measurement and accounting
Proceedings of the 2002 conference on Applications, technologies, architectures, and protocols for computer communications
Measuring praise and criticism: Inference of semantic orientation from association
ACM Transactions on Information Systems (TOIS)
Word association norms, mutual information, and lexicography
ACL '89 Proceedings of the 27th annual meeting on Association for Computational Linguistics
Finding frequent items in data streams
Theoretical Computer Science - Special issue on automata, languages and programming
An improved data stream summary: the count-min sketch and its applications
Journal of Algorithms
Data streams: algorithms and applications
Foundations and Trends® in Theoretical Computer Science
Randomized algorithms and NLP: using locality sensitive hash function for high speed noun clustering
ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Non-projective dependency parsing using spanning tree algorithms
HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Statistical analysis of sketch estimators
Proceedings of the 2007 ACM SIGMOD international conference on Management of data
Approximate frequency counts over data streams
VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
Finding frequent items in data streams
Proceedings of the VLDB Endowment
A uniform approach to analogies, synonyms, antonyms, and associations
COLING '08 Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1
A study on similarity and relatedness using distributional and WordNet-based approaches
NAACL '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Streaming for large scale NLP: language modeling
NAACL '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Probabilistic counting with randomized storage
IJCAI'09 Proceedings of the 21st international jont conference on Artifical intelligence
Semi-supervised learning of dependency parsers using generalized expectation criteria
ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1 - Volume 1
Stream-based randomised language models for SMT
EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2 - Volume 2
Web-scale distributional similarity and entity set expansion
EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2 - Volume 2
Hash Kernels for Structured Data
The Journal of Machine Learning Research
Stream-based translation models for statistical machine translation
HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Online generation of locality sensitive hash signatures
ACLShort '10 Proceedings of the ACL 2010 Conference Short Papers
Sketching techniques for large scale NLP
WAC-6 '10 Proceedings of the NAACL HLT 2010 Sixth Web as Corpus Workshop
Sketch techniques for scaling distributional similarity to the web
GEMS '10 Proceedings of the 2010 Workshop on GEometrical Models of Natural Language Semantics
Using universal linguistic knowledge to guide grammar induction
EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Popularity is everything: a new approach to protecting passwords from statistical-guessing attacks
HotSec'10 Proceedings of the 5th USENIX conference on Hot topics in security
Covariance in Unsupervised Learning of Probabilistic Grammars
The Journal of Machine Learning Research
Fast large-scale approximate graph construction for NLP
EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Sketch algorithms for estimating point queries in NLP
EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Synergistic methods for using language in robotics
Proceedings of the Workshop on Performance Metrics for Intelligent Systems
Indexing Word Sequences for Ranked Retrieval
ACM Transactions on Information Systems (TOIS)
Hi-index | 0.00 |
We exploit sketch techniques, especially the Count-Min sketch, a memory, and time efficient framework which approximates the frequency of a word pair in the corpus without explicitly storing the word pair itself. These methods use hashing to deal with massive amounts of streaming text. We apply Count-Min sketch to approximate word pair counts and exhibit their effectiveness on three important NLP tasks. Our experiments demonstrate that on all of the three tasks, we get performance comparable to Exact word pair counts setting and state-of-the-art system. Our method scales to 49 GB of unzipped web data using bounded space of 2 billion counters (8 GB memory).