Counting large numbers of events in small registers
Communications of the ACM
Space/time trade-offs in hash coding with allowable errors
Communications of the ACM
Data streams: algorithms and applications
SODA '03 Proceedings of the fourteenth annual ACM-SIAM symposium on Discrete algorithms
Universal classes of hash functions (Extended Abstract)
STOC '77 Proceedings of the ninth annual ACM symposium on Theory of computing
The Bloomier filter: an efficient data structure for static support lookup tables
SODA '04 Proceedings of the fifteenth annual ACM-SIAM symposium on Discrete algorithms
On dynamic range reporting in one dimension
Proceedings of the thirty-seventh annual ACM symposium on Theory of computing
Minimum error rate training in statistical machine translation
ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
Domain adaptation for statistical machine translation with domain dictionary and monolingual corpora
COLING '08 Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1
Streaming for large scale NLP: language modeling
NAACL '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Experiments in domain adaptation for statistical machine translation
StatMT '07 Proceedings of the Second Workshop on Statistical Machine Translation
Probabilistic counting with randomized storage
IJCAI'09 Proceedings of the 21st international jont conference on Artifical intelligence
Streaming first story detection with application to Twitter
HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Stream-based translation models for statistical machine translation
HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Online generation of locality sensitive hash signatures
ACLShort '10 Proceedings of the ACL 2010 Conference Short Papers
An information-retrieval approach to language modeling: applications to social data
WSA '10 Proceedings of the NAACL HLT 2010 Workshop on Computational Linguistics in a World of Social Media
LetsMT! --Online Platform for Sharing Training Data and Building User Tailored Machine Translation
Proceedings of the 2010 conference on Human Language Technologies -- The Baltic Perspective: Proceedings of the Fourth International Conference Baltic HLT 2010
Sketching techniques for large scale NLP
WAC-6 '10 Proceedings of the NAACL HLT 2010 Sixth Web as Corpus Workshop
Sketch techniques for scaling distributional similarity to the web
GEMS '10 Proceedings of the 2010 Workshop on GEometrical Models of Natural Language Semantics
Faster and smaller N-gram language models
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Smoothing techniques for adaptive online language models: topic tracking in tweet streams
Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
Multiple-stream language models for statistical machine translation
WMT '11 Proceedings of the Sixth Workshop on Statistical Machine Translation
Approximate scalable bounded space sketch for large data NLP
EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Space efficiencies in discourse modeling via conditional random sampling
NAACL HLT '12 Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
LetsMT!: a cloud-based platform for do-it-yourself machine translation
ACL '12 Proceedings of the ACL 2012 System Demonstrations
Streaming analysis of discourse participants
EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Sketch algorithms for estimating point queries in NLP
EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Hi-index | 0.01 |
Randomised techniques allow very big language models to be represented succinctly. However, being batch-based they are unsuitable for modelling an unbounded stream of language whilst maintaining a constant error rate. We present a novel randomised language model which uses an online perfect hash function to efficiently deal with unbounded text streams. Translation experiments over a text stream show that our online randomised model matches the performance of batch-based LMs without incurring the computational overhead associated with full retraining. This opens up the possibility of randomised language models which continuously adapt to the massive volumes of texts published on the Web each day.