Stream-based randomised language models for SMT

Authors:
Abby Levenberg; Miles Osborne
Affiliations:
University of Edinburgh; University of Edinburgh
Venue:
EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2
Year:
2009

Abstract

Randomised techniques allow very big language models to be represented succinctly. However, being batch-based they are unsuitable for modelling an unbounded stream of language whilst maintaining a constant error rate. We present a novel randomised language model which uses an online perfect hash function to efficiently deal with unbounded text streams. Translation experiments over a text stream show that our online randomised model matches the performance of batch-based LMs without incurring the computational overhead associated with full retraining. This opens up the possibility of randomised language models which continuously adapt to the massive volumes of texts published on the Web each day.
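
The idea of a compact, updatable n-gram store can be illustrated with a minimal sketch. This is not the authors' online perfect hash function, whose details the abstract does not give; it is a simplified stand-in that keeps short fingerprints and counts in a fixed-size table with bounded probing. The class name FingerprintCountTable, its parameters (num_cells, fingerprint_bits, max_probes) and the method names are all hypothetical and chosen only for illustration.

```python
import hashlib


class FingerprintCountTable:
    """Fixed-size n-gram count store (illustrative stand-in, not the paper's method)."""

    def __init__(self, num_cells=1024, fingerprint_bits=16, max_probes=8):
        self.num_cells = num_cells
        self.fp_mask = (1 << fingerprint_bits) - 1
        self.max_probes = max_probes
        # Each cell is either None (empty) or a [fingerprint, count] pair.
        self.cells = [None] * num_cells

    def _window(self, ngram):
        # Hash the n-gram once; use half the digest for the home cell
        # and the other half for the short fingerprint.
        key = " ".join(ngram).encode("utf-8")
        digest = hashlib.blake2b(key, digest_size=16).digest()
        home = int.from_bytes(digest[:8], "big") % self.num_cells
        fingerprint = int.from_bytes(digest[8:], "big") & self.fp_mask
        slots = [(home + i) % self.num_cells for i in range(self.max_probes)]
        return slots, fingerprint

    def _find(self, ngram):
        # Scan the whole probe window (never stop early at an empty cell),
        # so removals cannot hide entries stored further along the window.
        slots, fingerprint = self._window(ngram)
        for slot in slots:
            cell = self.cells[slot]
            if cell is not None and cell[0] == fingerprint:
                return slot
        return None

    def add(self, ngram):
        slot = self._find(ngram)
        if slot is not None:
            self.cells[slot][1] += 1
            return True
        slots, fingerprint = self._window(ngram)
        for slot in slots:
            if self.cells[slot] is None:
                self.cells[slot] = [fingerprint, 1]
                return True
        return False  # probe window is full; the n-gram is not stored

    def count(self, ngram):
        slot = self._find(ngram)
        return 0 if slot is None else self.cells[slot][1]

    def remove(self, ngram):
        # Free the cell so newer n-grams from the stream can reuse it.
        slot = self._find(ngram)
        if slot is None:
            return False
        self.cells[slot] = None
        return True


table = FingerprintCountTable()
bigram = ("language", "model")
table.add(bigram)
table.add(bigram)
seen_twice = table.count(bigram)      # count after two insertions
table.remove(bigram)
after_removal = table.count(bigram)   # count once the entry is dropped
occupied_cells = sum(cell is not None for cell in table.cells)
```

In this sketch the table never grows: stale n-grams are removed to make room for new ones, so memory stays fixed as the stream continues. The price of storing fingerprints rather than the n-grams themselves is that an unseen n-gram can occasionally match a stored fingerprint and return a nonzero count.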