Stream-based randomised language models for SMT

  • Authors:
Abby Levenberg; Miles Osborne

  • Affiliations:
University of Edinburgh; University of Edinburgh

  • Venue:
  • EMNLP '09: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, Volume 2
  • Year:
  • 2009

Abstract

Randomised techniques allow very large language models to be represented succinctly. However, because they are batch-based, they are unsuitable for modelling an unbounded stream of language whilst maintaining a constant error rate. We present a novel randomised language model which uses an online perfect hash function to deal efficiently with unbounded text streams. Translation experiments over a text stream show that our online randomised model matches the performance of batch-based LMs without incurring the computational overhead associated with full retraining. This opens up the possibility of randomised language models which continuously adapt to the massive volumes of text published on the Web each day.
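To make the idea of randomised, hash-based LM storage concrete, below is a minimal Python sketch of a count-min-sketch style n-gram count store: a generic member of the family of fixed-space, collision-tolerant structures that randomised language models build on. The class name RandomisedCounts and all parameters are illustrative assumptions; this is not the paper's online perfect-hashing construction, whose distinguishing property is keeping the error rate constant as the stream grows.

```python
import hashlib


class RandomisedCounts:
    """Count-min-sketch style store for approximate n-gram counts.

    Counts live in fixed space regardless of vocabulary size; hash
    collisions can only inflate a count, so the error is one-sided
    and shrinks as width/depth grow. This is a generic illustration
    of hash-based randomised LM storage, not the paper's online
    perfect-hashing construction.
    """

    def __init__(self, width=1 << 20, depth=4):
        self.width = width                      # buckets per row
        self.depth = depth                      # independent hash rows
        self.rows = [[0] * width for _ in range(depth)]

    def _indices(self, ngram):
        key = " ".join(ngram).encode("utf-8")
        for i in range(self.depth):
            # A distinct salt per row gives near-independent hashes.
            h = hashlib.blake2b(key, digest_size=8, salt=bytes([i]))
            yield i, int.from_bytes(h.digest(), "little") % self.width

    def add(self, ngram, count=1):
        for i, j in self._indices(ngram):
            self.rows[i][j] += count

    def count(self, ngram):
        # The minimum over rows upper-bounds the true count.
        return min(self.rows[i][j] for i, j in self._indices(ngram))


store = RandomisedCounts()
tokens = ["we", "present", "a", "novel", "randomised", "model"]
for i in range(len(tokens) - 2):
    store.add(tuple(tokens[i:i + 3]))           # accumulate trigram counts
print(store.count(("we", "present", "a")))      # -> 1 (up to collisions)
```

A streaming decoder would query such approximate counts to score translation hypotheses; per the abstract, the paper's contribution is an online perfect hash function that supports the same succinct storage while admitting unbounded insertion without the error rate drifting upward.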