Phrase-based statistical language generation using graphical models and active learning

  • Authors:
  • François Mairesse; Milica Gašić; Filip Jurčíček; Simon Keizer; Blaise Thomson; Kai Yu; Steve Young

  • Affiliations:
  • Cambridge University, Cambridge, UK (all authors)

  • Venue:
  • ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
  • Year:
  • 2010

Abstract

Most previous work on trainable language generation has focused on two paradigms: (a) using a statistical model to rank a set of generated utterances, or (b) using statistics to inform the generation decision process. Both approaches rely on the existence of a handcrafted generator, which limits their scalability to new domains. This paper presents Bagel, a statistical language generator which uses dynamic Bayesian networks to learn from semantically aligned data produced by 42 untrained annotators. A human evaluation shows that Bagel can generate natural and informative utterances from unseen inputs in the information presentation domain. Additionally, generation performance on sparse datasets is improved significantly by using certainty-based active learning, yielding ratings close to the human gold standard with a fraction of the data.
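
The certainty-based active learning mentioned in the abstract works by repeatedly training a model on the currently labelled data and then requesting annotations for the unlabelled examples the model is least certain about. The sketch below is a minimal, generic illustration of that selection loop only; it uses an off-the-shelf classifier rather than the paper's dynamic Bayesian network generator, and the function name and parameters (`certainty_based_active_learning`, `n_seed`, `batch_size`) are illustrative assumptions, not part of Bagel.

```python
# Minimal sketch of certainty-based active learning (illustrative only; not the
# paper's Bagel implementation). Assumes any probabilistic model exposing
# predict_proba-style confidences; a simple logistic regression stands in here.
import numpy as np
from sklearn.linear_model import LogisticRegression

def certainty_based_active_learning(X_pool, y_pool, n_seed=10, n_rounds=20, batch_size=5):
    """Iteratively label the pool examples the current model is least certain about."""
    rng = np.random.default_rng(0)
    labelled = list(rng.choice(len(X_pool), size=n_seed, replace=False))
    unlabelled = [i for i in range(len(X_pool)) if i not in labelled]

    model = LogisticRegression(max_iter=1000)
    for _ in range(n_rounds):
        model.fit(X_pool[labelled], y_pool[labelled])

        # Certainty = probability of the most likely class; query the least certain items.
        proba = model.predict_proba(X_pool[unlabelled])
        certainty = proba.max(axis=1)
        query = np.argsort(certainty)[:batch_size]

        # In a real setting the queried items would go to human annotators;
        # here their gold labels stand in for the annotation step.
        newly_labelled = [unlabelled[i] for i in query]
        labelled.extend(newly_labelled)
        unlabelled = [i for i in unlabelled if i not in newly_labelled]
        if not unlabelled:
            break
    return model, labelled
```

The key design choice this illustrates is the query criterion: rather than labelling data at random, each round spends the annotation budget on the inputs where the model's own confidence is lowest, which is how the paper reports reaching near gold-standard ratings with a fraction of the data.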