Investigating content selection for language generation using machine learning

  • Authors:
  • Colin Kelly; Ann Copestake; Nikiforos Karamanis

  • Affiliations:
  • University of Cambridge, Cambridge, UK; University of Cambridge, Cambridge, UK; Trinity College Dublin, Ireland

  • Venue:
  • ENLG '09 Proceedings of the 12th European Workshop on Natural Language Generation
  • Year:
  • 2009

Abstract

The content selection component of a natural language generation system decides which information should be communicated in its output. We use information from reports on the game of cricket. We first describe a simple factoid-to-text alignment algorithm, then treat content selection as a collective classification problem and demonstrate that simple 'grouping' of statistics at various levels of granularity yields substantially improved results over a probabilistic baseline. We additionally show that holding back specific types of input data and linking database structures with commonality further increase performance.