Corpus-based methods in natural language generation: friend or foe?

  • Authors:
  • Owen Rambow

  • Affiliations:
  • AT&T Labs -- Research, Florham Park, NJ

  • Venue:
  • EWNLG '01 Proceedings of the 8th European workshop on Natural Language Generation - Volume 8
  • Year:
  • 2001


Abstract

In computational linguistics, the 1990s were characterized by the rapid rise to prominence of corpus-based methods in natural language understanding (NLU). These methods include statistical and machine-learning approaches. In natural language generation (NLG), meanwhile, there was little work using statistical and machine-learning approaches. Some researchers felt that the kinds of ambiguity that appeared to profit from corpus-based approaches in NLU did not exist in NLG: if the input is adequately specified, then all the rules that map it to a correct output can also be explicitly specified. This paper argues that this view is not correct, and that NLG can and does profit from corpus-based methods. The resistance to corpus-based approaches in NLG may have more to do with the fact that in many NLG applications (such as report or description generation) the output to be generated is extremely limited. As is the case in NLU, if the language is limited, hand-crafted methods are adequate and successful. It is thus no surprise that the first use of corpus-based techniques, at ISI (Knight and Hatzivassiloglou, 1995; Langkilde and Knight, 1998), was motivated by the use of NLG not in "traditional" NLG applications but in machine translation, where the range of output language is (potentially) much larger.