Natural language generation using an information-slim representation

  • Authors:
  • Daniel Marcu; Radu Soricut

  • Affiliations:
  • University of Southern California; University of Southern California

  • Venue:
  • Doctoral dissertation, University of Southern California
  • Year:
  • 2006

Abstract

In this dissertation, I propose a new natural language generation paradigm based on the direct transformation of textual information into well-formed textual output. I support this paradigm with theoretical contributions in the field of formal languages, new algorithms, empirical results, and software implementations. At the core of this work is a novel representation formalism for probability distributions over finite languages. Owing to its convenient representational and computational properties, this formalism supports a wide range of language generation needs, from sentence realization to text planning. Based on this formalism, I describe, implement, and theoretically analyze a family of algorithms that perform language generation through direct transformations of text. These algorithms use stochastic models of language to drive the generation process. I evaluate my implementation of these algorithms extensively; the evaluations show state-of-the-art performance in automatic translation and significant improvements over the previous state of the art in abstractive headline generation and coherent discourse generation.
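
The dissertation's formalism itself is not reproduced here, but its core idea, representing a probability distribution over a finite set of candidate strings and rescoring it with a stochastic language model to drive generation, can be illustrated with a minimal Python sketch. Everything below (the class FiniteLanguageDistribution, the rescore and bigram_score functions, and the toy data) is a hypothetical illustration of that idea, not code or notation from the dissertation.

```python
import math


class FiniteLanguageDistribution:
    """A probability distribution over a finite set of strings.

    Simplified stand-in for the dissertation's formalism: weights are
    stored explicitly per string rather than in a compact representation.
    """

    def __init__(self, weighted_strings):
        # weighted_strings: dict mapping each string to a non-negative weight.
        total = sum(weighted_strings.values())
        if total <= 0:
            raise ValueError("weights must sum to a positive value")
        self.probs = {s: w / total for s, w in weighted_strings.items()}

    def rescore(self, score_fn):
        """Reweight every string with an external scoring function
        (e.g. a stochastic language model) and renormalize."""
        reweighted = {s: p * score_fn(s) for s, p in self.probs.items()}
        return FiniteLanguageDistribution(reweighted)

    def best(self):
        """Return the most probable string (the realized output)."""
        return max(self.probs, key=self.probs.get)


def bigram_score(sentence, bigram_logprobs, unk=-5.0):
    """Toy bigram language model: product of bigram probabilities,
    with a flat log-probability penalty for unseen bigrams."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    logp = sum(bigram_logprobs.get((a, b), unk)
               for a, b in zip(words, words[1:]))
    return math.exp(logp)


# Usage: a tiny finite language of candidate realizations, initially
# uniform, rescored by the language model to pick the fluent ordering.
candidates = {
    "the cat sat": 1.0,
    "cat the sat": 1.0,
    "the sat cat": 1.0,
}
lm = {("<s>", "the"): -0.1, ("the", "cat"): -0.2,
      ("cat", "sat"): -0.3, ("sat", "</s>"): -0.1}
dist = FiniteLanguageDistribution(candidates)
print(dist.rescore(lambda s: bigram_score(s, lm)).best())  # -> "the cat sat"
```

The explicit dictionary of strings here is only for clarity; per the abstract, the dissertation's formalism owes its utility to a more convenient representation with computational properties that avoid such naive enumeration.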