Dependency Analysis and CBR to Bridge the Generation Gap in Template-Based NLG

  • Authors: Virginia Francisco; Raquel Hervás; Pablo Gervás

  • Affiliation (all authors): Departamento de Ingeniería del Software e Inteligencia Artificial, Universidad Complutense de Madrid, Spain

  • Venue: CICLing '07 Proceedings of the 8th International Conference on Computational Linguistics and Intelligent Text Processing
  • Year: 2009

Abstract

This paper describes how dependency analysis can be used to automatically extract a set of cases, together with an accompanying vocabulary, from a corpus. These cases enable a template-based generator to achieve reasonable coverage of conceptual messages beyond the explicit scope of its predefined templates. We detail the partially automated process used to obtain the case base and the components of the template-based generator, which applies case-based reasoning techniques. The generator uses the taxonomy of concepts in WordNet to compute similarity between the concepts involved in the texts, and a case retrieval net serves as its memory model. The set of data to be converted into text acts as a query to the system. Solving a given query may involve several retrieval steps, which yield a set of cases that together constitute a good solution for rendering the query data as text messages, followed by a knowledge-intensive adaptation step that consults a knowledge base to identify appropriate substitutions and completions for the concepts that appear in the cases, using the query as a source. We describe this case-based solution for selecting an appropriate set of templates to render a given set of data as text, present numeric results of system performance in the domain of press articles, and discuss its advantages and shortcomings.
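
As a rough illustration of the retrieve-then-adapt cycle the abstract outlines, the following minimal Python sketch may help. It is not the authors' implementation, and everything in it is an assumption made for illustration: the hand-written `HYPERNYM` table stands in for the WordNet taxonomy, the path-based `similarity` measure is one simple choice among many (the abstract does not say which measure is used), the `CASES` entries and their templates are invented, and a flat scoring loop replaces the case retrieval net.

```python
# Toy hypernym taxonomy (child -> parent). Hypothetical stand-in for WordNet.
HYPERNYM = {
    "dog": "canine", "wolf": "canine", "canine": "animal",
    "cat": "feline", "feline": "animal", "animal": "entity",
    "car": "vehicle", "vehicle": "entity",
}

# Hypothetical case base: each case pairs a template with the concepts
# that filled its slots in the (imaginary) source corpus.
CASES = {
    "chase": {"template": "The {agent} chased the {object}.",
              "slots": {"agent": "dog", "object": "cat"}},
    "park": {"template": "The {agent} was parked.",
             "slots": {"agent": "car"}},
}


def path_to_root(concept):
    """Return the chain of concepts from `concept` up to the taxonomy root."""
    path = [concept]
    while path[-1] in HYPERNYM:
        path.append(HYPERNYM[path[-1]])
    return path


def similarity(a, b):
    """Path-based similarity: 1 / (1 + edges between a and b via their
    lowest common ancestor). Returns 0.0 if they share no ancestor."""
    steps_from_b = {c: i for i, c in enumerate(path_to_root(b))}
    for steps_from_a, c in enumerate(path_to_root(a)):
        if c in steps_from_b:
            return 1.0 / (1 + steps_from_a + steps_from_b[c])
    return 0.0


def score(query, case):
    """Mean similarity over the query's roles; a role missing in the case scores 0."""
    total = sum(similarity(concept, case["slots"][role])
                for role, concept in query.items() if role in case["slots"])
    return total / len(query)


def retrieve(query):
    """Return the name of the best-scoring case (flat scan, not a retrieval net)."""
    return max(CASES, key=lambda name: score(query, CASES[name]))


def generate(query):
    """Retrieve a case, then adapt it: query concepts replace the case's
    concepts (substitution); slots absent from the query keep the case's
    original concept (completion)."""
    case = CASES[retrieve(query)]
    filled = {**case["slots"], **query}
    return case["template"].format(**filled)


if __name__ == "__main__":
    print(generate({"agent": "wolf", "object": "cat"}))
```

In this sketch the query `{"agent": "wolf", "object": "cat"}` matches the `chase` case because `wolf` and `dog` share the close ancestor `canine`, so the template is reused with `wolf` substituted for `dog`. A real system along the lines described would need a far richer case base and the knowledge-intensive adaptation step mentioned in the abstract.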