Dependency-based n-gram models for general purpose sentence realisation

  • Authors:
  • Yuqing Guo; Haifeng Wang; Josef van Genabith

  • Affiliations:
  • Yuqing Guo: Toshiba (China) Research and Development Center, 5/F, Tower W2, Oriental Plaza, Dongcheng District, Beijing 100738, China; e-mail: guoyuqing@rdc.toshiba.com.cn
  • Haifeng Wang: Baidu, Inc., Baidu Campus, No. 10, Shangdi 10th Street, Haidian District, Beijing 100085, China; e-mail: wanghaifeng@baidu.com
  • Josef van Genabith: NCLT/CNGL, School of Computing, Dublin City University, Glasnevin, Dublin 9, Ireland; e-mail: josef@computing.dcu.ie

  • Venue:
  • Natural Language Engineering
  • Year:
  • 2011

Abstract

This paper presents a general-purpose, wide-coverage, probabilistic sentence generator based on dependency n-gram models. This is particularly interesting as many semantic or abstract syntactic input specifications for sentence realisation can be represented as labelled bi-lexical dependencies or typed predicate-argument structures. Our generation method captures the mapping between semantic representations and surface forms by linearising a set of dependencies directly, rather than via the application of grammar rules as in more traditional chart-style or unification-based generators. In contrast to conventional n-gram language models over surface word forms, we exploit structural information and various linguistic features inherent in the dependency representations to constrain the generation space and improve the generation quality. A series of experiments shows that dependency-based n-gram models generalise well to different languages (English and Chinese) and representations (LFG and CoNLL). Compared with state-of-the-art generation systems, our general-purpose sentence realiser is highly competitive with the added advantages of being simple, fast, robust and accurate.
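The core idea, linearising a head and its labelled dependents by scoring candidate orderings with an n-gram model over dependency features rather than surface word forms, can be illustrated with a minimal sketch. The snippet below is not the authors' implementation: the bigram counts, the (label, lemma) feature choice, the add-one smoothing, and the exhaustive permutation search are all simplifying assumptions made for illustration; the actual system would estimate its models from a dependency treebank and use richer linguistic features.

```python
from itertools import permutations
from math import log

# Hypothetical bigram counts over (dependency-label, lemma) tokens; in practice
# these would be estimated from a dependency bank, not hard-coded.
BIGRAM = {
    (("<s>", "<s>"), ("SUBJ", "john")): 3,
    (("SUBJ", "john"), ("HEAD", "see")): 3,
    (("HEAD", "see"), ("OBJ", "mary")): 2,
    (("OBJ", "mary"), ("</s>", "</s>")): 2,
}
TOTAL = sum(BIGRAM.values())

def score(sequence):
    """Log-probability of one linearisation of a local subtree under a
    bigram model with naive add-one smoothing (illustrative only)."""
    tokens = [("<s>", "<s>")] + sequence + [("</s>", "</s>")]
    logp = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        count = BIGRAM.get((prev, cur), 0)
        logp += log((count + 1) / (TOTAL + 1))
    return logp

def linearise(head, dependents):
    """Pick the highest-scoring order of a head and its labelled dependents.
    Both head and dependents are (dependency-label, lemma) pairs."""
    best_order, best_score = None, float("-inf")
    for order in permutations([head] + dependents):
        s = score(list(order))
        if s > best_score:
            best_order, best_score = list(order), s
    return best_order

if __name__ == "__main__":
    head = ("HEAD", "see")
    deps = [("SUBJ", "john"), ("OBJ", "mary")]
    # Expected output: SUBJ before HEAD before OBJ, i.e. "john see mary".
    print(linearise(head, deps))
```

Applied recursively from the root of a dependency structure, this kind of local ordering yields a full surface string, which is the sense in which the paper's generator maps dependencies to sentences directly rather than through grammar-rule application.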