An evaluation of lexicalization in parsing

  • Authors:
  • Aravind K. Joshi;Yves Schabes

  • Affiliations:
  • University of Pennsylvania, Philadelphia, PA;University of Pennsylvania, Philadelphia, PA

  • Venue:
  • HLT '89 Proceedings of the workshop on Speech and Natural Language
  • Year:
  • 1989

Quantified Score

Hi-index 0.00

Visualization

Abstract

In this paper, we evaluate a two-pass parsing strategy proposed for the so-called 'lexicalized' grammar. In 'lexicalized' grammars (Schabes, Abeillé and Joshi, 1988), each elementary structure is systematically associated with a lexical item called anchor. These structures specify extended domains of locality (as compared to CFGs) over which constraints can be stated. The 'grammar' consists of a lexicon where each lexical item is associated with a finite number of structures for which that item is the anchor. There are no separate grammar rules. There are, of course, 'rules' which tell us how these structures are combined.A general two-pass parsing strategy for 'lexicalized' grammars follows naturally. In the first stage, the parser selects a set of elementary structures associated with the lexical items in the input sentence, and in the second stage the sentence is parsed with respect to this set. We evaluate this strategy with respect to two characteristics. First, the amount of filtering on the entire grammar is evaluated: once the first pass is performed, the parser uses only a subset of the grammar. Second, we evaluate the use of non-local information: the structures selected during the first pass encode the morphological value (and therefore the position in the string) of their anchor; this enables the parser to use non-local information to guide its search.We take Lexicalized Tree Adjoining Grammars as an instance of lexicalized grammar. We illustrate the organization of the grammar. Then we show how a general Earley-type TAG parser (Schabes and Joshi, 1988) can take advantage of lexicalization. Empirical data show that the filtering of the grammar and the non-local information provided by the two-pass strategy improve the performance of the parser.