A simple pattern-matching algorithm for recovering empty nodes and their antecedents

  • Authors:
  • Mark Johnson

  • Affiliations:
  • Brown University

  • Venue:
  • ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
  • Year:
  • 2002

Quantified Score

Hi-index 0.00

Visualization

Abstract

This paper describes a simple pattern-matching algorithm for recovering empty nodes and identifying their co-indexed antecedents in phrase structure trees that do not contain this information. The patterns are minimal connected tree fragments containing an empty node and all other nodes co-indexed with it. This paper also proposes an evaluation procedure for empty node recovery procedures which is independent of most of the details of phrase structure, which makes it possible to compare the performance of empty node recovery on parser output with the empty node annotations in a gold-standard corpus. Evaluating the algorithm on the output of Charniak's parser (Charniak, 2000) and the Penn treebank (Marcus et al., 1993) shows that the pattern-matching algorithm does surprisingly well on the most frequently occuring types of empty nodes given its simplicity.