Corpus-Oriented Grammar Development for Acquiring a Head-Driven Phrase Structure Grammar from the Penn Treebank

  • Authors:
  • Yusuke Miyao;Takashi Ninomiya;Jun’ichi Tsujii

  • Affiliations:
  • University of Tokyo, Tokyo;University of Tokyo, Tokyo;University of Tokyo, Tokyo

  • Venue:
  • IJCNLP'04: Proceedings of the First International Joint Conference on Natural Language Processing
  • Year:
  • 2004

Abstract

This paper describes a method for semi-automatically acquiring an English HPSG grammar from the Penn Treebank. First, heuristic rules are used to annotate the treebank with partially specified HPSG derivation trees. Lexical entries are then extracted automatically from the annotated corpus by applying HPSG schemata in reverse to these partially specified derivation trees. Because the schemata are predefined, the acquired lexicon is guaranteed to conform to the theoretical formulation of HPSG. Experimental results show that this approach made it possible to develop a robust HPSG grammar at low cost.
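
The idea of extracting lexical entries by applying schemata inversely can be illustrated with a toy sketch. It is not the authors' implementation: the `Node` class, the two schema names (`subj_head`, `head_comp`), the flat `SUBJ`/`COMPS` tuples, and the assumption that non-head daughters are fully saturated are all simplifications introduced here for illustration. Starting from a saturated root, each schema is read top-down to work out what the head daughter must have required, until the requirements reach the lexical head.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """A toy derivation-tree node (hypothetical structure, not from the paper)."""
    cat: str                       # category label, e.g. "S", "VP", "NP", "V"
    word: Optional[str] = None     # set only on leaves
    schema: Optional[str] = None   # "subj_head" or "head_comp" on internal nodes
    head: int = 0                  # index of the head daughter
    children: List["Node"] = field(default_factory=list)


# (word, category, SUBJ requirements, COMPS requirements)
LexEntry = Tuple[str, str, Tuple[str, ...], Tuple[str, ...]]


def extract(node: Node,
            subj: Tuple[str, ...] = (),
            comps: Tuple[str, ...] = ()) -> List[LexEntry]:
    """Collect lexical entries by reading each schema from mother to head daughter."""
    if node.word is not None:
        # At a leaf, the accumulated requirements form the lexical entry.
        return [(node.word, node.cat, tuple(subj), tuple(comps))]

    entries: List[LexEntry] = []
    for i, child in enumerate(node.children):
        if i != node.head:
            # Simplifying assumption: non-head daughters are saturated.
            entries.extend(extract(child))
            continue
        others = tuple(c.cat for j, c in enumerate(node.children) if j != i)
        if node.schema == "subj_head":
            # The head daughter must have been looking for the subject.
            entries.extend(extract(child, others, comps))
        elif node.schema == "head_comp":
            # The head daughter must have been looking for the complement(s).
            entries.extend(extract(child, subj, others + tuple(comps)))
        else:
            raise ValueError(f"unknown schema: {node.schema!r}")
    return entries


# "dogs chase cats" with schema names and head positions already annotated.
tree = Node("S", schema="subj_head", head=1, children=[
    Node("NP", word="dogs"),
    Node("VP", schema="head_comp", head=0, children=[
        Node("V", word="chase"),
        Node("NP", word="cats"),
    ]),
])

for entry in extract(tree):
    print(entry)
```

In this sketch the verb "chase" ends up with one subject requirement and one complement requirement, both `NP`, which is the transitive-verb entry one would expect; a real HPSG lexicon would use typed feature structures rather than flat tuples.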