A hybrid Japanese parser with hand-crafted grammar and statistics

  • Authors:
  • Hiroshi Kanayama; Kentaro Torisawa; Yutaka Mitsuishi; Jun'ichi Tsujii

  • Affiliations:
  • Tokyo Research Laboratory, IBM Japan Ltd., Kanagawa, Japan; University of Tokyo, Tokyo, Japan; University of Tokyo, Tokyo, Japan; University of Tokyo, Tokyo, Japan

  • Venue:
  • COLING '00: Proceedings of the 18th Conference on Computational Linguistics, Volume 1
  • Year:
  • 2000

Abstract

This paper describes a hybrid parsing method for Japanese that uses both a hand-crafted grammar and a statistical technique. The key feature of our system is that, to estimate the likelihood of a parse tree, it uses information taken from alternative partial parse trees generated by the grammar. Using these alternative trees enables us to construct a new statistical model, the Triplet/Quadruplet Model. We show that this model can capture a certain tendency in Japanese syntactic structures and that this contributes to improved parsing accuracy at a shallow level. With an underspecified HPSG-based grammar and maximum entropy estimation, our parser achieved 88.6% accuracy in dependency analysis of the EDR annotated corpus, outperforming other purely statistical parsing methods on the same corpus. This result suggests that proper treatment of hand-crafted grammars can contribute to parsing accuracy at a shallow level.
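The abstract does not spell out the model, so the following is only a minimal sketch of the general idea under stated assumptions: a grammar first licenses a set of candidate heads for a modifier, and a conditional log-linear (maximum-entropy form) distribution then chooses among them using features that describe each candidate relative to its rivals. The reading of "Triplet/Quadruplet" as a modifier plus two or three kept candidates (here the nearest, second-nearest, and farthest) is an interpretation, not a detail confirmed by the abstract. All names (`select_candidates`, `features`, `head_distribution`) and the weights in `WEIGHTS` are hypothetical; real weights would come from maximum-entropy training on an annotated corpus.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical feature weights for illustration only; a real system would
# estimate these by maximum-entropy training on a dependency-annotated corpus.
WEIGHTS: Dict[str, float] = {
    "slot=0/3": 0.8,          # nearest of three kept candidates
    "slot=2/3": 0.3,          # farthest of three kept candidates
    "mod=wa|head=verb": 0.5,  # modifier/head compatibility feature
}


def select_candidates(candidates: Sequence[str]) -> List[Tuple[int, str]]:
    """Keep at most three grammar-licensed heads: nearest, second, farthest.

    Assumption: `candidates` is ordered from nearest to farthest.
    Returns (original index, head) pairs.
    """
    idx = list(range(len(candidates)))
    if len(idx) > 3:
        idx = [0, 1, len(idx) - 1]
    return [(i, candidates[i]) for i in idx]


def features(modifier: str, head: str, slot: int, n: int) -> List[str]:
    # The slot feature encodes a candidate's position among its rivals,
    # which is how the alternative heads influence each candidate's score.
    return [f"slot={slot}/{n}", f"mod={modifier}|head={head}"]


def head_distribution(modifier: str, candidates: Sequence[str]) -> Dict[int, float]:
    """Conditional log-linear distribution over the kept candidate heads."""
    kept = select_candidates(candidates)
    if not kept:
        return {}
    n = len(kept)
    scores = [
        sum(WEIGHTS.get(f, 0.0) for f in features(modifier, head, slot, n))
        for slot, (_, head) in enumerate(kept)
    ]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return {i: e / z for (i, _), e in zip(kept, exps)}


if __name__ == "__main__":
    # Four candidates are reduced to three (a "quadruplet" with the modifier).
    print(head_distribution("wa", ["noun", "verb", "noun", "verb"]))
```

Scoring each candidate jointly with its competitors, rather than in isolation, is the design choice this sketch tries to illustrate; the specific features are placeholders.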