Bayesian symbol-refined tree substitution grammars for syntactic parsing

  • Authors:
  • Hiroyuki Shindo; Yusuke Miyao; Akinori Fujino; Masaaki Nagata

  • Affiliations:
  • NTT Corporation, Soraku-gun, Kyoto, Japan; National Institute of Informatics, Chiyoda-ku, Tokyo, Japan; NTT Corporation, Soraku-gun, Kyoto, Japan; NTT Corporation, Soraku-gun, Kyoto, Japan

  • Venue:
  • ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1
  • Year:
  • 2012

Abstract

We propose Symbol-Refined Tree Substitution Grammars (SR-TSGs) for syntactic parsing. An SR-TSG is an extension of the conventional TSG model where each nonterminal symbol can be refined (subcategorized) to fit the training data. We aim to provide a unified model where TSG rules and symbol refinement are learned from training data in a fully automatic and consistent fashion. We present a novel probabilistic SR-TSG model based on the hierarchical Pitman-Yor Process to encode backoff smoothing from a fine-grained SR-TSG to simpler CFG rules, and develop an efficient training method based on Markov Chain Monte Carlo (MCMC) sampling. Our SR-TSG parser achieves an F1 score of 92.4% in the Wall Street Journal (WSJ) English Penn Treebank parsing task, which is a 7.7 point improvement over a conventional Bayesian TSG parser, and better than state-of-the-art discriminative reranking parsers.
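The backoff smoothing described above is built on hierarchical Pitman-Yor processes, which are commonly pictured via the Chinese restaurant representation: each level's base distribution is the predictive distribution of the coarser level below it. The sketch below is a minimal illustration of that chaining, not the authors' implementation: the `PYP` class, the discount/strength values, and the uniform CFG base are all assumptions for the example, and the bookkeeping that propagates new-table observations to the parent level (as well as the MCMC sampler over seating arrangements) is omitted for brevity.

```python
import random
from collections import defaultdict

class PYP:
    """Chinese-restaurant sketch of a Pitman-Yor process.

    `base` is a callable giving the backoff probability of an item, so
    PYPs can be chained into a backoff hierarchy like the one in the
    abstract (fine-grained SR-TSG rules -> simpler CFG rules -> uniform).
    """
    def __init__(self, discount, strength, base):
        self.d = discount              # discount parameter, 0 <= d < 1
        self.theta = strength          # strength (concentration), theta > -d
        self.base = base               # backoff distribution G0
        self.tables = defaultdict(list)  # item -> customer count per table
        self.num_tables = 0
        self.num_customers = 0

    def prob(self, item):
        """Predictive probability of `item` given the current seating."""
        if self.num_customers == 0:
            return self.base(item)
        c = sum(self.tables[item])     # customers eating `item`
        t = len(self.tables[item])     # tables serving `item`
        p_old = max(c - self.d * t, 0.0)
        p_new = (self.theta + self.d * self.num_tables) * self.base(item)
        return (p_old + p_new) / (self.theta + self.num_customers)

    def add(self, item):
        """Seat one customer at an existing or a new table.

        NOTE: a full hierarchical CRP would also send an observation to
        the parent level whenever a new table opens; that is left out here.
        """
        item_tables = self.tables[item]
        weights = [max(n - self.d, 0.0) for n in item_tables]
        weights.append((self.theta + self.d * self.num_tables) * self.base(item))
        k = random.choices(range(len(weights)), weights=weights)[0]
        if k == len(item_tables):
            item_tables.append(1)      # open a new table
            self.num_tables += 1
        else:
            item_tables[k] += 1        # join an existing table
        self.num_customers += 1

# Two-level backoff with illustrative hyperparameters: symbol-refined
# rules back off to a CFG-level PYP, which backs off to a uniform base
# over a hypothetical vocabulary of 10,000 rules.
cfg_level = PYP(discount=0.5, strength=1.0, base=lambda rule: 1.0 / 10_000)
srtsg_level = PYP(discount=0.8, strength=1.0, base=cfg_level.prob)

srtsg_level.add("NP-1 -> DT NN")            # observe a refined rule
print(srtsg_level.prob("NP-1 -> DT NN"))    # boosted by the observation
print(srtsg_level.prob("VP-2 -> VBD NP"))   # unseen: falls back to CFG level
```

The point of the chaining is visible in the last two lines: a rule seen at the fine-grained level gets most of its mass from its own counts, while an unseen rule inherits its probability from the coarser backoff distribution, which is the smoothing behavior the abstract attributes to the hierarchical Pitman-Yor construction.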