Tree topological features for unlexicalized parsing

  • Authors:
  • Samuel W. K. Chan;Lawrence Y. L. Cheung;Mickey W. C. Chong

  • Affiliations:
  • Chinese University of Hong Kong;Chinese University of Hong Kong;Chinese University of Hong Kong

  • Venue:
  • COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics: Posters
  • Year:
  • 2010

Quantified Score

Hi-index 0.00

Visualization

Abstract

As unlexicalized parsing lacks word token information, it is important to investigate novel parsing features to improve the accuracy. This paper studies a set of tree topological (TT) features. They quantitatively describe the tree shape dominated by each non-terminal node. The features are useful in capturing linguistic notions such as grammatical weight and syntactic branching, which are factors important to syntactic processing but overlooked in the parsing literature. By using an ensemble classifier-based model, TT features can significantly improve the parsing accuracy of our unlexicalized parser. Further, the ease of estimating TT feature values makes them easy to be incorporated into virtually any mainstream parsers.