Chinese comma disambiguation for discourse analysis

  • Authors: Yaqin Yang; Nianwen Xue
  • Affiliations: Brandeis University, Waltham, MA; Brandeis University, Waltham, MA
  • Venue: ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1
  • Year: 2012

Abstract

The Chinese comma signals the boundary of discourse units and also anchors discourse relations between adjacent text spans. In this work, we propose a discourse structure-oriented classification of the comma that can be automatically extracted from the Chinese Treebank based on syntactic patterns. We then experiment with two supervised learning methods that automatically disambiguate the Chinese comma based on this classification. The first method integrates comma classification into parsing, and the second method adopts a "post-processing" approach that extracts features from automatic parses to train a classifier. The experimental results show that the second approach compares favorably with the first.
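
The "post-processing" idea can be illustrated with a minimal sketch. It is not the authors' implementation. The nested-tuple tree encoding, the three features (parent, left sibling, right sibling labels), the label names `BOUNDARY` and `OTHER`, and the count-based lookup classifier are all assumptions made for illustration. The abstract does not specify the actual feature set, comma classes, or learner.

```python
from collections import Counter, defaultdict

# Hypothetical tree encoding: (label, child, child, ...); a leaf is (tag, word).
def is_leaf(node):
    return len(node) == 2 and isinstance(node[1], str)

def comma_features(tree):
    """Yield one feature dict per comma leaf, read off the parse tree."""
    if is_leaf(tree):
        return
    children = tree[1:]
    for i, child in enumerate(children):
        if is_leaf(child) and child[1] == "，":
            yield {
                "parent": tree[0],
                "left": children[i - 1][0] if i > 0 else "NONE",
                "right": children[i + 1][0] if i + 1 < len(children) else "NONE",
            }
        else:
            yield from comma_features(child)

def train(examples):
    """Count labels per exact feature combination (a stand-in for a real learner)."""
    counts = defaultdict(Counter)
    for feats, label in examples:
        counts[tuple(sorted(feats.items()))][label] += 1
    return counts

def predict(model, feats, default="OTHER"):
    """Return the most frequent label seen for these features, else a default."""
    key = tuple(sorted(feats.items()))
    return model[key].most_common(1)[0][0] if key in model else default

# Toy "automatic parses": a comma joining two clauses vs. a comma inside a noun phrase.
clause_tree = ("IP",
               ("IP", ("NN", "天气"), ("VV", "好")),
               ("PU", "，"),
               ("IP", ("NN", "我们"), ("VV", "出发")))
list_tree = ("NP", ("NN", "苹果"), ("PU", "，"), ("NN", "香蕉"))

# Labels here are hypothetical placeholders, not the paper's comma classes.
training = [(f, "BOUNDARY") for f in comma_features(clause_tree)]
training += [(f, "OTHER") for f in comma_features(list_tree)]
model = train(training)
```

Calling `predict(model, {"parent": "IP", "left": "IP", "right": "IP"})` on this toy model returns the label stored for the clause-joining comma. A realistic system would replace the lookup table with a trained statistical classifier and would take its trees from a parser's output rather than hand-written tuples.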