A decision tree approach to sentence chunking

  • Authors: Samuel W. K. Chan
  • Affiliations: Dept. of Decision Sciences, The Chinese University of Hong Kong, Hong Kong SAR
  • Venue: AI'07: Proceedings of the 20th Australian Joint Conference on Advances in Artificial Intelligence
  • Year: 2007

Abstract

This paper proposes an algorithm that chunks a given sentence into meaningful, coherent segments. The algorithm rests on the assumption that segment boundaries can be identified by analyzing various information-theoretic measures of the part-of-speech (POS) n-grams within the sentence. This assumption is supported by a series of experiments on the POS-tagged corpus and Treebank from Academia Sinica. In an evaluation on 10,000 sentences, combining different classifiers based on these measures improves the system's coverage while maintaining its precision.
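
The boundary assumption can be illustrated with a minimal sketch. The abstract does not say which information-theoretic measures are used, so pointwise mutual information (PMI) over adjacent POS-tag pairs is an assumption here, and a single fixed threshold stands in for the paper's classifiers. The function names `train_pmi` and `chunk`, the `threshold` parameter, and the tag names are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def train_pmi(tagged_sentences):
    """Estimate PMI for adjacent POS-tag pairs from a list of tag sequences.

    PMI is an assumed measure; the abstract does not name the ones it uses.
    """
    unigrams = Counter()
    bigrams = Counter()
    for tags in tagged_sentences:
        unigrams.update(tags)
        bigrams.update(zip(tags, tags[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    def pmi(a, b):
        # An unseen tag pair scores -inf, the strongest boundary signal.
        if bigrams[(a, b)] == 0:
            return float("-inf")
        p_ab = bigrams[(a, b)] / n_bi
        p_a = unigrams[a] / n_uni
        p_b = unigrams[b] / n_uni
        return math.log2(p_ab / (p_a * p_b))

    return pmi


def chunk(tags, pmi, threshold=0.0):
    """Split a tag sequence wherever adjacent-tag PMI falls below threshold.

    The fixed threshold is a stand-in for a trained classifier.
    """
    if not tags:
        return []
    chunks, current = [], [tags[0]]
    for a, b in zip(tags, tags[1:]):
        if pmi(a, b) < threshold:
            # Weak association between neighbours: close the current segment.
            chunks.append(current)
            current = [b]
        else:
            current.append(b)
    chunks.append(current)
    return chunks
```

Calling `train_pmi` on a list of POS-tag sequences returns a scoring function, which `chunk` then applies to a new tag sequence. A low score between two neighbouring tags means they co-occur less often than their individual frequencies would predict, which is the kind of signal the abstract treats as evidence of a segment boundary.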