Decision-tree based error correction for statistical phrase break prediction in Korean

  • Authors: Byeongchang Kim; Geunbae Lee
  • Affiliations: Pohang University of Science & Technology, Pohang, South Korea; Pohang University of Science & Technology, Pohang, South Korea
  • Venue: COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 2
  • Year: 2000

Abstract

In this paper, we present a new phrase break prediction architecture that integrates a probabilistic approach with decision-tree based error correction. The probabilistic method alone usually suffers from performance degradation due to inherent data sparseness problems, and it covers only a limited range of contextual information. Moreover, the probabilistic module cannot utilize selective morpheme tags or the relative distance to other phrase breaks. The decision-tree based error correction is tightly integrated to overcome these limitations. The morpheme sequence, initially tagged with phrase breaks by the probabilistic predictor, is corrected by an error-correcting decision tree, which is induced by C4.5 from the correctly tagged corpus together with the output of the probabilistic predictor. The decision-tree based post error correction improved results even when the phrase break predictor had poor initial performance. Moreover, the system can be flexibly tuned to a new corpus without massive retraining.
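
The error-correction step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the feature names, the tag labels ("BREAK", "NONE") and the helper functions are all hypothetical, and the tree learner is a simple ID3-style information-gain inducer standing in for C4.5 (which additionally uses gain ratio and pruning). The sketch only shows the idea of pairing each initial prediction with nearby morpheme tags and the distance to the previous predicted break, then learning to map that context to the correct tag.

```python
import math
from collections import Counter

# Hypothetical feature set; the abstract names only morpheme tags and
# relative distance to other phrase breaks as extra context.
FEATURES = ["pred", "tag", "prev_tag", "next_tag", "dist"]

def build_instances(tags, predicted, gold):
    """Pair each morpheme's context (including the initial prediction) with its gold label."""
    instances, last_break = [], -1
    for i, (pred, label) in enumerate(zip(predicted, gold)):
        features = {
            "pred": pred,                                   # output of the initial predictor
            "tag": tags[i],                                 # morpheme tag at this position
            "prev_tag": tags[i - 1] if i > 0 else "<s>",
            "next_tag": tags[i + 1] if i + 1 < len(tags) else "</s>",
            "dist": min(i - last_break, 3),                 # capped distance to previous predicted break
        }
        instances.append((features, label))
        if pred == "BREAK":
            last_break = i
    return instances

def entropy(labels):
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def induce_tree(instances, features):
    """ID3-style induction on categorical features (a stand-in for C4.5)."""
    labels = [y for _, y in instances]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority

    def gain(feature):
        parts = {}
        for x, y in instances:
            parts.setdefault(x[feature], []).append(y)
        remainder = sum(len(p) / len(labels) * entropy(p) for p in parts.values())
        return entropy(labels) - remainder

    best = max(features, key=gain)
    if gain(best) <= 0:
        return majority
    rest = [f for f in features if f != best]
    branches = {}
    for x, y in instances:
        branches.setdefault(x[best], []).append((x, y))
    # Internal node: (feature, children by value, fallback label for unseen values)
    return (best, {v: induce_tree(sub, rest) for v, sub in branches.items()}, majority)

def classify(tree, x):
    while isinstance(tree, tuple):
        feature, children, default = tree
        tree = children.get(x[feature], default)
    return tree

def correct(tree, tags, predicted):
    """Apply the error-correcting tree to an initially tagged morpheme sequence."""
    # Gold labels are unknown at correction time; the predictions fill that slot and are ignored.
    return [classify(tree, x) for x, _ in build_instances(tags, predicted, predicted)]
```

In this sketch the tree is trained with `induce_tree(build_instances(tags, predicted, gold), FEATURES)` on a corpus that carries both the initial predictions and the correct tags. Adapting to a new corpus then means re-inducing only this small tree rather than retraining the probabilistic predictor, which is one way to read the abstract's claim about flexible tuning.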