Automatic model refinement: with an application to tagging

  • Authors:
  • Yi-Chung Lin; Tung-Hui Chiang; Keh-Yih Su

  • Affiliations:
  • National Tsing Hua University, Hsinchu, Taiwan, Republic of China (all authors)

  • Venue:
  • COLING '94: Proceedings of the 15th Conference on Computational Linguistics, Volume 1
  • Year:
  • 1994

Abstract

Statistical NLP models usually consider only coarse information and a very restricted context in order to make parameter estimation feasible. To reduce the modeling error introduced by such a simplified probabilistic model, this paper adopts the Classification and Regression Tree (CART) method to select more discriminative features for automatic model refinement. Because CART adopts features dependently while splitting the classification tree, the amount of training data in each terminal node is small, which makes the labeling of terminal nodes unreliable. This over-tuning phenomenon cannot be completely removed by the cross-validation (i.e., pruning) process. A probabilistic classification model based on the selected discriminative features is therefore proposed to use the training data more efficiently. In tagging the Brown Corpus, our probabilistic classification model reduces the error rate on the ten most error-dominant words from 5.71% to 4.35%, a 23.82% improvement over the unrefined model.
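
The two ideas in the abstract, CART-based selection of discriminative context features and probabilistic (rather than hard, majority-vote) labeling of tree leaves, can be illustrated with a minimal sketch. This is not the authors' implementation: the context features, the tiny training set, and the use of scikit-learn's DecisionTreeClassifier as a stand-in for CART are all assumptions made purely for illustration.

```python
# Minimal sketch (not the authors' implementation): a CART-style tree
# selects discriminative context features for one ambiguous word form,
# and each leaf's class frequencies give a probabilistic label instead
# of a single hard tag. Feature names and examples are hypothetical.
from sklearn.feature_extraction import DictVectorizer
from sklearn.tree import DecisionTreeClassifier

# Hypothetical training contexts for an ambiguous word, labeled with
# the correct tag; coarse features such as neighboring tags are typical.
contexts = [
    {"prev_tag": "DT", "next_tag": "NN"},
    {"prev_tag": "PRP", "next_tag": "DT"},
    {"prev_tag": "DT", "next_tag": "IN"},
    {"prev_tag": "NNS", "next_tag": "DT"},
    {"prev_tag": "DT", "next_tag": "NN"},
]
tags = ["JJ", "VB", "NN", "VB", "JJ"]

vec = DictVectorizer(sparse=False)
X = vec.fit_transform(contexts)

# A depth limit plays the role of pruning here; as the abstract notes,
# cross-validation-style pruning alone cannot fully cure over-tuning
# when leaves end up with only a handful of training examples.
tree = DecisionTreeClassifier(criterion="gini", max_depth=2, random_state=0)
tree.fit(X, tags)

# Leaf class frequencies as tag probabilities: a rough analogue of
# labeling leaves probabilistically rather than with the majority tag.
probs = tree.predict_proba(vec.transform([{"prev_tag": "DT", "next_tag": "NN"}]))
print(dict(zip(tree.classes_, probs[0])))
```

Raw per-leaf relative frequencies like these are exactly where sparse terminal nodes hurt; the probabilistic classification model proposed in the paper is motivated by using the training data more efficiently than such per-leaf counts allow.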