COLING '92 Proceedings of the 14th conference on Computational linguistics - Volume 1
Statistical NLP models usually consider only coarse information and a very restricted context in order to make parameter estimation feasible. To reduce the modeling error introduced by such a simplified probabilistic model, this paper adopts the Classification and Regression Tree (CART) method to select more discriminative features for automatic model refinement. Because features are applied dependently when splitting the classification tree in CART, the amount of training data in each terminal node becomes small, which makes the labeling of terminal nodes unreliable. This over-tuning phenomenon cannot be completely removed by the cross-validation (i.e., pruning) process. A probabilistic classification model based on the selected discriminative features is therefore proposed to use the training data more efficiently. In tagging the Brown Corpus, our probabilistic classification model reduces the error rate of the ten most error-dominant words from 5.71% to 4.35%, a 23.82% relative improvement over the unrefined model.
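The data-fragmentation problem described in the abstract can be illustrated with a minimal sketch. The toy feature tuples, tags, and add-one smoothing below are hypothetical illustrations, not the paper's actual features or estimator: successive CART-style splits leave few samples per terminal node, so a hard majority-tag label is brittle, whereas a smoothed distribution over tags uses the sparse counts more gracefully.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (previous-tag, word-suffix) feature tuples with a POS tag.
samples = [
    (("DT", "og"), "NN"), (("DT", "at"), "NN"), (("DT", "ly"), "NN"),
    (("VB", "ly"), "RB"), (("VB", "og"), "NN"), (("MD", "be"), "VB"),
]

# Group samples by the full feature tuple, i.e., after splitting on every
# feature in turn -- this mimics the terminal nodes of a fully grown tree.
leaves = defaultdict(list)
for feats, tag in samples:
    leaves[feats].append(tag)

# Hard labeling: majority tag per leaf. With one or two samples per leaf,
# this label is essentially noise (the over-tuning the abstract describes).
hard_labels = {leaf: Counter(tags).most_common(1)[0][0]
               for leaf, tags in leaves.items()}

# Probabilistic labeling: a smoothed tag distribution per leaf
# (add-one smoothing here, purely for illustration).
tagset = sorted({tag for _, tag in samples})

def leaf_distribution(tags):
    counts = Counter(tags)
    total = len(tags) + len(tagset)
    return {t: (counts[t] + 1) / total for t in tagset}

probs = {leaf: leaf_distribution(tags) for leaf, tags in leaves.items()}

# Every leaf in this toy tree holds a single training sample.
print(min(len(tags) for tags in leaves.values()))
```

With six distinct feature tuples, every terminal node holds exactly one sample, so the hard label is determined by a single observation while the smoothed distribution still spreads mass over the whole tagset.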