Elements of information theory
Learning probabilistic automata with variable memory length
COLT '94 Proceedings of the seventh annual conference on Computational learning theory
Machine Learning
Method combination for document filtering
SIGIR '96 Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval
A decision-theoretic generalization of on-line learning and an application to boosting
EuroCOLT '95 Proceedings of the Second European Conference on Computational Learning Theory
A stochastic parts program and noun phrase parser for unrestricted text
ANLC '88 Proceedings of the second conference on Applied natural language processing
A simple rule-based part of speech tagger
ANLC '92 Proceedings of the third conference on Applied natural language processing
Part-of-speech tagging using a Variable Memory Markov model
ACL '94 Proceedings of the 32nd annual meeting on Association for Computational Linguistics
A stochastic Japanese morphological analyzer using a forward-DP backward-A* N-best search algorithm
COLING '94 Proceedings of the 15th conference on Computational linguistics - Volume 1
A universal finite memory source
IEEE Transactions on Information Theory
The context-tree weighting method: basic properties
IEEE Transactions on Information Theory
Using Decision Trees to Construct a Practical Parser
Machine Learning - Special issue on natural language learning
Extended models and tools for high-performance part-of-speech tagger
COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 1
Automatic refinement of a POS tagger using a reliable parser and plain text corpora
COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 1
ACL '99 Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics
This paper proposes a mistake-driven mixture method for learning a tag model. The method iteratively performs two procedures: 1. constructing a tag model based on the current data distribution, and 2. updating the distribution by focusing on the data that are not well predicted by the constructed model. The final tag model is constructed by mixing all the models according to their performance. To reflect the data distribution well, we represent each tag model as a hierarchical tag (i.e., NTT) context tree. By using the hierarchical tag context tree, the constituents of the successive tag models gradually shift from broad-coverage tags (e.g., noun) to specific exceptional words that cannot be captured by general tags. In other words, the method incorporates not only frequent connections but also infrequent ones, which are often considered to be collocational. We evaluate several tag models by implementing Japanese part-of-speech taggers that share all conditions (i.e., dictionary and word model) other than their tag models. The experimental results show that the proposed method significantly outperforms both hand-crafted and conventional statistical methods.
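The two-step loop the abstract describes (fit a model to the current distribution, then reweight toward mispredicted data, finally mix models by performance) is essentially a boosting procedure. Below is a minimal Python sketch under that reading, using AdaBoost-style reweighting and a toy weighted-majority base learner as a stand-in for the paper's hierarchical tag context trees; all function names and the reweighting constants are illustrative assumptions, not taken from the paper.

```python
import math
from collections import defaultdict

def train_weighted_model(contexts, tags, weights):
    # Toy base learner: for each context, choose the tag with the largest
    # total weight (a stand-in for the paper's tag context trees).
    score = defaultdict(lambda: defaultdict(float))
    for c, t, w in zip(contexts, tags, weights):
        score[c][t] += w
    return {c: max(d, key=d.get) for c, d in score.items()}

def mistake_driven_mixture(contexts, tags, rounds=5):
    n = len(contexts)
    weights = [1.0 / n] * n          # current data distribution
    models = []
    for _ in range(rounds):
        # Step 1: build a tag model on the current distribution.
        model = train_weighted_model(contexts, tags, weights)
        preds = [model.get(c) for c in contexts]
        err = sum(w for w, p, t in zip(weights, preds, tags) if p != t)
        err = min(max(err, 1e-10), 1 - 1e-10)      # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)    # mixture weight by performance
        models.append((alpha, model))
        # Step 2: re-focus the distribution on mispredicted data.
        weights = [w * math.exp(alpha if p != t else -alpha)
                   for w, p, t in zip(weights, preds, tags)]
        z = sum(weights)
        weights = [w / z for w in weights]
    return models

def predict(models, context):
    # Final model: performance-weighted vote over all rounds' models.
    votes = defaultdict(float)
    for alpha, model in models:
        if context in model:
            votes[model[context]] += alpha
    return max(votes, key=votes.get) if votes else None
```

On clean toy data the loop converges in one round; the reweighting step only matters when the base learner leaves some examples mispredicted, which is exactly the situation the mixture is designed to exploit.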