The Hidden Markov Model (HMM) for part-of-speech (POS) tagging is typically based on tag trigrams. As such, it models local context but not global context, leaving long-distance syntactic relations unrepresented. Using n-gram models with n > 3 to incorporate global context is problematic: the tag sequences corresponding to higher-order models become increasingly rare in the training data, which leads to unreliable estimates of their probabilities.

The trigram HMM can be extended with global contextual information, without making the model infeasible, by incorporating the context separately from the POS tags. The new information incorporated in the model is acquired through the use of a wide-coverage parser. The model is trained and tested on Dutch text from two different sources, showing an increase in tagging accuracy compared to tagging using the standard model.
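
The sparsity argument against higher-order tag n-grams can be illustrated with a minimal sketch. It is not the tagger described above: the tag sequences are invented toy data (not the Dutch corpora), and the helper names `tag_ngrams` and `unseen_rate` are hypothetical. It only measures what fraction of the tag n-grams in a held-out sequence never occurred in training, for increasing n.

```python
def tag_ngrams(tags, n):
    """Return all contiguous tag n-grams in a tag sequence."""
    return [tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)]


def unseen_rate(train_tags, test_tags, n):
    """Fraction of test n-grams that never occur in the training sequence."""
    seen = set(tag_ngrams(train_tags, n))
    test = tag_ngrams(test_tags, n)
    if not test:
        return 0.0
    return sum(gram not in seen for gram in test) / len(test)


# Toy, made-up tag sequences (an assumption for illustration only).
train = "DET N V DET ADJ N P DET N V ADV P DET ADJ N".split()
test = "DET ADJ N V DET N P DET N".split()

# Unseen-n-gram rate for bigrams up to 5-grams.
rates = {n: unseen_rate(train, test, n) for n in (2, 3, 4, 5)}

for n, rate in rates.items():
    print(f"n={n}: {rate:.2f} of test n-grams unseen in training")
```

On this toy data the unseen rate rises with every increase in n. An n-gram that was never observed gets a zero maximum-likelihood estimate, so a higher-order model leans ever more heavily on smoothing. That is the motivation for keeping the tag model at trigrams and supplying global context through a separate source.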