Beyond N in N-gram tagging

Authors:
Robbert Prins
Affiliations:
University of Groningen, The Netherlands
Venue:
ACLstudent '04 Proceedings of the ACL 2004 workshop on Student research
Year:
2004

Abstract

The Hidden Markov Model (HMM) for part-of-speech (POS) tagging is typically based on tag trigrams. As such, it models local context but not global context, leaving long-distance syntactic relations unrepresented. Using n-gram models with n > 3 to incorporate global context is problematic: the tag sequences corresponding to higher-order models become increasingly rare in the training data, which leads to unreliable estimates of their probabilities.

The trigram HMM can be extended with global contextual information, without making the model infeasible, by incorporating the context separately from the POS tags. The new information incorporated in the model is acquired through the use of a wide-coverage parser. The model is trained and tested on Dutch text from two different sources, showing an increase in tagging accuracy compared to tagging with the standard model.
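The sparsity argument above can be illustrated with a minimal sketch. It counts tag n-grams in a tiny, made-up corpus and reports what fraction of the distinct n-grams occur only once. The corpus, the tag names, and the helper functions `ngram_counts` and `singleton_fraction` are hypothetical and chosen purely for illustration; they are not taken from the paper, and the sketch does not implement the paper's extended model.

```python
from collections import Counter


def ngram_counts(tag_sequences, n):
    """Count tag n-grams over a list of tag sequences (no sentence padding)."""
    counts = Counter()
    for tags in tag_sequences:
        for i in range(len(tags) - n + 1):
            counts[tuple(tags[i:i + n])] += 1
    return counts


def singleton_fraction(counts):
    """Fraction of distinct n-grams that were observed exactly once."""
    if not counts:
        return 0.0
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(counts)


# Hypothetical toy corpus of POS-tag sequences (assumption, not the paper's data).
corpus = [
    ["DET", "NOUN", "VERB", "DET", "NOUN"],
    ["DET", "ADJ", "NOUN", "VERB", "ADV"],
    ["PRON", "VERB", "DET", "ADJ", "NOUN"],
    ["DET", "NOUN", "VERB", "PRON"],
]

for n in (2, 3, 4):
    counts = ngram_counts(corpus, n)
    print(f"n={n}: {len(counts)} distinct n-grams, "
          f"{singleton_fraction(counts):.2f} seen only once")
```

On this toy data the share of n-grams seen only once grows with n, so relative-frequency estimates for higher-order models rest on fewer and fewer observations. Real corpora are far larger, but the same trend is what makes n > 3 impractical without additional measures.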