Tagging English text with a probabilistic model
Computational Linguistics
TnT: a statistical part-of-speech tagger
ANLC '00 Proceedings of the sixth conference on Applied natural language processing
Part-of-speech (PoS) tagging is an important processing step in many natural language systems (information extraction, question answering, etc.), and a number of different approaches have been applied to it. In this paper we study the behaviour of a Hidden Markov Model (HMM) PoS tagger for Spanish trained on a minimal amount of data. In our tagger the HMM states are tag pairs that emit words, and the model is defined by transition and lexical probabilities. This technique was suggested by Rabiner [11], and our implementation is influenced by Brants [2]. We investigated the best HMM configuration for a small training corpus of about 50,000 words; the maximum precision obtained on previously unseen Spanish text was 95.36%.
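The model described above can be illustrated with a minimal sketch of a second-order HMM tagger in Python, in which each state is a pair of tags and decoding uses the Viterbi algorithm. This is not the authors' implementation: the function names (`train`, `viterbi`), the three-sentence toy corpus, and the tag set are hypothetical, and add-one smoothing is assumed here only for brevity (Brants [2] uses linear interpolation of n-gram estimates and a suffix-based treatment of unknown words).

```python
import math
from collections import Counter

START = "<s>"  # hypothetical padding tag for the sentence start

def train(tagged_sentences):
    """Count tag trigrams, their tag-pair contexts, and (tag, word) emissions."""
    tri, ctx, emit, tag_count = Counter(), Counter(), Counter(), Counter()
    vocab = set()
    for sent in tagged_sentences:
        t1, t2 = START, START
        for word, tag in sent:
            tri[(t1, t2, tag)] += 1   # transition out of state (t1, t2)
            ctx[(t1, t2)] += 1
            emit[(tag, word)] += 1    # lexical (emission) count
            tag_count[tag] += 1
            vocab.add(word)
            t1, t2 = t2, tag
    return tri, ctx, emit, tag_count, vocab

def viterbi(words, model):
    """Return the most probable tag sequence; states are tag pairs."""
    tri, ctx, emit, tag_count, vocab = model
    tags = sorted(tag_count)
    n_tags = len(tags)
    n_words = len(vocab) + 1  # +1 reserves probability mass for unknown words

    def log_trans(t1, t2, t3):
        # Add-one smoothed P(t3 | t1, t2); an assumption of this sketch.
        return math.log((tri[(t1, t2, t3)] + 1) / (ctx[(t1, t2)] + n_tags))

    def log_emit(tag, word):
        # Add-one smoothed P(word | tag); an assumption of this sketch.
        return math.log((emit[(tag, word)] + 1) / (tag_count[tag] + n_words))

    # best[state] = (log probability, tags so far) of the best path ending in state
    best = {(START, START): (0.0, [])}
    for word in words:
        new_best = {}
        for (t1, t2), (score, path) in best.items():
            for t3 in tags:
                s = score + log_trans(t1, t2, t3) + log_emit(t3, word)
                state = (t2, t3)
                if state not in new_best or s > new_best[state][0]:
                    new_best[state] = (s, path + [t3])
        best = new_best
    return max(best.values(), key=lambda v: v[0])[1]

# Invented toy corpus, for illustration only.
corpus = [
    [("el", "DET"), ("perro", "NOUN"), ("come", "VERB")],
    [("la", "DET"), ("casa", "NOUN"), ("es", "VERB"), ("grande", "ADJ")],
    [("el", "DET"), ("gato", "NOUN"), ("duerme", "VERB")],
]
model = train(corpus)
print(viterbi(["el", "gato", "come"], model))  # ['DET', 'NOUN', 'VERB']
```

Because each state records the two most recent tags, a transition from state (t1, t2) to state (t2, t3) carries the trigram probability P(t3 | t1, t2), while the emission depends only on the newest tag. With a small training corpus the choice of smoothing and unknown-word handling would matter far more than in this toy setting.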