In this paper, four state-of-the-art probabilistic taggers (the TnT tagger, TreeTagger, RF tagger, and SVM tool) are applied to the Urdu language. A syntactic tagset is proposed for the experiments, and the models are trained on a corpus of 100,000 tokens. When the lexicon is extracted from the training corpus alone, SVM tool achieves the best accuracy, 94.15%. When a separate lexicon of 70,568 types is also provided, SVM tool again achieves the best accuracy, 95.66%.
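As a rough illustration of the two quantities the abstract refers to, the following minimal sketch builds a lexicon from a tagged training corpus and computes token-level accuracy. It is not the authors' code: the function names `extract_lexicon` and `token_accuracy`, the placeholder tokens, and the tag labels are all hypothetical, and it assumes that accuracy means the fraction of tokens whose predicted tag matches the gold tag.

```python
from collections import defaultdict


def extract_lexicon(tagged_sentences):
    """Map each word type to the set of tags it carries in the training data."""
    lexicon = defaultdict(set)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            lexicon[word].add(tag)
    return dict(lexicon)


def token_accuracy(gold_tags, predicted_tags):
    """Return the fraction of tokens whose predicted tag equals the gold tag."""
    if len(gold_tags) != len(predicted_tags):
        raise ValueError("gold and predicted sequences differ in length")
    if not gold_tags:
        raise ValueError("cannot score an empty sequence")
    correct = sum(g == p for g, p in zip(gold_tags, predicted_tags))
    return correct / len(gold_tags)


# Toy data: placeholder tokens and tags, not the tagset proposed in the paper.
train = [[("w1", "NN"), ("w2", "VB")], [("w1", "ADJ"), ("w3", "NN")]]
lexicon = extract_lexicon(train)

gold = ["NN", "VB", "NN", "ADJ"]
pred = ["NN", "VB", "ADJ", "ADJ"]
accuracy = token_accuracy(gold, pred)
```

In this toy run the lexicon holds three word types, one of them ambiguous between two tags, and three of the four predicted tags match the gold tags. A separately supplied lexicon, as in the second experiment, would simply contribute additional word-to-tag entries beyond those seen in training.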