TnT: a statistical part-of-speech tagger
ANLC '00 Proceedings of the sixth conference on Applied natural language processing
Tagging Urdu text with parts of speech: a tagger comparison
EACL '09 Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics
An Information-Extraction System for Urdu---A Resource-Poor Language
ACM Transactions on Asian Language Information Processing (TALIP)
Sentiment analysis of urdu language: handling phrase-level negation
MICAI'11 Proceedings of the 10th Mexican international conference on Advances in Artificial Intelligence - Volume Part I
Analyzing Urdu social media for sentiments using transfer learning with controlled translations
LSM '12 Proceedings of the Second Workshop on Language in Social Media
Associating targets with SentiUnits: a step forward in sentiment analysis of Urdu text
Artificial Intelligence Review
In this paper, two Urdu corpora (of 110K and 120K words), each tagged with a different POS tagset, are used to train the TnT and Tree taggers. An error analysis of both taggers identifies the most frequent tagging confusions. Based on this analysis and on the syntactic structure of Urdu, a more refined tagset is derived. The existing tagged corpora are re-tagged with the new tagset to produce a single corpus of 230K words, and the TnT tagger is retrained. The results show that tagging accuracy improves to 94.2% for the individual corpora and to 91% for the merged corpus. Implications of these results are discussed.
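The error analysis mentioned above amounts to counting which gold tags a tagger most often replaces with which wrong tags. The following is only a minimal sketch of that idea, not the authors' procedure: the function names and the tag labels (NN, PN, VB, ADJ) are illustrative assumptions and are not taken from either tagset used in the paper.

```python
from collections import Counter


def confusion_counts(gold_tags, predicted_tags):
    """Count (gold, predicted) pairs for every token the tagger got wrong."""
    if len(gold_tags) != len(predicted_tags):
        raise ValueError("gold and predicted sequences must have equal length")
    return Counter(
        (gold, pred)
        for gold, pred in zip(gold_tags, predicted_tags)
        if gold != pred
    )


def accuracy(gold_tags, predicted_tags):
    """Fraction of tokens whose predicted tag matches the gold tag."""
    if len(gold_tags) != len(predicted_tags):
        raise ValueError("gold and predicted sequences must have equal length")
    if not gold_tags:
        raise ValueError("cannot compute accuracy of an empty sequence")
    correct = sum(g == p for g, p in zip(gold_tags, predicted_tags))
    return correct / len(gold_tags)


# Hypothetical toy data; the tag names are placeholders only.
gold = ["NN", "PN", "VB", "PN", "ADJ", "NN"]
pred = ["NN", "NN", "VB", "NN", "NN", "NN"]

confusions = confusion_counts(gold, pred)
for (gold_tag, pred_tag), count in confusions.most_common():
    print(f"{gold_tag} tagged as {pred_tag}: {count}")
print(f"accuracy: {accuracy(gold, pred):.2f}")
```

Pairs that dominate such a table (here, proper nouns tagged as common nouns) are the kind of evidence that could motivate merging, splitting, or redefining tags when a tagset is refined.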