Very few attempts at dependency parsing for Tamil have been reported in the literature. In this paper, we report results for Tamil dependency parsing obtained with rule-based and corpus-based approaches. We designed an annotation scheme partially based on the Prague Dependency Treebank (PDT) and manually annotated Tamil data (about 3000 words) with dependency relations. For the corpus-based approach, we used two well-known parsers, MaltParser and MSTParser. For the rule-based approach, we implemented a series of linguistic rules (for resolving coordination, complementation, predicate identification and so on) to build dependency structures for Tamil sentences. Our initial results show that both the rule-based and the corpus-based approaches achieved an accuracy of more than 74% on the unlabeled task and more than 65% on the labeled task. Rule-based parsing accuracy dropped considerably when the input was tagged automatically.
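The unlabeled and labeled accuracies above can be illustrated with a minimal sketch. It assumes that accuracy is measured as an attachment score (the fraction of tokens assigned the correct head, and the correct head plus relation label), which the abstract does not state explicitly. The function name `attachment_scores`, the toy sentence, and the labels in it are hypothetical and are not taken from the paper's data or annotation scheme.

```python
from typing import List, Tuple

# One arc per token: (index of the head token, dependency label).
# Head index 0 stands for the artificial root.
Arc = Tuple[int, str]


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """Return (unlabeled, labeled) attachment scores for one token sequence."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted analyses must cover the same tokens")
    if not gold:
        raise ValueError("cannot score an empty analysis")

    total = len(gold)
    # Unlabeled: only the head index has to match.
    unlabeled_hits = sum(1 for g, p in zip(gold, pred) if g[0] == p[0])
    # Labeled: both the head index and the relation label have to match.
    labeled_hits = sum(1 for g, p in zip(gold, pred) if g == p)
    return unlabeled_hits / total, labeled_hits / total


# Hypothetical four-token sentence with illustrative labels.
gold = [(2, "Sb"), (0, "Pred"), (4, "Atr"), (2, "Obj")]
pred = [(2, "Sb"), (0, "Pred"), (2, "Atr"), (2, "Adv")]

uas, las = attachment_scores(gold, pred)
print(f"unlabeled: {uas:.2f}, labeled: {las:.2f}")
```

In this toy example the third token gets the wrong head and the fourth gets the right head with the wrong label, so the labeled score is lower than the unlabeled one. That mirrors the gap between the two figures reported above.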