Tamil dependency parsing: results using rule based and corpus based approaches

Authors:
Loganathan Ramasamy; Zdeněk Žabokrtský
Affiliations:
Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University in Prague; Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University in Prague
Venue:
CICLing'11 Proceedings of the 12th international conference on Computational linguistics and intelligent text processing - Volume Part I
Year:
2011

Abstract

Very few attempts at dependency parsing for Tamil have been reported in the literature. In this paper, we report results obtained for Tamil dependency parsing with rule-based and corpus-based approaches. We designed an annotation scheme partially based on the Prague Dependency Treebank (PDT) and manually annotated Tamil data (about 3000 words) with dependency relations. For the corpus-based approach, we used two well-known parsers, MaltParser and MSTParser. For the rule-based approach, we implemented a series of linguistic rules (for resolving coordination, complementation, predicate identification and so on) to build dependency structures for Tamil sentences. Our initial results show that both the rule-based and the corpus-based approaches achieved an accuracy of more than 74% on the unlabeled task and more than 65% on the labeled task. Rule-based parsing accuracy dropped considerably when the input was tagged automatically.
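
The unlabeled and labeled figures above can be illustrated with a minimal sketch of how such scores are commonly computed. It assumes that "accuracy" means token-level attachment accuracy (the share of words assigned the correct head, and the share assigned both the correct head and the correct relation label); the abstract does not spell out the exact metric. The function name `attachment_scores`, the data layout, and the relation labels in the toy example are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# One token is (head_index, relation_label); head 0 stands for the artificial root.
# This layout is an assumption made for illustration only.
Token = Tuple[int, str]


def attachment_scores(gold: List[List[Token]],
                      pred: List[List[Token]]) -> Tuple[float, float]:
    """Return (unlabeled, labeled) attachment accuracy over all tokens."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted data differ in sentence count")
    total = unlabeled_hits = labeled_hits = 0
    for gold_sent, pred_sent in zip(gold, pred):
        if len(gold_sent) != len(pred_sent):
            raise ValueError("gold and predicted sentences differ in length")
        for (g_head, g_label), (p_head, p_label) in zip(gold_sent, pred_sent):
            total += 1
            if g_head == p_head:
                unlabeled_hits += 1          # correct head
                if g_label == p_label:
                    labeled_hits += 1        # correct head and correct label
    if total == 0:
        raise ValueError("no tokens to score")
    return unlabeled_hits / total, labeled_hits / total


# Toy three-word sentence with illustrative labels (not real Tamil data).
gold = [[(2, "Sb"), (0, "Pred"), (2, "Obj")]]
pred = [[(2, "Obj"), (0, "Pred"), (1, "Obj")]]
uas, las = attachment_scores(gold, pred)
print(f"unlabeled: {uas:.3f}, labeled: {las:.3f}")
```

Because a token counts toward the labeled score only when its head is also correct, the labeled score can never exceed the unlabeled one, which is consistent with the gap between the two figures reported above.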