We present two approaches, one rule-based and one statistical, for automatically annotating intra-chunk dependencies in Hindi. The intra-chunk dependencies are added to Hindi dependency trees that are already annotated with inter-chunk dependencies, so the intra-chunk annotator yields a fully parsed dependency tree for a Hindi sentence. In this paper, we first describe the guidelines for marking intra-chunk dependency relations. Although the guidelines are written for Hindi, they can easily be extended to other Indian languages. These guidelines are used to frame the rules in the rule-based approach. For the statistical approach, we use MaltParser, a data-driven parser. A part of the ICON 2010 tools contest data for Hindi is used for training and testing MaltParser. The same set is used to test the rule-based approach.
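To make the rule-based idea concrete, here is a minimal sketch of how a chunk whose head already carries an inter-chunk dependency might be expanded into token-level dependencies. Everything in it is an assumption made for illustration: the `Token` class, the `expand_chunk` function, the POS tags, and the entries of the `RULES` table are hypothetical and are not taken from the paper's guidelines. It also simplifies by attaching every non-head token directly to the chunk head, which real guidelines need not do.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Token:
    idx: int                      # 1-based position in the sentence
    form: str                     # surface form
    pos: str                      # part-of-speech tag (illustrative tagset)
    head: Optional[int] = None    # index of the parent token
    label: Optional[str] = None   # dependency label


# Hypothetical rule table: (child POS, chunk-head POS) -> intra-chunk label.
# These entries are placeholders, not the paper's actual rules.
RULES = {
    ("JJ", "NN"): "nmod__adj",
    ("PSP", "NN"): "lwg__psp",
    ("VAUX", "VM"): "lwg__vaux",
}
DEFAULT_LABEL = "mod"  # fallback when no rule matches (assumption)


def expand_chunk(tokens: List[Token], head_idx: int,
                 inter_head: int, inter_label: str) -> List[Token]:
    """Attach each non-head token of one chunk to the chunk head.

    The chunk head keeps the inter-chunk dependency it already had;
    every other token gets a label looked up from RULES.
    """
    head = next(t for t in tokens if t.idx == head_idx)
    head.head, head.label = inter_head, inter_label
    for t in tokens:
        if t.idx != head_idx:
            t.head = head_idx
            t.label = RULES.get((t.pos, head.pos), DEFAULT_LABEL)
    return tokens


# One noun chunk: adjective, noun (chunk head), postposition.
chunk = [Token(1, "w1", "JJ"), Token(2, "w2", "NN"), Token(3, "w3", "PSP")]
expand_chunk(chunk, head_idx=2, inter_head=4, inter_label="k1")
for t in chunk:
    print(t.idx, t.form, t.pos, t.head, t.label)
```

A lookup table keeps the rules declarative, so a change to the guidelines would only require editing the table rather than the attachment logic.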