Empty categories in Hindi dependency treebank: analysis and recovery

Authors:
Chaitanya Gsk;Samar Husain;Prashanth Mannem
Affiliations:
International Institute of Information Technology, Hyderabad, India;International Institute of Information Technology, Hyderabad, India;International Institute of Information Technology, Hyderabad, India
Venue:
LAW V '11 Proceedings of the 5th Linguistic Annotation Workshop
Year:
2011

Abstract

In this paper, we first analyze and classify the empty categories in a Hindi dependency treebank and then identify discovery procedures that automatically detect these categories in a sentence. To do this, we use lexical knowledge together with the parsed output of a constraint-based parser. We show that certain types of empty categories can be discovered successfully, while other types are more difficult to identify. This work yields a state-of-the-art system for the automatic insertion of empty categories into Hindi sentences.