Building a large annotated corpus of English: the Penn Treebank
Computational Linguistics - Special issue on using large corpora: II
A maximum-entropy-inspired parser
NAACL 2000 Proceedings of the 1st North American chapter of the Association for Computational Linguistics conference
Assigning function tags to parsed text
NAACL 2000 Proceedings of the 1st North American chapter of the Association for Computational Linguistics conference
Three generative, lexicalised models for statistical parsing
ACL '98 Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics
A machine-learning approach to the identification of WH gaps
EACL '03 Proceedings of the tenth conference on European chapter of the Association for Computational Linguistics - Volume 2
A simple pattern-matching algorithm for recovering empty nodes and their antecedents
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Deep syntactic processing by combining shallow methods
ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
Antecedent recovery: experiments with a trace tagger
EMNLP '03 Proceedings of the 2003 conference on Empirical methods in natural language processing
Pseudo-projective dependency parsing
ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Trace prediction and recovery with unlexicalized PCFGs and slash features
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
A shortest path dependency kernel for relation extraction
HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Fully parsing the Penn Treebank
HLT-NAACL '06 Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics
CCGbank: A Corpus of CCG Derivations and Dependency Structures Extracted from the Penn Treebank
Computational Linguistics
Chasing the ghost: recovering empty categories in the Chinese treebank
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics: Posters
A statistical tree annotator and its applications
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Language-independent parsing with empty elements
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2
Empty categories in Hindi dependency treebank: analysis and recovery
LAW V '11 Proceedings of the 5th Linguistic Annotation Workshop
PLCFRS parsing of English discontinuous constituents
IWPT '11 Proceedings of the 12th International Conference on Parsing Technologies
A clause-level hybrid approach to Chinese empty element recovery
IJCAI'13 Proceedings of the Twenty-Third international joint conference on Artificial Intelligence
Beyond sentence-level semantic role labeling: linking argument structures in discourse
Language Resources and Evaluation
This paper describes an algorithm for detecting empty nodes in the Penn Treebank (Marcus et al., 1993), finding their antecedents, and assigning them function tags, without access to lexical information such as valency. Unlike previous approaches to this task, the current method is not corpus-based, but rather makes use of the principles of early Government-Binding theory (Chomsky, 1981), the syntactic theory that underlies the annotation. Using the evaluation metric proposed by Johnson (2002), this approach outperforms previously published approaches on both detection of empty categories and antecedent identification, given either annotated input stripped of empty categories or the output of a parser. Some problems with this evaluation metric are noted, and an alternative metric is proposed, with results reported for it as well. The paper considers why a principle-based approach to this problem should outperform corpus-based approaches, and speculates on the possibility of a hybrid approach.