Using linguistic principles to recover empty categories

Authors:
Richard Campbell
Affiliations:
Microsoft Research, Redmond, WA
Venue:
ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics
Year:
2004

References:

Building a large annotated corpus of English: the Penn Treebank
Computational Linguistics - Special issue on using large corpora: II

A maximum-entropy-inspired parser
NAACL 2000 Proceedings of the 1st North American chapter of the Association for Computational Linguistics conference

Assigning function tags to parsed text
NAACL 2000 Proceedings of the 1st North American chapter of the Association for Computational Linguistics conference

Three generative, lexicalised models for statistical parsing
ACL '98 Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics

A machine-learning approach to the identification of WH gaps
EACL '03 Proceedings of the tenth conference on European chapter of the Association for Computational Linguistics - Volume 2

A simple pattern-matching algorithm for recovering empty nodes and their antecedents
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics

Deep syntactic processing by combining shallow methods
ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1

Antecedent recovery: experiments with a trace tagger
EMNLP '03 Proceedings of the 2003 conference on Empirical methods in natural language processing

Cited by:

Pseudo-projective dependency parsing
ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics

Trace prediction and recovery with unlexicalized PCFGs and slash features
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics

A shortest path dependency kernel for relation extraction
HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing

Fully parsing the Penn Treebank
HLT-NAACL '06 Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics

CCGbank: A Corpus of CCG Derivations and Dependency Structures Extracted from the Penn Treebank
Computational Linguistics

Chasing the ghost: recovering empty categories in the Chinese treebank
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics: Posters

A statistical tree annotator and its applications
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1

Language-independent parsing with empty elements
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2

Empty categories in Hindi dependency treebank: analysis and recovery
LAW V '11 Proceedings of the 5th Linguistic Annotation Workshop

PLCFRS parsing of English discontinuous constituents
IWPT '11 Proceedings of the 12th International Conference on Parsing Technologies

Knowledge sources for constituent parsing of German, a morphologically rich and less-configurational language
Computational Linguistics

A clause-level hybrid approach to Chinese empty element recovery
IJCAI'13 Proceedings of the Twenty-Third international joint conference on Artificial Intelligence

Beyond sentence-level semantic role labeling: linking argument structures in discourse
Language Resources and Evaluation

Abstract

This paper describes an algorithm for detecting empty nodes in the Penn Treebank (Marcus et al., 1993), finding their antecedents, and assigning them function tags, without access to lexical information such as valency. Unlike previous approaches to this task, the current method is not corpus-based, but rather makes use of the principles of early Government-Binding theory (Chomsky, 1981), the syntactic theory that underlies the annotation. Using the evaluation metric proposed by Johnson (2002), this approach outperforms previously published approaches on both detection of empty categories and antecedent identification, given either annotated input stripped of empty categories or the output of a parser. Some problems with this evaluation metric are noted and an alternative is proposed along with the results. The paper considers the reasons a principle-based approach to this problem should outperform corpus-based approaches, and speculates on the possibility of a hybrid approach.
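
The abstract refers to the evaluation metric of Johnson (2002) without spelling it out. As a rough illustration only, the sketch below assumes the commonly described formulation, in which each empty node is identified by its category label and its position in the word string, and a system is scored by precision, recall, and F-measure against the gold annotation. The names `EmptyNode`, `Score`, and `score_empty_nodes`, and the example labels, are hypothetical and do not come from the paper.

```python
from typing import NamedTuple, Set


class EmptyNode(NamedTuple):
    """An empty node, identified by category label and string position (assumed representation)."""
    label: str      # category label, e.g. "NP" (illustrative values only)
    position: int   # index of the word boundary where the empty node occurs


class Score(NamedTuple):
    precision: float
    recall: float
    f1: float


def score_empty_nodes(gold: Set[EmptyNode], predicted: Set[EmptyNode]) -> Score:
    """Score predicted empty nodes against gold ones by exact (label, position) match."""
    matched = len(gold & predicted)
    # Guard against empty sets so the function never divides by zero.
    precision = matched / len(predicted) if predicted else 0.0
    recall = matched / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return Score(precision, recall, f1)


if __name__ == "__main__":
    gold = {EmptyNode("NP", 3), EmptyNode("WHNP", 0)}
    predicted = {EmptyNode("NP", 3), EmptyNode("NP", 5)}
    print(score_empty_nodes(gold, predicted))
```

Under this assumed representation, antecedent identification could be scored the same way by extending each tuple with the label and position of the antecedent, so that a node counts as correct only if its antecedent also matches.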