A simple pattern-matching algorithm for recovering empty nodes and their antecedents

Authors:
Mark Johnson
Affiliations:
Brown University
Venue:
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Year:
2002

Citing 6
Cited 34

Data mining: practical machine learning tools and techniques with Java implementations
Discriminative Reranking for Natural Language Parsing

ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
Building a large annotated corpus of English: the Penn Treebank

Computational Linguistics - Special issue on using large corpora: II
PCFG models of linguistic tree representations

Computational Linguistics
A maximum-entropy-inspired parser

NAACL 2000 Proceedings of the 1st North American chapter of the Association for Computational Linguistics conference
Three generative, lexicalised models for statistical parsing

ACL '98 Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics

Deep syntactic processing by combining shallow methods

ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
Finding non-local dependencies: beyond pattern matching

ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 2
Antecedent recovery: experiments with a trace tagger

EMNLP '03 Proceedings of the 2003 conference on Empirical methods in natural language processing
Identifying semantic roles using Combinatory Categorial Grammar

EMNLP '03 Proceedings of the 2003 conference on Empirical methods in natural language processing
The Proposition Bank: An Annotated Corpus of Semantic Roles

Computational Linguistics
Enriching the output of a parser using memory-based learning

ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics
Long-distance dependency resolution in automatically acquired wide-coverage PCFG-based LFG approximations

ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics
Deep dependencies from context-free statistical parsers: correcting the surface dependency approximation

ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics
Using linguistic principles to recover empty categories

ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics
Robust VPE detection using automatically parsed text

ACLstudent '04 Proceedings of the ACL 2004 workshop on Student research
Pseudo-projective dependency parsing

ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Trace prediction and recovery with unlexicalized PCFGs and slash features

ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
QuestionBank: creating a corpus of parse-annotated questions

ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Verb phrase ellipsis detection using automatically parsed text

COLING '04 Proceedings of the 20th international conference on Computational Linguistics
Fully parsing the Penn Treebank

HLT-NAACL '06 Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics
CCGbank: A Corpus of CCG Derivations and Dependency Structures Extracted from the Penn Treebank

Computational Linguistics
Improving phrase-based statistical machine translation with morphosyntactic transformation

Machine Translation
Wide-coverage deep statistical parsing using automatic dependency structure annotation

Computational Linguistics
Symbolic preference using simple scoring

IWPT '07 Proceedings of the 10th International Conference on Parsing Technologies
A robust and hybrid deep-linguistic theory applied to large-scale parsing

ROMAND '04 Proceedings of the 3rd Workshop on RObust Methods in Analysis of Natural Language Data
Corrective modeling for non-projective dependency parsing

Parsing '05 Proceedings of the Ninth International Workshop on Parsing Technology
Unbounded dependency recovery for parser evaluation

EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2
Effects of empty categories on machine translation

EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Data-driven parsing with probabilistic linear context-free rewriting systems

COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics
Chasing the ghost: recovering empty categories in the Chinese treebank

COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics: Posters
Language-independent parsing with empty elements

HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2
Empty categories in Hindi dependency treebank: analysis and recovery

LAW V '11 Proceedings of the 5th Linguistic Annotation Workshop
Assigning function tags with a simple model

CICLing'05 Proceedings of the 6th international conference on Computational Linguistics and Intelligent Text Processing
A paragraph boundary detection system

CICLing'05 Proceedings of the 6th international conference on Computational Linguistics and Intelligent Text Processing
PLCFRS parsing of English discontinuous constituents

IWPT '11 Proceedings of the 12th International Conference on Parsing Technologies
Zero pronoun resolution can improve the quality of J-E translation

SSST-6 '12 Proceedings of the Sixth Workshop on Syntax, Semantics and Structure in Statistical Translation
Knowledge sources for constituent parsing of German, a morphologically rich and less-configurational language

Computational Linguistics
Data-driven parsing using probabilistic linear context-free rewriting systems

Computational Linguistics
A clause-level hybrid approach to Chinese empty element recovery

IJCAI'13 Proceedings of the Twenty-Third international joint conference on Artificial Intelligence

Abstract

This paper describes a simple pattern-matching algorithm for recovering empty nodes and identifying their co-indexed antecedents in phrase structure trees that do not contain this information. The patterns are minimal connected tree fragments containing an empty node and all other nodes co-indexed with it. The paper also proposes an evaluation procedure for empty node recovery that is independent of most of the details of phrase structure, which makes it possible to compare the performance of empty node recovery on parser output with the empty node annotations in a gold-standard corpus. Evaluating the algorithm on the output of Charniak's parser (Charniak, 2000) and on the Penn Treebank (Marcus et al., 1993) shows that, given its simplicity, the pattern-matching algorithm does surprisingly well on the most frequently occurring types of empty nodes.
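
To make the evaluation idea in the abstract concrete, here is a minimal Python sketch of one way to score empty node recovery without depending on the surrounding phrase structure. It assumes, as an illustration only, that each empty node is identified by its parent category, its kind (for example `*` or `*T*`), and the number of overt words to its left. The function names `parse`, `empty_nodes`, and `prf`, the tuple representation, and the example trees are hypothetical and not taken from the abstract.

```python
from typing import List, Set, Tuple

# Assumed representation: (parent category, empty-node kind, string position).
EmptyNode = Tuple[str, str, int]

def parse(s: str) -> List:
    """Parse a Penn-style bracketed string into nested lists [label, child, ...]."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node() -> List:
        nonlocal pos
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])   # a leaf word
                pos += 1
        pos += 1                      # skip ")"
        return [label] + children

    return node()

def empty_nodes(tree: List) -> Set[EmptyNode]:
    """Collect empty nodes, keyed by category, kind, and overt words to the left."""
    found: Set[EmptyNode] = set()

    def walk(t: List, parent_label: str, position: int) -> int:
        label, children = t[0], t[1:]
        if label == "-NONE-":
            kind = children[0].split("-")[0]   # "*T*-1" -> "*T*"
            found.add((parent_label, kind, position))
            return position                    # empty nodes cover no words
        for child in children:
            if isinstance(child, str):
                position += 1                  # overt word
            else:
                position = walk(child, label, position)
        return position

    walk(tree, "TOP", 0)
    return found

def prf(gold: Set[EmptyNode], predicted: Set[EmptyNode]) -> Tuple[float, float, float]:
    """Precision, recall, and f-score over two sets of empty nodes."""
    correct = len(gold & predicted)
    p = correct / len(predicted) if predicted else 0.0
    r = correct / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

# Hypothetical gold tree and a flatter system tree for the same sentence.
GOLD = ("(SBARQ (WHNP-1 (WP Who)) (SQ (VBD did) (NP-2 (PRP you)) (VP (VB want) "
        "(S (NP (-NONE- *-2)) (VP (TO to) (VP (VB see) (NP (-NONE- *T*-1))))))))")
PRED = ("(SBARQ (WHNP (WP Who)) (SQ (VBD did) (NP (PRP you)) (VP (VB want) "
        "(VP (TO to) (VP (VB see) (NP (-NONE- *T*)))))))")

if __name__ == "__main__":
    print(prf(empty_nodes(parse(GOLD)), empty_nodes(parse(PRED))))
```

In this example the system tree lacks the embedded S node of the gold tree, yet its `*T*` trace still matches because only category, kind, and string position are compared. The system recovers one of the two gold empty nodes, giving precision 1.0 and recall 0.5. Scoring the antecedents as well would require extending the representation, which this sketch does not attempt.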