Trace prediction and recovery with unlexicalized PCFGs and slash features

Authors:
Helmut Schmid
Affiliations:
University of Stuttgart
Venue:
ACL-44: Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics
Year:
2006

References (11):
Building a large annotated corpus of English: the Penn Treebank. Computational Linguistics, Special issue on using large corpora: II.
PCFG models of linguistic tree representations. Computational Linguistics.
A maximum-entropy-inspired parser. NAACL 2000: Proceedings of the 1st North American Chapter of the Association for Computational Linguistics Conference.
Three generative, lexicalised models for statistical parsing. ACL '98: Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics.
A simple pattern-matching algorithm for recovering empty nodes and their antecedents. ACL '02: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics.
Accurate unlexicalized parsing. ACL '03: Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, Volume 1.
Deep syntactic processing by combining shallow methods. ACL '03: Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, Volume 1.
Antecedent recovery: experiments with a trace tagger. EMNLP '03: Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing.
Deep dependencies from context-free statistical parsers: correcting the surface dependency approximation. ACL '04: Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics.
Using linguistic principles to recover empty categories. ACL '04: Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics.
Efficient parsing of highly ambiguous context-free grammars with bit vectors. COLING '04: Proceedings of the 20th International Conference on Computational Linguistics.

Cited by (7):
Language independent probabilistic context-free parsing bolstered by machine learning. CoNLL-X '06: Proceedings of the Tenth Conference on Computational Natural Language Learning.
Re-estimation of lexical parameters for treebank PCFGs. COLING '08: Proceedings of the 22nd International Conference on Computational Linguistics, Volume 1.
Creating and exploiting a resource of parallel parses. LAW IV '10: Proceedings of the Fourth Linguistic Annotation Workshop.
A statistical tree annotator and its applications. HLT '11: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1.
Language-independent parsing with empty elements. HLT '11: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Short Papers, Volume 2.
Learning structural dependencies of words in the Zipfian tail. IWPT '11: Proceedings of the 12th International Conference on Parsing Technologies.
Knowledge sources for constituent parsing of German, a morphologically rich and less-configurational language. Computational Linguistics.

Abstract

This paper describes a parser which generates parse trees with empty elements in which traces and fillers are co-indexed. The parser is an unlexicalized PCFG parser which is guaranteed to return the most probable parse. The grammar is extracted from a version of the Penn Treebank that was automatically annotated with features in the style of Klein and Manning (2003). The annotation includes GPSG-style slash features, which link traces and fillers, and other features which improve general parsing accuracy. In an evaluation on the Penn Treebank (Marcus et al., 1993), the parser outperformed other unlexicalized PCFG parsers in labeled bracketing f-score. Its results on the empty category prediction task (84.1% f-score) and the trace-filler co-indexation task (77.4% f-score) exceed all previously reported results.
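The idea of slash features linking traces and fillers can be illustrated on a single Penn Treebank tree. The following is a minimal sketch under simplifying assumptions, not the annotation procedure used in the paper. It handles only `*T*` wh-traces, assumes the filler carries the trace's index as a `-N` suffix on its label (as in `WHNP-1`), and copies the category of the empty constituent (for example `/NP`) onto every node on the path from just below the lowest common ancestor of trace and filler down to, but excluding, the gap. The `Node` class and the functions `collect`, `base_category`, and `add_slash_features` are hypothetical names introduced for illustration.

```python
import re

# Simplified illustration of GPSG-style slash features on a Penn Treebank tree.
# Hypothetical sketch: only *T* traces, no other grammar features are added.


class Node:
    """Hypothetical constituent class: a label plus children (Nodes or word strings)."""

    def __init__(self, label, children):
        self.label = label
        self.children = children

    def __str__(self):
        return "(" + self.label + " " + " ".join(str(c) for c in self.children) + ")"


TRACE = re.compile(r"^\*T\*-(\d+)$")     # PTB wh-trace leaf, e.g. *T*-1
INDEXED = re.compile(r"^(.*\D)-(\d+)$")  # co-indexed filler label, e.g. WHNP-1


def collect(node, path, traces, fillers):
    """Record the root-to-node path of every indexed filler and every trace."""
    path = path + [node]
    m = INDEXED.match(node.label)
    if m:
        fillers[m.group(2)] = path
    for child in node.children:
        if isinstance(child, Node):
            collect(child, path, traces, fillers)
        else:
            t = TRACE.match(child)
            if t:
                # path[-1] is the -NONE- node, path[-2] the empty constituent (e.g. NP)
                traces[t.group(1)] = path


def base_category(label):
    """Strip function tags and indices: 'NP-SBJ-1' -> 'NP'."""
    return label.split("-")[0]


def add_slash_features(root):
    """Annotate nodes between a trace and its filler with '/CAT' (in place)."""
    traces, fillers = {}, {}
    collect(root, [], traces, fillers)
    for idx, tpath in traces.items():
        fpath = fillers.get(idx)
        if fpath is None:
            continue  # no overt filler in this tree: leave it unannotated
        # k = length of the shared root prefix; tpath[k - 1] is the lowest common ancestor
        k = 0
        while k < min(len(tpath), len(fpath)) and tpath[k] is fpath[k]:
            k += 1
        gap = base_category(tpath[-2].label)  # category of the empty constituent
        # Slash every node strictly between the common ancestor and the gap
        for node in tpath[k:-2]:
            node.label += "/" + gap
    return root


# "the man who you saw": WHNP-1 is the filler of the object trace *T*-1
tree = Node("NP", [
    Node("NP", [Node("DT", ["the"]), Node("NN", ["man"])]),
    Node("SBAR", [
        Node("WHNP-1", [Node("WP", ["who"])]),
        Node("S", [
            Node("NP-SBJ", [Node("PRP", ["you"])]),
            Node("VP", [Node("VBD", ["saw"]),
                        Node("NP", [Node("-NONE-", ["*T*-1"])])]),
        ]),
    ]),
])
print(add_slash_features(tree))
```

On this relative clause the sketch relabels the S and VP that dominate the gap as `S/NP` and `VP/NP`. A PCFG read off trees annotated this way contains rules such as `S/NP -> NP-SBJ VP/NP`, so the slash category records that an NP gap inside the constituent still has to be bound by a filler higher up.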