Partial training for a lexicalized-grammar parser

Authors: Stephen Clark; James R. Curran
Affiliations: Oxford University, Oxford, UK; University of Sydney, NSW, Australia
Venue: HLT-NAACL '06 Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics
Year: 2006

References (16)

The syntactic process
Inside-outside reestimation from partially bracketed corpora. ACL '92 Proceedings of the 30th annual meeting on Association for Computational Linguistics
Supervised grammar induction using training data with limited constituent information. ACL '99 Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics
Bootstrapping statistical parsers from small datasets. EACL '03 Proceedings of the tenth conference on European chapter of the Association for Computational Linguistics - Volume 1
High precision extraction of grammatical relations. COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
Parsing the wall street journal using a Lexical-Functional Grammar and discriminative estimation techniques. ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Generative models for statistical parsing with Combinatory Categorial Grammar. ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Statistical parsing with an automatically-extracted tree adjoining grammar. ACL '00 Proceedings of the 38th Annual Meeting on Association for Computational Linguistics
A comparison of algorithms for maximum entropy parameter estimation. COLING-02 proceedings of the 6th conference on Natural language learning - Volume 20
Log-linear models for wide-coverage CCG parsing. EMNLP '03 Proceedings of the 2003 conference on Empirical methods in natural language processing
Parsing the WSJ using CCG and log-linear models. ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics
Long-distance dependency resolution in automatically acquired wide-coverage PCFG-based LFG approximations. ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics
The importance of supertagging for wide-coverage CCG parsing. COLING '04 Proceedings of the 20th international conference on Computational Linguistics
Maximum entropy estimation for feature forests. HLT '02 Proceedings of the second international conference on Human Language Technology Research
Corpus-Oriented grammar development for acquiring a head-driven phrase structure grammar from the penn treebank. IJCNLP'04 Proceedings of the First international joint conference on Natural Language Processing
Parsing biomedical literature. IJCNLP'05 Proceedings of the Second international joint conference on Natural Language Processing

Cited by (5)

Data-driven dependency parsing of new languages using incomplete and noisy training data. CoNLL '09 Proceedings of the Thirteenth Conference on Computational Natural Language Learning
Training conditional random fields using incomplete annotations. COLING '08 Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1
Evaluating impact of re-training a lexical disambiguation model on domain adaptation of an HPSG parser. IWPT '07 Proceedings of the 10th International Conference on Parsing Technologies
Porting a lexicalized-grammar parser to the biomedical domain. Journal of Biomedical Informatics
Minimized models and grammar-informed initialization for supertagging with highly ambiguous lexicons. ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

Abstract

We propose a solution to the annotation bottleneck for statistical parsing that exploits the lexicalized nature of Combinatory Categorial Grammar (CCG). The parsing model uses predicate-argument dependencies for training, which are derived from sequences of CCG lexical categories rather than full derivations. A simple method is used for extracting dependencies from lexical category sequences, yielding high-precision but incomplete and noisy data. The dependency parsing model of Clark and Curran (2004b) is extended to exploit this partial training data. Remarkably, the accuracy of the parser trained on data derived from category sequences alone is only 1.3% worse in terms of F-score than the parser trained on complete dependency structures.