Faster parsing by supertagger adaptation

Authors:
Jonathan K. Kummerfeld; Jessika Roesner; Tim Dawborn; James Haggerty; James R. Curran; Stephen Clark
Affiliations:
University of Sydney, NSW, Australia; University of Texas at Austin, Austin, TX; University of Sydney, NSW, Australia; University of Sydney, NSW, Australia; University of Sydney, NSW, Australia; University of Cambridge, Cambridge, UK
Venue:
ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
Year:
2010

Abstract

We propose a novel self-training method for a parser which uses a lexicalised grammar and supertagger, focusing on increasing the speed of the parser rather than its accuracy. The idea is to train the supertagger on large amounts of parser output, so that the supertagger can learn to supply the supertags that the parser will eventually choose as part of the highest-scoring derivation. Since the supertagger supplies fewer supertags overall, the parsing speed is increased. We demonstrate the effectiveness of the method using a CCG supertagger and parser, obtaining significant speed increases on newspaper text with no loss in accuracy. We also show that the method can be used to adapt the CCG parser to new domains, obtaining accuracy and speed improvements for Wikipedia and biomedical text.
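
The self-training loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the count-based tagger, the `beta` cutoff used to decide how many supertags to supply, the `toy_parser` stand-in, and all data below are assumptions made only for illustration. The sketch shows the tagger being retrained on the stand-in parser's output, after which it supplies fewer supertags per word.

```python
from collections import Counter, defaultdict

def train_tagger(tagged_sentences):
    """Count how often each word carries each supertag."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    return counts

def multitag(counts, word, beta):
    """Supply every supertag whose relative frequency is at least
    beta times that of the most frequent tag (assumed cutoff rule)."""
    tag_counts = counts.get(word)
    if not tag_counts:
        return []
    total = sum(tag_counts.values())
    best = max(tag_counts.values()) / total
    return [t for t, c in tag_counts.items() if c / total >= beta * best]

def toy_parser(sentence, candidates, preferred):
    """Hypothetical stand-in for a parser: for each word, keep the tag
    the 'parser model' prefers if it was supplied, else the first one."""
    chosen = []
    for word, tags in zip(sentence, candidates):
        tag = preferred.get(word)
        chosen.append((word, tag if tag in tags else tags[0]))
    return chosen

def average_ambiguity(counts, sentences, beta):
    """Average number of supertags supplied per word."""
    n_tags = sum(len(multitag(counts, w, beta)) for s in sentences for w in s)
    n_words = sum(len(s) for s in sentences)
    return n_tags / n_words

# Small hand-labelled seed set (invented data).
gold = [
    [("flies", "N"), ("buzz", "S\\NP")],
    [("time", "N"), ("flies", "S\\NP")],
]
# Unlabelled text to be parsed (invented data).
unlabelled = [["time", "flies"]] * 8
# Tags the stand-in parser ends up choosing (invented preferences).
preferred = {"time": "N", "flies": "S\\NP"}
beta = 0.5

tagger = train_tagger(gold)
before = average_ambiguity(tagger, unlabelled, beta)

# Self-training: parse the unlabelled text, then retrain the tagger
# on the seed data plus the supertags the parser selected.
parser_output = [
    toy_parser(s, [multitag(tagger, w, beta) for w in s], preferred)
    for s in unlabelled
]
adapted = train_tagger(gold + parser_output)
after = average_ambiguity(adapted, unlabelled, beta)

print(before, after)
```

In this toy setting the adapted tagger stops offering the noun reading of "flies" in the unlabelled text, so the stand-in parser would have fewer candidates to consider. Whether a real parser gets faster without losing accuracy depends on the tagger and parser models, which this sketch does not reproduce.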