We combine two complementary ideas for learning supertaggers from highly ambiguous lexicons: grammar-informed tag transitions and models minimized via integer programming. As we demonstrate on the CCGbank and CCG-TUT corpora, each strategy on its own greatly improves performance over basic expectation-maximization training with a bitag Hidden Markov Model, and combining them yields further error reductions. We describe a new two-stage integer programming strategy that efficiently handles the high degree of ambiguity in these datasets while obtaining the full effect of model minimization.
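To make the model-minimization idea concrete, here is a minimal sketch under stated assumptions: a toy lexicon with hypothetical tags, and brute-force subset search standing in for the paper's actual two-stage integer programming solver. The objective is the same in spirit: find the smallest set of tag bigrams such that every sentence still has at least one tag sequence licensed by the lexicon whose adjacent tag pairs all come from that set.

```python
from itertools import combinations, product

# Toy ambiguous lexicon: word -> set of possible tags (hypothetical example).
lexicon = {
    "the": {"DT"},
    "dog": {"N", "V"},
    "barks": {"V", "N"},
}
sentences = [["the", "dog", "barks"]]

def tag_sequences(sent):
    """All tag sequences the lexicon licenses for a sentence."""
    return product(*(lexicon[w] for w in sent))

def covered(sent, bigrams):
    """True if some licensed tag sequence uses only the given tag bigrams."""
    return any(
        all((a, b) in bigrams for a, b in zip(seq, seq[1:]))
        for seq in tag_sequences(sent)
    )

# Candidate bigrams: every adjacent tag pair seen in any licensed sequence.
candidates = set()
for sent in sentences:
    for seq in tag_sequences(sent):
        candidates.update(zip(seq, seq[1:]))

# Brute-force stand-in for the integer program: try subsets in increasing
# size until one covers every sentence, giving a minimal bigram model.
best = None
for k in range(1, len(candidates) + 1):
    for subset in combinations(sorted(candidates), k):
        if all(covered(s, set(subset)) for s in sentences):
            best = set(subset)
            break
    if best:
        break

print(best)  # one minimal covering set of tag bigrams
```

On real corpora the candidate set is far too large for brute force, which is why an integer programming formulation (and the paper's two-stage strategy for taming the ambiguity) is needed; the minimized bigram set then constrains the transitions available to EM training.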