We combine two complementary ideas for learning supertaggers from highly ambiguous lexicons: grammar-informed tag transitions and models minimized via integer programming. As we demonstrate on the CCGbank and CCG-TUT corpora, each strategy on its own greatly improves performance over basic expectation-maximization training with a bitag Hidden Markov Model, and combining them yields further error reductions. We describe a new two-stage integer programming strategy that efficiently handles the high degree of ambiguity in these datasets while obtaining the full effect of model minimization.
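To make the model-minimization idea concrete, here is a minimal sketch under stated assumptions: a toy lexicon with hypothetical tags, and brute-force subset search standing in for the paper's actual two-stage integer programming solver. The objective is the same in spirit: find the smallest set of tag bigrams such that every sentence still has at least one tag sequence licensed by the lexicon whose adjacent tag pairs all come from that set.

```python
from itertools import combinations, product

# Toy ambiguous lexicon: word -> set of possible tags (hypothetical example).
lexicon = {
    "the": {"DT"},
    "dog": {"N", "V"},
    "barks": {"V", "N"},
}
sentences = [["the", "dog", "barks"]]

def tag_sequences(sent):
    """All tag sequences the lexicon licenses for a sentence."""
    return product(*(lexicon[w] for w in sent))

def covered(sent, bigrams):
    """True if some licensed tag sequence uses only the given tag bigrams."""
    return any(
        all((a, b) in bigrams for a, b in zip(seq, seq[1:]))
        for seq in tag_sequences(sent)
    )

# Candidate bigrams: every adjacent tag pair seen in any licensed sequence.
candidates = set()
for sent in sentences:
    for seq in tag_sequences(sent):
        candidates.update(zip(seq, seq[1:]))

# Brute-force stand-in for the integer program: try subsets in increasing
# size until one covers every sentence, giving a minimal bigram model.
best = None
for k in range(1, len(candidates) + 1):
    for subset in combinations(sorted(candidates), k):
        if all(covered(s, set(subset)) for s in sentences):
            best = set(subset)
            break
    if best:
        break

print(best)  # one minimal covering set of tag bigrams
```

On real corpora the candidate set is far too large for brute force, which is why an integer programming formulation (and the paper's two-stage strategy for taming the ambiguity) is needed; the minimized bigram set then constrains the transitions available to EM training.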