Weakly supervised supertagging with grammar-informed initialization

Authors:
Jason Baldridge
Affiliations:
The University of Texas at Austin
Venue:
COLING '08 Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1
Year:
2008

Citing 10
Cited 1

The syntactic process

The syntactic process
Supertagging: an approach to almost parsing

Computational Linguistics
Efficient normal-form parsing for combinatory categorial grammar

ACL '96 Proceedings of the 34th annual meeting on Association for Computational Linguistics
Unsupervised learning of field segmentation models for information extraction

ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Part of speech tagging in context

COLING '04 Proceedings of the 20th international conference on Computational Linguistics
Prototype-driven learning for sequence models

HLT-NAACL '06 Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics
CCGbank: A Corpus of CCG Derivations and Dependency Structures Extracted from the Penn Treebank

Computational Linguistics
Wide-coverage efficient statistical parsing with ccg and log-linear models

Computational Linguistics
Multilingual deep lexical acquisition for HPSGs via supertagging

EMNLP '06 Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing
CCG supertags in factored statistical machine translation

StatMT '07 Proceedings of the Second Workshop on Statistical Machine Translation

Minimized models and grammar-informed initialization for supertagging with highly ambiguous lexicons

ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

Quantified Score

Hi-index	0.00

Visualization

Abstract

Much previous work has investigated weak supervision with HMMs and tag dictionaries for part-of-speech tagging, but there have been no similar investigations for the harder problem of supertagging. Here, I show that weak supervision for supertagging does work, but that it is subject to severe performance degradation when the tag dictionary is highly ambiguous. I show that lexical category complexity and information about how supertags may combine syntactically can be used to initialize the transition distributions of a first-order Hidden Markov Model for weakly supervised learning. This initialization proves more effective than starting with uniform transitions, especially when the tag dictionary is highly ambiguous.