MAP adaptation of stochastic grammars

  • Authors:
  • Michiel Bacchiani;Michael Riley;Brian Roark;Richard Sproat

  • Affiliations:
  • IBM TJ Watson Research Center, Rm. 24-124, 1101 Kitchawan Rd, Rt 134, Yorktown Heights, NY 10598, USA; Google Inc., 1440 Broadway, New York, NY 10018, USA; Center for Spoken Language Understanding, Department of CS&EE, OGI School of Science & Engineering at Oregon Health & Science University, 20000 NW Walker Road, Beaverton, OR 97006, USA; Departments of Linguistics and ECE, University of Illinois at Urbana-Champaign, Foreign Languages Building 4103, 707 South Mathews Avenue, MC-168, Urbana, IL 61801, USA

  • Venue:
  • Computer Speech and Language
  • Year:
  • 2006

Abstract

This paper investigates supervised and unsupervised adaptation of stochastic grammars, including n-gram language models and probabilistic context-free grammars (PCFGs), to a new domain. It is shown that the commonly used approaches of count merging and model interpolation are special cases of a more general maximum a posteriori (MAP) framework, which additionally allows for alternate adaptation approaches. We investigate the effectiveness of different adaptation strategies and, in particular, focus on the need for supervision in the adaptation process. We show that n-gram models as well as PCFGs benefit from either supervised or unsupervised MAP adaptation in various tasks. For n-gram models, we compare the benefit of supervised adaptation with that of unsupervised adaptation on a speech recognition task with an adaptation sample of limited size (about 17 h), and show that unsupervised adaptation obtains 51% of the 7.7% adaptation gain obtained by supervised adaptation. We also investigate the benefit of using multiple word hypotheses (in the form of a word lattice) for unsupervised adaptation on a speech recognition task for which a much larger adaptation sample was available. The use of word lattices for adaptation required the derivation of a generalization of the well-known Good-Turing estimate. Using this generalization, we derive a method that uses Monte Carlo sampling for building Katz backoff models. The adaptation results show that, for adaptation samples of limited size (several tens of hours), unsupervised adaptation on lattices gives a performance gain over using transcripts. The experimental results also show that with a very large adaptation sample (1050 h), the benefit from transcript-based adaptation matches that of lattice-based adaptation. Finally, we show that PCFG domain adaptation using the MAP framework provides gains in F-measure accuracy on a parsing task comparable to the ASR accuracy improvements seen with n-gram adaptation. Experimental results show that unsupervised adaptation provides 37% of the 10.35% gain obtained by supervised adaptation.
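
To make the MAP framework concrete, here is a minimal sketch in Python (not from the paper; the function name map_adapt_unigram, the toy counts, and the prior weight tau are illustrative assumptions). It adapts a unigram distribution by mixing an out-of-domain prior with in-domain adaptation counts; the count-merging and model-interpolation special cases mentioned in the abstract correspond to particular choices of tau.

    from collections import Counter

    def map_adapt_unigram(prior_counts, adapt_counts, tau):
        """MAP estimate of unigram probabilities under a Dirichlet prior
        centred on the out-of-domain model (a sketch, not the authors' code).

        prior_counts: Counter of out-of-domain counts (defines the prior).
        adapt_counts: Counter of in-domain adaptation counts.
        tau: prior weight; larger values trust the out-of-domain model more.
        """
        vocab = set(prior_counts) | set(adapt_counts)
        prior_total = sum(prior_counts.values())
        adapt_total = sum(adapt_counts.values())
        probs = {}
        for w in vocab:
            prior_prob = prior_counts[w] / prior_total if prior_total else 0.0
            # MAP estimate: (tau * prior_prob + in-domain count) / (tau + in-domain total).
            # Setting tau to the out-of-domain sample size reproduces plain count merging;
            # choosing tau relative to adapt_total yields linear model interpolation.
            probs[w] = (tau * prior_prob + adapt_counts[w]) / (tau + adapt_total)
        return probs

    # Toy example: a larger tau pulls estimates toward the out-of-domain distribution.
    out_domain = Counter({"the": 900, "stock": 50, "grammar": 50})
    in_domain = Counter({"the": 80, "grammar": 20})
    print(map_adapt_unigram(out_domain, in_domain, tau=10.0))
    print(map_adapt_unigram(out_domain, in_domain, tau=1000.0))

Varying tau moves the estimate between the unadapted out-of-domain model (large tau) and the maximum likelihood estimate on the adaptation sample alone (tau near zero), which is what allows both count merging and model interpolation to be expressed within the same MAP framework.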