We argue that some of the computational complexity associated with estimating stochastic attribute-value grammars can be reduced by training on an informative subset of the full training set. Experiments with the parsed Wall Street Journal corpus show that, in some circumstances, an informative sample yields better estimates than training on all the available material. Further experiments demonstrate that with unlexicalised models, a Gaussian prior can reduce overfitting. However, when models are lexicalised and contain overlapping features, overfitting does not appear to be a problem, and a Gaussian prior makes minimal difference to performance. Our approach is applicable when the training set contains an infeasibly large number of parses, or when recovering those parses from a packed representation is itself computationally expensive.
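The abstract describes fitting a log-linear (maximum-entropy) parse-selection model, optionally under a Gaussian prior, to an informative subset of the training data. The sketch below is a minimal illustration of that general setup, not the paper's actual procedure: `nll_and_grad`, `informative_subset`, the variance-based selection criterion, and all hyperparameter values (`sigma2`, the learning rate, the subset size `k`) are hypothetical stand-ins, since the abstract does not specify them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: each sentence has several candidate parses, each parse a
# feature vector; one candidate per sentence is the gold (correct) parse.
n_sentences, n_parses, n_feats = 200, 5, 20
parse_feats = rng.normal(size=(n_sentences, n_parses, n_feats))
gold_idx = rng.integers(0, n_parses, size=n_sentences)

def nll_and_grad(w, X, gold, sigma2=None):
    """Negative log-likelihood and gradient of a log-linear
    parse-selection model, with an optional Gaussian (L2) prior."""
    nll, grad = 0.0, np.zeros_like(w)
    for feats, g in zip(X, gold):
        scores = feats @ w
        scores -= scores.max()              # numerical stability
        p = np.exp(scores)
        p /= p.sum()                        # distribution over candidate parses
        nll -= np.log(p[g])
        grad += p @ feats - feats[g]        # expected minus observed features
    if sigma2 is not None:                  # Gaussian prior penalises large weights
        nll += w @ w / (2.0 * sigma2)
        grad += w / sigma2
    return nll, grad

def informative_subset(X, k):
    """Hypothetical selector: keep the k sentences whose candidate parses
    are most spread out in feature space. This is a crude stand-in for an
    informative-sample criterion, which the abstract does not specify."""
    spread = X.var(axis=1).sum(axis=1)
    return np.argsort(spread)[-k:]

# Train on an informative subset instead of the full training set.
idx = informative_subset(parse_feats, k=50)
w = np.zeros(n_feats)
for _ in range(200):                        # plain gradient descent
    _, g = nll_and_grad(w, parse_feats[idx], gold_idx[idx], sigma2=1.0)
    w -= 0.01 * g
```

Passing `sigma2=None` drops the prior term entirely, corresponding to the lexicalised setting in which, according to the abstract, the Gaussian prior makes minimal difference to performance.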