Improved iterative scaling can yield multiple globally optimal models with radically differing performance levels

  • Authors:
  • Iain Bancarz; Miles Osborne

  • Affiliations:
  • University of Edinburgh, Edinburgh, Scotland; University of Edinburgh, Edinburgh, Scotland

  • Venue:
  • COLING '02: Proceedings of the 19th International Conference on Computational Linguistics - Volume 1
  • Year:
  • 2002

Abstract

Log-linear models can be efficiently estimated using algorithms such as Improved Iterative Scaling (IIS) (Lafferty et al., 1997). Under certain conditions and for a particular class of problems, IIS is guaranteed to approach both the maximum-likelihood and the maximum-entropy solution. This solution, in likelihood space, is unique. Unfortunately, in realistic situations, multiple solutions may exist, all of which are equivalent to each other in terms of likelihood but radically different from each other in terms of performance. We show that this behaviour can occur when a model contains overlapping features and the training material is sparse. Experimental results from the domain of parse selection for stochastic attribute-value grammars show the wide variation in performance that can be found when estimating models using IIS. Further results show that the influence of the initial model can be diminished by selecting uniform initial weights or by model averaging.
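
The degeneracy the abstract describes can be made concrete with a small numerical sketch. The Python snippet below is not from the paper; its feature functions, weights, and data are hypothetical. It builds a toy conditional log-linear model with two overlapping features that always fire together on the sparse training data, so any split of their combined weight yields the same training likelihood, yet the splits disagree on a test event where the overlap breaks. Averaging the two weight vectors, one simple reading of model averaging, removes the arbitrary bias.

```python
# A minimal sketch (hypothetical features and data, not the authors' code)
# of likelihood-equivalent log-linear models that differ in performance.
import numpy as np

def log_linear_probs(weights, feature_matrix):
    """p(y) proportional to exp(sum_i lambda_i * f_i(y)) over candidates y."""
    scores = feature_matrix @ weights
    scores -= scores.max()              # shift for numerical stability
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum()

def log_likelihood(weights, feature_matrix, observed_index):
    return np.log(log_linear_probs(weights, feature_matrix)[observed_index])

# Sparse training event: two candidate parses, two features. The features
# overlap: they fire together on every training candidate, so only the sum
# lambda_1 + lambda_2 is constrained by the training data.
train_features = np.array([[1.0, 1.0],   # candidate A: f1 and f2 both fire
                           [0.0, 0.0]])  # candidate B: neither fires
observed = 0                             # candidate A is the correct parse

w1 = np.array([2.0, 0.0])  # one split of the total weight 2.0
w2 = np.array([0.0, 2.0])  # another split with the same sum

# Identical training log-likelihoods: both are globally optimal.
print(log_likelihood(w1, train_features, observed))
print(log_likelihood(w2, train_features, observed))

# Test event where the overlap breaks: f1 fires on candidate A, f2 on B.
# The two likelihood-equivalent models now make opposite predictions.
test_features = np.array([[1.0, 0.0],
                          [0.0, 1.0]])
print(log_linear_probs(w1, test_features))  # strongly prefers A
print(log_linear_probs(w2, test_features))  # strongly prefers B

# Averaging the two weight vectors recovers a symmetric split, illustrating
# why averaging can damp the arbitrary choice among equivalent optima.
w_avg = 0.5 * (w1 + w2)
print(log_linear_probs(w_avg, test_features))  # uniform: no arbitrary bias
```

In this toy setting the training likelihood depends only on the sum of the two overlapping weights, so IIS started from different initial models may converge to different splits; which split it reaches is exactly the kind of initial-model influence the abstract says uniform weights or model averaging can diminish.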