Language Resources and Evaluation
This paper addresses the problem of selecting the 'optimal' variable subset in a logistic regression model for a medium-sized data set. As a case study, we take the British English dative alternation, in which speakers and writers can choose between two equally grammatical syntactic constructions to express the same meaning. With 29 explanatory variables taken from the literature, we build two types of models: one with the verb sense included as a random effect, and one without a random effect. For each type, we build three different models: (1) by including all variables and keeping only the significant ones, (2) by successively adding the most predictive variable (forward selection), and (3) by successively removing the least predictive variable (backward elimination). Since the six approaches lead to six different variable selections (and thus six different models), we conclude that selecting the 'best' model requires a substantial amount of linguistic expertise.
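To make the two stepwise procedures concrete, the following is a minimal sketch in plain Python. It is not the setup used in the paper. It assumes synthetic data with four candidate variables instead of the 29 linguistic ones, uses AIC as the selection criterion (the abstract does not say which criterion was used), fits the model by simple gradient ascent, and leaves out the verb-sense random effect entirely. All function and variable names are hypothetical.

```python
import math
import random


def log_likelihood(rows, y, cols, iters=150, lr=1.0):
    """Fit a logistic regression on the chosen columns by gradient ascent
    and return the log-likelihood reached (an intercept is always included)."""
    feats = [[1.0] + [r[c] for c in cols] for r in rows]
    w = [0.0] * (len(cols) + 1)  # w[0] is the intercept
    n = len(y)
    for _ in range(iters):
        grad = [0.0] * len(w)
        for x, t in zip(feats, y):
            z = sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))
            for j, xj in enumerate(x):
                grad[j] += (t - p) * xj
        w = [wi + lr * g / n for wi, g in zip(w, grad)]
    ll = 0.0
    for x, t in zip(feats, y):
        z = sum(wi * xi for wi, xi in zip(w, x))
        ll += t * z - math.log1p(math.exp(z))  # log p(t | x)
    return ll


def aic(rows, y, cols):
    """Akaike information criterion; lower is better (assumed criterion)."""
    k = len(cols) + 1  # coefficients plus intercept
    return 2 * k - 2 * log_likelihood(rows, y, cols)


def forward_selection(rows, y, candidates):
    """Start empty; repeatedly add the variable that lowers AIC the most."""
    selected = []
    best = aic(rows, y, selected)
    while True:
        trials = [(aic(rows, y, selected + [c]), c)
                  for c in candidates if c not in selected]
        if not trials:
            break
        score, c = min(trials)
        if score >= best:
            break
        selected.append(c)
        best = score
    return selected


def backward_elimination(rows, y, candidates):
    """Start full; repeatedly drop the variable whose removal lowers AIC the most."""
    selected = list(candidates)
    best = aic(rows, y, selected)
    while selected:
        trials = [(aic(rows, y, [s for s in selected if s != c]), c)
                  for c in selected]
        score, c = min(trials)
        if score >= best:
            break
        selected.remove(c)
        best = score
    return selected


# Synthetic data: only columns 0 and 1 influence the binary outcome.
random.seed(0)
rows = [[random.gauss(0, 1) for _ in range(4)] for _ in range(150)]
y = []
for r in rows:
    p = 1.0 / (1.0 + math.exp(-(2.0 * r[0] - 1.0 * r[1])))
    y.append(1 if random.random() < p else 0)

forward = forward_selection(rows, y, [0, 1, 2, 3])
backward = backward_elimination(rows, y, [0, 1, 2, 3])
print("forward selection keeps:   ", sorted(forward))
print("backward elimination keeps:", sorted(backward))
```

Because each procedure makes greedy, order-dependent choices, the two selections need not coincide on real data with many correlated predictors, which is the situation the abstract describes.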