Variable selection in logistic regression: the British English dative alternation

Authors:
Daphne Theijssen
Affiliations:
Centre for Language Studies, Radboud University Nijmegen, Nijmegen, The Netherlands
Venue:
ESSLLI'08/09 Proceedings of the 2008 international conference on Interfaces: explorations in logic, language and computation
Year:
2008

Abstract

This paper addresses the problem of selecting the 'optimal' variable subset in a logistic regression model for a medium-sized data set. As a case study, we take the British English dative alternation, where speakers and writers can choose between two equally grammatical syntactic constructions to express the same meaning. With 29 explanatory variables taken from the literature, we build two types of models: one with the verb sense included as a random effect, and one without a random effect. For each type, we build three different models: one by including all variables and keeping the significant ones, one by successively adding the most predictive variable (forward selection), and one by successively removing the least predictive variable (backward elimination). Seeing that the six approaches lead to six different variable selections (and thus six different models), we conclude that selecting the 'best' model requires a substantial amount of linguistic expertise.
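
The two stepwise procedures named in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the paper's setup: the data are synthetic (three numeric predictors, of which only column 0 influences the outcome), the model has fixed effects only (no random effect for verb sense), the selection criterion is AIC (the abstract does not say how "most predictive" is measured), and the function names `log_likelihood`, `model_aic`, `forward_selection` and `backward_elimination` are hypothetical.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def log_likelihood(rows, y, cols, n_iter=150, lr=0.5):
    """Fit intercept + columns `cols` by gradient ascent.

    Returns the log-likelihood reached after n_iter steps, which only
    approximates the maximum (an assumption of this sketch).
    """
    w = [0.0] * (len(cols) + 1)
    design = [[1.0] + [row[c] for c in cols] for row in rows]
    for _ in range(n_iter):
        grad = [0.0] * len(w)
        for x, t in zip(design, y):
            err = t - sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for j, xj in enumerate(x):
                grad[j] += err * xj
        w = [wi + lr * g / len(y) for wi, g in zip(w, grad)]
    ll = 0.0
    for x, t in zip(design, y):
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        ll += math.log(p if t == 1 else 1.0 - p)
    return ll

def aic(ll, k):
    # Akaike information criterion for k fitted parameters; lower is better.
    return 2 * k - 2 * ll

def model_aic(rows, y, cols):
    # The intercept counts as one parameter.
    return aic(log_likelihood(rows, y, cols), len(cols) + 1)

def forward_selection(rows, y, candidates):
    """Start empty; repeatedly add the variable that lowers AIC the most."""
    selected, remaining = [], list(candidates)
    best = model_aic(rows, y, selected)
    while remaining:
        score, col = min((model_aic(rows, y, selected + [c]), c) for c in remaining)
        if score >= best:  # no candidate improves the model
            break
        selected.append(col)
        remaining.remove(col)
        best = score
    return selected

def backward_elimination(rows, y, candidates):
    """Start full; repeatedly drop the variable whose removal lowers AIC the most."""
    selected = list(candidates)
    best = model_aic(rows, y, selected)
    while selected:
        score, col = min(
            (model_aic(rows, y, [c for c in selected if c != d]), d) for d in selected
        )
        if score >= best:  # every removal makes the model worse
            break
        selected.remove(col)
        best = score
    return selected

# Synthetic data: only column 0 affects the binary outcome.
rng = random.Random(0)
rows = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(150)]
y = [1 if rng.random() < sigmoid(2.5 * r[0]) else 0 for r in rows]

forward = forward_selection(rows, y, [0, 1, 2])
backward = backward_elimination(rows, y, [0, 1, 2])
print("forward selection keeps:", forward)
print("backward elimination keeps:", backward)
```

Both procedures are greedy, so nothing guarantees that they end at the same subset; with many correlated predictors, as in the 29-variable case described above, the starting point and the order of additions or removals can steer them to different models.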