Online optimization for variable selection in data streams

  • Authors: Christoforos Anagnostopoulos; Dimitris Tasoulis; David J. Hand; Niall M. Adams

  • Affiliations: The Institute for Mathematical Sciences, Imperial College London, SW7 2PG, London; The Institute for Mathematical Sciences, Imperial College London, SW7 2PG, London; The Institute for Mathematical Sciences, Imperial College London, SW7 2PG, London and Department of Mathematics, Imperial College London, South Kensington Campus, London SW7 2AZ, UK; Department of Mathematics, Imperial College London, South Kensington Campus, London SW7 2AZ, UK

  • Venue: Proceedings of the 2008 conference on ECAI 2008: 18th European Conference on Artificial Intelligence
  • Year: 2008

Abstract

Variable selection for regression is a classical statistical problem, motivated by concerns that too many covariates invite overfitting. Existing approaches notably include a class of convex optimisation techniques, such as the Lasso algorithm. Such techniques invariably rely on assumptions that are unrealistic in streaming contexts, namely that the data are available offline and that the correlation structure is static. In this paper, we relax both of these constraints, proposing for the first time an online implementation of the Lasso algorithm with exponential forgetting. We also optimise the model dimension and the speed of forgetting in an online manner, resulting in a fully automatic scheme. In simulations, our scheme improves on recursive least squares in dynamic environments, while also featuring model discovery and changepoint detection capabilities.
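
To make the idea of a Lasso with exponential forgetting concrete, the following is a minimal sketch and not the authors' algorithm, which the abstract does not spell out. It assumes that exponentially weighted second-moment statistics are updated as each observation arrives and that the L1-penalised least-squares problem is re-solved by a few warm-started coordinate-descent sweeps. The class name `ForgettingLasso`, the parameters `lam` (forgetting factor), `penalty` and `sweeps`, and the choice of coordinate descent are all illustrative assumptions. Both `lam` and `penalty` are held fixed here, whereas the paper tunes the speed of forgetting and the model dimension online.

```python
import numpy as np


def soft_threshold(z, gamma):
    """Soft-thresholding operator used by Lasso coordinate descent."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)


class ForgettingLasso:
    """Illustrative streaming Lasso with exponential forgetting.

    Hypothetical sketch: names and defaults are assumptions,
    not taken from the paper.
    """

    def __init__(self, dim, lam=0.99, penalty=0.1, sweeps=5):
        self.lam = lam          # forgetting factor in (0, 1]
        self.penalty = penalty  # L1 penalty weight
        self.sweeps = sweeps    # coordinate-descent sweeps per update
        self.S = np.zeros((dim, dim))  # discounted sum of x x^T
        self.r = np.zeros(dim)         # discounted sum of x * y
        self.n = 0.0                   # discounted sample count
        self.beta = np.zeros(dim)      # current coefficient estimate

    def update(self, x, y):
        # Discount old statistics, then add the new observation.
        self.S = self.lam * self.S + np.outer(x, x)
        self.r = self.lam * self.r + y * x
        self.n = self.lam * self.n + 1.0
        C = self.S / self.n  # weighted second-moment matrix
        c = self.r / self.n  # weighted cross-moment vector
        # Minimise 0.5 * b^T C b - c^T b + penalty * ||b||_1,
        # warm-started from the previous estimate.
        for _ in range(self.sweeps):
            for j in range(len(self.beta)):
                if C[j, j] == 0.0:
                    continue
                # Correlation of covariate j with the partial residual.
                rho = c[j] - C[j] @ self.beta + C[j, j] * self.beta[j]
                self.beta[j] = soft_threshold(rho, self.penalty) / C[j, j]
        return self.beta
```

Calling `update(x, y)` once per incoming observation returns the current sparse coefficient vector. With `lam` below 1, old observations are progressively down-weighted, so the estimate can track a drifting correlation structure; with `lam = 1` the statistics reduce to ordinary running averages.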