Parallel feature selection for regularized least-squares

  • Authors:
  • Sebastian Okser; Antti Airola; Tero Aittokallio; Tapio Salakoski; Tapio Pahikkala

  • Affiliations:
  • TUCS - Turku Centre for Computer Science, Finland, Department of Information Technology, University of Turku, Finland; TUCS - Turku Centre for Computer Science, Finland, Department of Information Technology, University of Turku, Finland; TUCS - Turku Centre for Computer Science, Finland, Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Finland, Department of Mathematics, University of Turku, Finland; TUCS - Turku Centre for Computer Science, Finland, Department of Information Technology, University of Turku, Finland; TUCS - Turku Centre for Computer Science, Finland, Department of Information Technology, University of Turku, Finland

  • Venue:
  • PARA'12 Proceedings of the 11th international conference on Applied Parallel and Scientific Computing
  • Year:
  • 2012

Abstract

This paper introduces a parallel version of greedy regularized least-squares (RLS), a machine learning based feature selection algorithm. The aim of such machine learning methods is to develop accurate predictive models on complex datasets. Greedy RLS is an efficient implementation of the greedy forward feature selection procedure using regularized least-squares, capable of selecting the most predictive features from large datasets. Through the use of matrix algebra shortcuts, it has previously been shown to perform feature selection in only a fraction of the time required by traditional implementations. In this paper, the algorithm is adapted for efficient parallel feature selection so that the method scales to modern clusters. To demonstrate its effectiveness in practice, we applied it to a sample genome-wide association study, as well as a number of other high-dimensional datasets, scaling the method to up to 128 cores.
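
The greedy forward selection procedure mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: it is a naive version that refits an RLS (ridge) model for every candidate feature, without the matrix algebra shortcuts that make greedy RLS fast. It assumes leave-one-out squared error as the selection criterion and uses a thread pool on one machine as a stand-in for cluster-level parallelism. The names `loo_error`, `greedy_forward_selection`, `lam`, and `workers` are hypothetical and chosen only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def loo_error(X, y, lam):
    """Mean leave-one-out squared error of RLS (ridge) fitted on the columns of X."""
    d = X.shape[1]
    # Hat matrix H = X (X^T X + lam * I)^{-1} X^T, so fitted values are H @ y.
    G = X.T @ X + lam * np.eye(d)
    H = X @ np.linalg.solve(G, X.T)
    residual = y - H @ y
    # Standard identity for ridge regression: the leave-one-out residual of
    # example i equals residual_i / (1 - H_ii). With lam > 0, H_ii < 1.
    loo = residual / (1.0 - np.diag(H))
    return float(np.mean(loo ** 2))


def greedy_forward_selection(X, y, k, lam=1.0, workers=4):
    """Select k feature indices, adding at each round the one with the lowest LOO error."""
    selected = []
    remaining = list(range(X.shape[1]))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(k):
            def score(j):
                # Naive step: refit from scratch on the selected set plus candidate j.
                return loo_error(X[:, selected + [j]], y, lam)

            # Candidates are independent of each other, so they can be scored
            # concurrently; this is the part a parallel implementation distributes.
            errors = list(pool.map(score, remaining))
            best = remaining[int(np.argmin(errors))]
            selected.append(best)
            remaining.remove(best)
    return selected
```

Calling `greedy_forward_selection(X, y, k)` with an `n x d` data matrix `X` and a length-`n` target vector `y` returns the chosen column indices in the order they were picked. Each round scores every remaining candidate independently, which is why the candidate loop is the natural place to parallelize.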