Parallelizing Feature Selection

  • Authors:
  • Jerffeson Teixeira de Souza (Computer Science Department, State University of Ceara, Fortaleza, 60740-000, Brazil)
  • Stan Matwin (School of Information Technology and Engineering, University of Ottawa, Ottawa, K1N 6N5, Canada)
  • Nathalie Japkowicz (School of Information Technology and Engineering, University of Ottawa, Ottawa, K1N 6N5, Canada)

  • Venue:
  • Algorithmica
  • Year:
  • 2006

Abstract

Classification is a key problem in machine learning and data mining. Classification algorithms can predict the class of a new instance after being trained on data representing past experience in classifying instances. However, the presence of a large number of features in the training data can hurt the classification capacity of a machine learning algorithm. The feature selection problem involves discovering a subset of features such that a classifier built only from this subset attains predictive accuracy no worse than a classifier built from the entire set of features. Several algorithms have been proposed to solve this problem. In this paper we discuss how parallelism can be used to improve the performance of feature selection algorithms. In particular, we present, discuss, and evaluate a coarse-grained parallel version of the feature selection algorithm FortalFS. This algorithm performs well compared with other solutions, and it has characteristics that make it a good candidate for parallelization. Our parallel design is based on the master-slave design pattern. Promising results show that this approach achieves near-optimal speedups in the context of Amdahl's Law.
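The master-slave pattern mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: `evaluate_subset` and its toy scoring rule are hypothetical stand-ins, since the abstract does not specify FortalFS's evaluation criterion. An `amdahl_speedup` helper shows the speedup ceiling that Amdahl's Law imposes on such a design.

```python
from multiprocessing import Pool

def evaluate_subset(subset):
    # Hypothetical stand-in for training a classifier on `subset` and
    # measuring its predictive accuracy (FortalFS's real criterion is
    # not given in the abstract). Toy score: reward subsets containing
    # feature 0, penalize larger subsets.
    score = (1.0 if 0 in subset else 0.5) / (1 + len(subset))
    return subset, score

def master(candidate_subsets, workers=4):
    # Master: distribute candidate subsets to slave processes,
    # collect their scores, and keep the best-scoring subset.
    with Pool(workers) as pool:
        results = pool.map(evaluate_subset, candidate_subsets)
    return max(results, key=lambda r: r[1])[0]

def amdahl_speedup(parallel_fraction, p):
    # Amdahl's Law: maximum speedup when a fraction
    # `parallel_fraction` of the work runs on `p` processors.
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / p)

if __name__ == "__main__":
    candidates = [(0,), (0, 1), (1, 2), (0, 1, 2)]
    print(master(candidates, workers=2))      # (0,)
    # e.g. with 95% of the work parallelizable on 8 processors:
    print(round(amdahl_speedup(0.95, 8), 2))  # 5.93
```

Because subset evaluations are independent of one another, the slaves need no coordination beyond receiving work and returning scores, which is what makes this kind of coarse-grained design approach the Amdahl's Law ceiling.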