Parallel GA-Based Wrapper Feature Selection for Spectroscopic Data Mining

  • Authors:
  • Nordine Melab; Sébastien Cahon; El-Ghazali Talbi; Ludovic Duponchel

  • Venue:
  • IPDPS '02: Proceedings of the 16th International Parallel and Distributed Processing Symposium
  • Year:
  • 2002

Abstract

Mining predictive models in dense databases is CPU-time consuming and I/O intensive. In this paper, we propose a taxonomy of existing techniques for achieving high performance. We then propose a hybrid approach that exploits four of them: feature selection, GA-based reduction of the exploration space, parallelism, and concurrency. The approach is evaluated on a near-infrared (NIR) spectroscopic application, which consists of predicting the concentration of a given component in a given product from its absorbance of NIR radiation. Statistical methods such as partial least squares (PLS) regression are well suited and efficient for this data mining task. The experimental results show that preceding these methods with feature selection removes a significant number of irrelevant features while significantly improving the accuracy of the discovered predictive model. It is also shown that, for the considered task, the GA-based approach builds more accurate models than neural networks. Moreover, the parallel multithreaded implementation of the approach achieves a linear speed-up.
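The abstract does not specify the GA operators, PLS settings, or fitness function the authors used, so the following is only a minimal sketch of the general GA-based wrapper idea under stated assumptions. A chromosome is a bit mask over spectral features. Its fitness is the validation RMSE of a single-response PLS model (PLS1, fitted with NIPALS) trained on the selected columns. The GA uses truncation selection, one-point crossover, and bit-flip mutation, and fitness evaluations are submitted concurrently to a thread pool. All function names (`pls1_fit`, `pls1_predict`, `subset_rmse`, `ga_select`), parameter values, and the synthetic data are hypothetical. Under CPython's GIL, threads will not yield the linear speed-up reported for the authors' implementation on this pure-Python workload.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def pls1_fit(X, y, n_comp):
    """Fit a single-response PLS model (PLS1) with the NIPALS algorithm."""
    x_means = [mean(col) for col in zip(*X)]
    Xc = [[v - m for v, m in zip(row, x_means)] for row in X]
    y_mean = mean(y)
    yc = [v - y_mean for v in y]
    n_feat = len(x_means)
    comps = []
    for _ in range(min(n_comp, n_feat)):
        # Weight vector: direction of maximal covariance between X and y.
        w = [sum(r[j] * yv for r, yv in zip(Xc, yc)) for j in range(n_feat)]
        norm = sum(v * v for v in w) ** 0.5
        if norm < 1e-12:  # remaining y variation is negligible; stop early
            break
        w = [v / norm for v in w]
        t = [sum(a * b for a, b in zip(r, w)) for r in Xc]  # component scores
        tt = sum(v * v for v in t)
        p = [sum(r[j] * tv for r, tv in zip(Xc, t)) / tt for j in range(n_feat)]
        q = sum(yv * tv for yv, tv in zip(yc, t)) / tt
        # Deflate X and y before extracting the next component.
        Xc = [[r[j] - tv * p[j] for j in range(n_feat)] for r, tv in zip(Xc, t)]
        yc = [yv - q * tv for yv, tv in zip(yc, t)]
        comps.append((w, p, q))
    return x_means, y_mean, comps


def pls1_predict(model, x):
    """Predict one sample by replaying the centering and deflation steps."""
    x_means, y_mean, comps = model
    xr = [v - m for v, m in zip(x, x_means)]
    pred = y_mean
    for w, p, q in comps:
        t = sum(a * b for a, b in zip(xr, w))
        pred += q * t
        xr = [v - t * pj for v, pj in zip(xr, p)]
    return pred


def subset_rmse(mask, train, test, n_comp=3):
    """Wrapper fitness: validation RMSE of PLS1 trained on the masked features."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return float("inf")  # an empty subset cannot predict anything

    def pick(rows):
        return [[r[j] for j in cols] for r in rows]

    model = pls1_fit(pick(train[0]), train[1], n_comp)
    errs = [(pls1_predict(model, x) - yv) ** 2 for x, yv in zip(pick(test[0]), test[1])]
    return mean(errs) ** 0.5


def ga_select(train, test, n_features, pop_size=20, generations=30,
              p_mut=0.05, workers=4, seed=0):
    """Return (best_rmse, best_mask) found by a simple generational GA."""
    rng = random.Random(seed)
    pop = [[rng.random() < 0.5 for _ in range(n_features)] for _ in range(pop_size)]
    best = (float("inf"), pop[0])
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(generations):
            # Fitness evaluations are independent, so they are submitted concurrently.
            scores = list(pool.map(lambda m: subset_rmse(m, train, test), pop))
            ranked = sorted(zip(scores, pop), key=lambda sm: sm[0])
            if ranked[0][0] < best[0]:
                best = ranked[0]
            # Truncation selection: the better half survives unchanged (elitism).
            elite = [m for _, m in ranked[: pop_size // 2]]
            children = []
            while len(children) < pop_size - len(elite):
                a, b = rng.sample(elite, 2)
                cut = rng.randrange(1, n_features)  # one-point crossover
                # Bit-flip mutation: XOR each gene with a rarely-True random bit.
                children.append([g ^ (rng.random() < p_mut) for g in a[:cut] + b[cut:]])
            pop = elite + children
    return best


if __name__ == "__main__":
    data_rng = random.Random(1)

    def make(n):
        # Hypothetical synthetic "spectra": only features 0 and 1 matter.
        X = [[data_rng.gauss(0, 1) for _ in range(8)] for _ in range(n)]
        return X, [3 * r[0] - 2 * r[1] + data_rng.gauss(0, 0.1) for r in X]

    rmse, mask = ga_select(make(60), make(30), n_features=8)
    print("selected:", [j for j, g in enumerate(mask) if g], "RMSE:", round(rmse, 3))
```

Run as a script, the demo prints the selected feature indices and their validation RMSE; the exact subset depends on the seeds. The pool reflects the design point the abstract relies on: each candidate subset is evaluated independently, so the wrapper step parallelizes naturally. To obtain real CPU speed-up in CPython, one would likely switch to `ProcessPoolExecutor` and replace the lambda with a picklable callable such as `functools.partial` over the top-level `subset_rmse`.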