Parallel GA-Based Wrapper Feature Selection for Spectroscopic Data Mining

Authors:
Nordine Melab; Sébastien Cahon; El-Ghazali Talbi; Ludovic Duponchel
Venue:
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Year:
2002

Abstract

Mining predictive models in dense databases is CPU-intensive and I/O-intensive. In this paper, we propose a taxonomy of existing techniques for achieving high performance. We then propose a hybrid approach that exploits four of them: feature selection, GA-based reduction of the exploration space, parallelism, and concurrency. The approach is evaluated on a near-infrared (NIR) spectroscopic application, which consists of predicting the concentration of a given component in a given product from its absorbances of NIR radiation. Statistical methods such as partial least squares (PLS) are well suited to this data mining task and efficient. The experimental results show that preceding those methods with feature selection removes a significant number of irrelevant features and, at the same time, significantly improves the accuracy of the discovered predictive model. They also show that, for the considered task, the GA-based approach builds more accurate models than neural networks. Moreover, the parallel multithreaded implementation of the approach achieves linear speed-up.
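
The general shape of a GA-based wrapper with concurrent fitness evaluation can be illustrated with a minimal Python sketch. It is not the authors' implementation. Everything in it is an assumption made for illustration: the synthetic data stands in for NIR spectra, a 1-nearest-neighbour regressor stands in for PLS as the wrapped model, and the names `make_data`, `holdout_rmse`, and `ga_select`, along with all parameter values, are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def make_data(n=60, p=12, informative=(0, 3, 7), seed=0):
    """Synthetic stand-in for spectra: y depends only on the `informative` columns."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [sum(row[j] for j in informative) + rng.gauss(0, 0.1) for row in X]
    return X, y


def holdout_rmse(mask, X, y, n_train=40):
    """Wrapper fitness: hold-out RMSE of a 1-nearest-neighbour regressor
    restricted to the selected features (lower is better).
    The 1-NN model is an assumption; the paper wraps PLS instead."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return float("inf")  # an empty feature subset cannot predict anything
    train, test = range(n_train), range(n_train, len(X))
    sq_err = 0.0
    for i in test:
        nearest = min(train, key=lambda t: sum((X[i][j] - X[t][j]) ** 2 for j in cols))
        sq_err += (y[i] - y[nearest]) ** 2
    return (sq_err / len(test)) ** 0.5


def ga_select(X, y, pop_size=20, generations=15, p_mut=0.05, workers=4, seed=1):
    """Evolve 0/1 feature masks; each generation is scored concurrently."""
    rng = random.Random(seed)
    p = len(X[0])
    # Seed the population with the all-features mask as a baseline.
    pop = [tuple([1] * p)]
    pop += [tuple(rng.randint(0, 1) for _ in range(p)) for _ in range(pop_size - 1)]

    def fitness(mask):
        return holdout_rmse(mask, X, y)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(generations):
            scores = list(pool.map(fitness, pop))  # concurrent evaluation
            ranked = sorted(zip(scores, pop))
            children = [ranked[0][1]]  # elitism: keep the best mask unchanged
            while len(children) < pop_size:
                # Binary tournament selection of two parents.
                a = min(rng.sample(ranked, 2))[1]
                b = min(rng.sample(ranked, 2))[1]
                # Uniform crossover followed by bit-flip mutation.
                child = tuple(u if rng.random() < 0.5 else v for u, v in zip(a, b))
                child = tuple(bit ^ (rng.random() < p_mut) for bit in child)
                children.append(child)
            pop = children
        scores = list(pool.map(fitness, pop))
    best_score, best_mask = min(zip(scores, pop))
    return best_mask, best_score


if __name__ == "__main__":
    X, y = make_data()
    mask, score = ga_select(X, y)
    print("selected features:", [j for j, bit in enumerate(mask) if bit])
    print("hold-out RMSE:", round(score, 3))
```

Because the all-features mask is placed in the initial population and the best mask is carried over unchanged each generation, the returned subset never scores worse on the hold-out set than using every feature. Note that in standard CPython builds a thread pool does not speed up CPU-bound fitness evaluation because of the global interpreter lock; the thread pool here only shows where the concurrency sits, and a process pool or a native multithreaded implementation would be needed to approach the speed-up reported in the abstract.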