Mining predictive models in dense databases consumes substantial CPU time and is I/O-intensive. In this paper, we propose a taxonomy of existing techniques for achieving high performance, and a hybrid approach that exploits four of them: feature selection, GA-based reduction of the exploration space, parallelism, and concurrency. The approach is evaluated on a near-infrared (NIR) spectroscopic application, which consists of predicting the concentration of a given component in a given product from its absorbance of NIR radiation. Statistical methods such as PLS (partial least squares) are well suited to, and efficient for, this kind of data mining task. The experimental results show that preceding these methods with feature selection removes a significant number of irrelevant features while significantly improving the accuracy of the discovered predictive model. For the task considered, the GA-based approach is also shown to build more accurate models than neural networks. Moreover, the parallel multithreaded implementation of the approach achieves a linear speed-up.
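The abstract does not spell out the authors' algorithm, so what follows is only a minimal sketch, under stated assumptions, of how GA-based feature selection wrapped around a PLS model could be organized, with the fitness evaluations of each generation run concurrently. Everything in it is hypothetical: the helper names (`pls1_fit`, `pls1_predict`, `fitness`, `ga_select`, `make_data`), the synthetic data standing in for NIR spectra, the hold-out RMSE fitness, truncation selection with elitism, one-point crossover, bit-flip mutation, and all parameter values. PLS is implemented as single-response NIPALS (PLS1) in plain Python so the example has no dependencies.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def pls1_fit(X, y, n_comp):
    """NIPALS PLS1 for one response; returns (x_means, y_mean, comps)."""
    n, m = len(X), len(X[0])
    x_means = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_means[j] for j in range(m)] for row in X]  # centred X
    f = [v - y_mean for v in y]                                  # centred y
    comps = []
    for _ in range(min(n_comp, m)):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = sum(v * v for v in w) ** 0.5
        if norm < 1e-12:  # nothing left to explain
            break
        w = [v / norm for v in w]                                      # weights
        t = [sum(E[i][j] * w[j] for j in range(m)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate X and y by the extracted latent variable.
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        comps.append((w, p, q))
    return x_means, y_mean, comps


def pls1_predict(model, x):
    """Predict one sample by replaying the same deflation steps."""
    x_means, y_mean, comps = model
    e = [v - mu for v, mu in zip(x, x_means)]
    y_hat = y_mean
    for w, p, q in comps:
        t = sum(ej * wj for ej, wj in zip(e, w))
        y_hat += q * t
        e = [ej - t * pj for ej, pj in zip(e, p)]
    return y_hat


def select(X, mask):
    return [[v for v, keep in zip(row, mask) if keep] for row in X]


def fitness(mask, data, n_comp=3):
    """Hold-out RMSE of a PLS model on the selected features (lower is better)."""
    if not any(mask):
        return float("inf")
    X_tr, y_tr, X_te, y_te = data
    model = pls1_fit(select(X_tr, mask), y_tr, n_comp)
    errs = [(pls1_predict(model, x) - t) ** 2 for x, t in zip(select(X_te, mask), y_te)]
    return (sum(errs) / len(errs)) ** 0.5


def ga_select(data, n_features, pop_size=20, generations=15, p_mut=0.05,
              workers=4, seed=0):
    rng = random.Random(seed)
    pop = [[rng.random() < 0.5 for _ in range(n_features)] for _ in range(pop_size)]
    best = (float("inf"), None)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(generations):
            # Fitness evaluations are independent, so they are submitted concurrently.
            scores = list(pool.map(lambda mask: fitness(mask, data), pop))
            ranked = sorted(zip(scores, pop), key=lambda sm: sm[0])
            if ranked[0][0] < best[0]:
                best = (ranked[0][0], list(ranked[0][1]))
            parents = [mask for _, mask in ranked[: pop_size // 2]]  # truncation
            children = [list(parents[0])]  # elitism: carry the best mask over
            while len(children) < pop_size:
                a, b = rng.sample(parents, 2)
                cut = rng.randrange(1, n_features)  # one-point crossover
                child = a[:cut] + b[cut:]
                # Bit-flip mutation toggles each feature with probability p_mut.
                children.append([not g if rng.random() < p_mut else g for g in child])
            pop = children
    return best


def make_data(n=60, n_features=12, seed=1):
    """Synthetic stand-in for spectra: y depends only on features 0 and 1."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n)]
    y = [2.0 * r[0] - 1.5 * r[1] + rng.gauss(0, 0.1) for r in X]
    half = n // 2
    return X[:half], y[:half], X[half:], y[half:]


if __name__ == "__main__":
    data = make_data()
    score, mask = ga_select(data, n_features=12)
    print("selected features:", [j for j, keep in enumerate(mask) if keep])
    print("hold-out RMSE:", round(score, 3))
```

Each individual is a boolean mask over the features, so the GA directly searches for the subset whose PLS model predicts best, which is how irrelevant features end up removed. Note that in CPython the global interpreter lock keeps CPU-bound threads from running in parallel, so the `ThreadPoolExecutor` here only mirrors the structure of a multithreaded evaluator; an actual speed-up would require `ProcessPoolExecutor` or a native implementation.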