A hybrid evolutionary algorithm for attribute selection in data mining

Authors:
K. C. Tan; E. J. Teoh; Q. Yu; K. C. Goh
Affiliations:
Department of Electrical and Computer Engineering, National University of Singapore, 4 Engineering Drive 3, Singapore 117576, Singapore; Department of Electrical and Computer Engineering, National University of Singapore, 4 Engineering Drive 3, Singapore 117576, Singapore; Department of Electrical and Computer Engineering, National University of Singapore, 4 Engineering Drive 3, Singapore 117576, Singapore and Rochester Institute of Technology, USA; Department of Electrical and Computer Engineering, National University of Singapore, 4 Engineering Drive 3, Singapore 117576, Singapore
Venue:
Expert Systems with Applications: An International Journal
Year:
2009

Abstract

Real-life datasets are often interspersed with noise, which makes the subsequent data mining process difficult. The task of the classifier can be simplified by eliminating attributes deemed redundant for classification: retaining only the pertinent attributes reduces the size of the dataset and allows more comprehensible analysis of the extracted patterns or rules. In this article, a new hybrid approach comprising two conventional machine learning algorithms is proposed to carry out attribute selection. Genetic algorithms (GAs) and support vector machines (SVMs) are integrated using a wrapper approach. Specifically, the GA component searches for the best attribute set by applying the principles of an evolutionary process. The SVM then classifies the patterns in the reduced datasets corresponding to the attribute subsets represented by the GA chromosomes. The proposed GA-SVM hybrid is validated using datasets obtained from the UCI machine learning repository. Simulation results demonstrate that the GA-SVM hybrid produces good classification accuracy and a higher level of consistency, comparable to other established algorithms. In addition, the hybrid is improved by using a correlation measure between attributes as a fitness measure to replace the weaker members of the population with newly formed chromosomes. This injects greater diversity and increases the overall fitness of the population. The improved mechanism is validated on the same datasets used in the first stage. The results justify the improvements in classification accuracy and demonstrate the hybrid's potential to be a good classifier for future data mining purposes.
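
The wrapper idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the chromosome encoding, genetic operators, or SVM settings, so the binary mask, tournament selection, one-point crossover, bit-flip mutation, and all parameter values below are assumptions. To keep the sketch runnable with only the Python standard library, a nearest-centroid classifier stands in for the SVM; the names `nearest_centroid_accuracy`, `ga_select`, and `make_data` are hypothetical.

```python
import random

def nearest_centroid_accuracy(X_train, y_train, X_test, y_test, mask):
    """Stand-in for the SVM (assumption): accuracy of a nearest-centroid
    classifier that sees only the attributes switched on in `mask`."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty attribute subset cannot classify anything
    sums, counts = {}, {}
    for row, label in zip(X_train, y_train):
        acc = sums.setdefault(label, [0.0] * len(cols))
        for k, j in enumerate(cols):
            acc[k] += row[j]
        counts[label] = counts.get(label, 0) + 1
    centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}
    correct = 0
    for row, label in zip(X_test, y_test):
        pred = min(
            centroids,
            key=lambda c: sum((row[j] - centroids[c][k]) ** 2
                              for k, j in enumerate(cols)),
        )
        correct += pred == label
    return correct / len(y_test)

def ga_select(n_attrs, fitness, pop_size=20, generations=30, p_mut=0.05, seed=0):
    """Search over attribute masks (one boolean gene per attribute).
    Requires n_attrs >= 2 and pop_size >= 2."""
    rng = random.Random(seed)
    pop = [[rng.random() < 0.5 for _ in range(n_attrs)] for _ in range(pop_size)]
    for _ in range(generations):
        # Wrapper step: each chromosome is scored by the classifier itself.
        scored = [(fitness(ch), ch) for ch in pop]
        _, best = max(scored, key=lambda s: s[0])

        def tournament():
            a, b = rng.sample(scored, 2)
            return a[1] if a[0] >= b[0] else b[1]

        children = [best[:]]  # elitism: carry the best mask forward unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, n_attrs)           # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [(not g) if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
    return max(pop, key=fitness)

def make_data(n, rng):
    """Synthetic data (assumption): attribute 0 is informative, 1-4 are noise."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([rng.gauss(4.0 * label, 1.0)] + [rng.gauss(0.0, 5.0) for _ in range(4)])
        y.append(label)
    return X, y

data_rng = random.Random(1)
X_tr, y_tr = make_data(40, data_rng)
X_va, y_va = make_data(40, data_rng)
fit = lambda m: nearest_centroid_accuracy(X_tr, y_tr, X_va, y_va, m)
mask = ga_select(5, fit)
print("selected attributes:", [j for j, keep in enumerate(mask) if keep])
print("validation accuracy:", fit(mask))
```

Because the classifier is passed in as the `fitness` callable, swapping the stand-in for an actual SVM only requires a different scoring function; the GA loop itself would not change. The correlation-based replacement of weak population members mentioned in the abstract is not sketched here, since the abstract does not specify how the measure is computed.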