Feature Subset Selection within a Simulated Annealing Data Mining Algorithm

Authors:
Justin C. W. Debuse;Victor J. Rayward-Smith
Affiliations:
Computing Department, School of Information Systems, University of East Anglia, Norwich NR4 7TJ, UK. E-mail: jcwd@sys.uea.ac.uk, vjrs@sys.uea.ac.uk
Venue:
Journal of Intelligent Information Systems
Year:
1997

Abstract

An overview of the principal feature subset selection methods is given. We investigate a number of measures of feature subset quality, using large commercial databases. We develop an entropic measure, based upon the information gain approach used within ID3 and C4.5 to build trees, which is shown to give the best performance over our databases. This measure is used within a simple feature subset selection algorithm, and the technique is used to generate subsets of high quality features from the databases. A simulated annealing based data mining technique is presented and applied to the databases. The performance using all features is compared to that achieved using the subset selected by our algorithm. We show that a substantial reduction in the number of features may be achieved together with an improvement in the performance of our data mining system. We also present a modification of the data mining algorithm, which allows it to search simultaneously for promising feature subsets and high quality rules. The effect of varying the generality level of the desired pattern is also investigated.
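
For readers unfamiliar with the information gain quantity used by ID3 and C4.5, the following is a minimal sketch of the standard textbook definition, applied to ranking discrete features. It is not the entropic measure developed in the paper, which the abstract does not specify. The function names (`entropy`, `information_gain`, `rank_features`) and the toy records are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a non-empty sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(feature_values, labels):
    """Reduction in class entropy obtained by partitioning on a discrete feature."""
    total = len(labels)
    # Group the class labels by the value the feature takes in each record.
    partitions = {}
    for value, label in zip(feature_values, labels):
        partitions.setdefault(value, []).append(label)
    # Weighted average entropy of the partitions after the split.
    remainder = sum(len(part) / total * entropy(part) for part in partitions.values())
    return entropy(labels) - remainder


def rank_features(records, labels):
    """Return feature names sorted by decreasing information gain."""
    gains = {
        name: information_gain([record[name] for record in records], labels)
        for name in records[0]
    }
    return sorted(gains, key=gains.get, reverse=True)


# Hypothetical toy data: "region" determines the class, "status" is uninformative.
records = [
    {"region": "north", "status": "a"},
    {"region": "north", "status": "b"},
    {"region": "south", "status": "a"},
    {"region": "south", "status": "b"},
]
labels = ["yes", "yes", "no", "no"]
ranking = rank_features(records, labels)
```

A simple selection scheme, under these assumptions, would keep the top-ranked features from `ranking` and pass only those to the downstream rule search; whether the paper's algorithm selects features this way is not stated in the abstract.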