Discretisation of Continuous Commercial Database Features for a Simulated Annealing Data Mining Algorithm

Authors:
Justin C. W. Debuse;Victor J. Rayward-Smith
Affiliations:
School of Information Systems, University of East Anglia, Norwich NR4 7TJ, UK. jcwd@sys.uea.ac.uk;School of Information Systems, University of East Anglia, Norwich NR4 7TJ, UK. vjrs@sys.uea.ac.uk
Venue:
Applied Intelligence
Year:
1999


Abstract

An introduction to the approaches used to discretise continuous database features is given, together with a discussion of the potential benefits of such techniques. These benefits are investigated by applying discretisation algorithms to two large commercial databases; the resulting discretisations are then evaluated using a simulated annealing based data mining algorithm. The results suggest that dramatic reductions in problem size may be achieved, yielding improvements in the speed of the data mining algorithm. However, it is also demonstrated that, under certain circumstances, the discretisation produced may increase problem size or allow the data mining algorithm to overfit. Such cases, in which often only a small proportion of the database belongs to the class of interest, highlight the need both for caution when producing discretisations and for the development of more robust discretisation algorithms.
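The following rough illustration shows how discretisation can reduce problem size. It applies equal-width binning, one of the simplest unsupervised discretisation approaches, to a small continuous feature. This is a minimal sketch and not the algorithm evaluated in the paper. The function names `equal_width_cut_points` and `discretise`, the sample `ages` data, and the use of "number of distinct values a rule condition can test" as a proxy for problem size are all hypothetical choices made for illustration.

```python
import bisect
from typing import List, Sequence


def equal_width_cut_points(values: Sequence[float], n_bins: int) -> List[float]:
    """Return n_bins - 1 interior cut points that split [min, max] into equal-width intervals.

    Hypothetical helper: equal-width binning is used here only as a simple example
    of an unsupervised discretisation method.
    """
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature cannot be split; keep everything in one interval.
        return []
    width = (hi - lo) / n_bins
    return [lo + width * i for i in range(1, n_bins)]


def discretise(values: Sequence[float], cuts: Sequence[float]) -> List[int]:
    """Map each value to the index (0..len(cuts)) of the interval it falls in.

    bisect_right places a value equal to a cut point in the upper interval.
    """
    return [bisect.bisect_right(cuts, v) for v in values]


# Hypothetical continuous feature with 12 distinct values.
ages = [18, 22, 25, 31, 37, 40, 44, 52, 58, 63, 67, 70]

cuts = equal_width_cut_points(ages, n_bins=4)   # [31.0, 44.0, 57.0]
codes = discretise(ages, cuts)                  # [0, 0, 0, 1, 1, 1, 2, 2, 3, 3, 3, 3]

# Proxy for problem size: distinct values a rule condition could test against.
print(len(set(ages)), "->", len(set(codes)))    # 12 -> 4
```

Here a rule-based search over the feature would consider 4 interval codes instead of 12 raw values. Note that equal-width binning ignores the class labels entirely. When only a few records belong to the class of interest, a poorly chosen set of intervals can still be too fine or too coarse for that class, which is the kind of risk the abstract warns about.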