Discretisation of Continuous Commercial Database Features for a Simulated Annealing Data Mining Algorithm

  • Authors:
  • Justin C. W. Debuse;Victor J. Rayward-Smith

  • Affiliations:
  • School of Information Systems, University of East Anglia, Norwich NR4 7TJ, UK. jcwd@sys.uea.ac.uk;School of Information Systems, University of East Anglia, Norwich NR4 7TJ, UK. vjrs@sys.uea.ac.uk

  • Venue:
  • Applied Intelligence
  • Year:
  • 1999

Abstract

An introduction to the approaches used to discretise continuous database features is given, together with a discussion of the potential benefits of such techniques. These benefits are investigated by applying discretisation algorithms to two large commercial databases; the discretisations yielded are then evaluated using a simulated annealing based data mining algorithm. The results produced suggest that dramatic reductions in problem size may be achieved, yielding improvements in the speed of the data mining algorithm. However, it is also demonstrated under certain circumstances that the discretisation produced may give an increase in problem size or allow overfitting by the data mining algorithm. Such cases, within which often only a small proportion of the database belongs to the class of interest, highlight the need both for caution when producing discretisations and for the development of more robust discretisation algorithms.