Optimising the Distance Metric in the Nearest Neighbour Algorithm on a Real-World Patient Classification Problem

  • Authors:
  • Hongxing He; Simon Hawkins

  • Venue:
  • PAKDD '99 Proceedings of the Third Pacific-Asia Conference on Methodologies for Knowledge Discovery and Data Mining
  • Year:
  • 1999

Abstract

The study develops a new method for finding an optimal non-Euclidean distance metric for the nearest neighbour algorithm. The method was developed on data from a real-world doctor-shopper classification problem. Mutual information, a statistical measure derived from Shannon's information theory, is used to weight the attributes in the distance metric. On a five-class classification task, this weighted distance metric achieved a much higher agreement rate than the Euclidean distance metric (63% versus 51%). When a genetic algorithm and simulated annealing were then used to optimise the weights, the agreement rate rose to 77% and 73% respectively. These results pave the way for a highly accurate system that detects high-risk doctor-shoppers automatically and efficiently.
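The core idea of weighting each attribute by its mutual information with the class label can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the attributes are already discretised into integer levels (so that mutual information can be estimated from counts), uses a weighted Euclidean distance with one weight per attribute, and scores the metric by leave-one-out 1-nearest-neighbour agreement. The names `mutual_information`, `mi_weights`, `weighted_distance`, `nearest_neighbour` and `loo_agreement` are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) in bits between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2(p(x,y) / (p(x) p(y))), rewritten in terms of raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def mi_weights(X, y):
    """Assumed weighting scheme: one weight per attribute, equal to I(attribute; class)."""
    return [mutual_information([row[k] for row in X], y) for k in range(len(X[0]))]


def weighted_distance(a, b, w):
    """Weighted Euclidean distance; w = [1, 1, ...] gives the plain Euclidean metric."""
    return math.sqrt(sum(wk * (ak - bk) ** 2 for ak, bk, wk in zip(a, b, w)))


def nearest_neighbour(train_X, train_y, query, w):
    """1-NN: return the label of the closest training record under the weighted metric."""
    best = min(range(len(train_X)), key=lambda i: weighted_distance(train_X[i], query, w))
    return train_y[best]


def loo_agreement(X, y, w):
    """Fraction of records whose leave-one-out 1-NN prediction matches their label.
    Note: here w is fixed in advance; a stricter evaluation would recompute the
    weights without the held-out record."""
    hits = 0
    for i in range(len(X)):
        rest_X, rest_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        hits += nearest_neighbour(rest_X, rest_y, X[i], w) == y[i]
    return hits / len(X)


# Toy data (hypothetical): attribute 0 determines the class; attribute 1 is
# irrelevant but has a larger range, so it dominates the unweighted metric.
X = [[0, 0], [0, 3], [1, 0], [1, 3]]
y = [0, 0, 1, 1]
print(mi_weights(X, y))                       # [1.0, 0.0]
print(loo_agreement(X, y, [1.0, 1.0]))        # 0.0 (plain Euclidean)
print(loo_agreement(X, y, mi_weights(X, y)))  # 1.0 (MI-weighted)
```

In the study the weights were refined further with a genetic algorithm and with simulated annealing. A function such as `loo_agreement` is one plausible fitness or energy function for that kind of search, but the sketch stops at the mutual-information weights.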