Improving k-Nearest Neighbour Classification with Distance Functions Based on Receiver Operating Characteristics

Authors:
Md. Rafiul Hassan;M. Maruf Hossain;James Bailey;Kotagiri Ramamohanarao
Affiliations:
Department of Computer Science and Software Engineering, The University of Melbourne, Australia;Department of Computer Science and Software Engineering, The University of Melbourne, Australia;Department of Computer Science and Software Engineering, The University of Melbourne, Australia and NICTA Victoria Laboratory, The University of Melbourne, Australia;Department of Computer Science and Software Engineering, The University of Melbourne, Australia and NICTA Victoria Laboratory, The University of Melbourne, Australia
Venue:
ECML PKDD '08 Proceedings of the 2008 European Conference on Machine Learning and Knowledge Discovery in Databases - Part I
Year:
2008

Abstract

The k-nearest neighbour (k-NN) technique is a simple, interpretable and intuitively appealing method for classification problems. However, choosing an appropriate distance function for k-NN can be challenging, and an inferior choice can make the classifier highly vulnerable to noise in the data. In this paper, we propose a new method for determining a good distance function for k-NN. Our method is based on the area under the Receiver Operating Characteristics (ROC) curve, a well-known measure of the quality of binary classifiers. It computes weights for the distance function based on ROC properties within an appropriate neighbourhood of the instances whose distance is being computed. We experimentally compare our scheme with a number of other well-known k-NN distance metrics, as well as with a range of different classifiers. Experiments show that our method can substantially boost the classification performance of the k-NN algorithm. Furthermore, in a number of cases our technique delivers better accuracy than state-of-the-art non-k-NN classifiers, such as support vector machines.
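
The general idea of weighting a k-NN distance function by ROC information can be illustrated with a small sketch. This is not the authors' algorithm: the abstract describes weights derived from ROC properties within a neighbourhood of the instances being compared, whereas the sketch below assumes a much simpler scheme in which each feature receives one global weight derived from its individual AUC. The function names (`feature_auc`, `auc_weights`, `weighted_distance`, `knn_predict`), the weight formula and the toy data are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import sqrt


def feature_auc(values, labels):
    """AUC of one feature used directly as a score for the positive class (label 1).

    Computed as the fraction of (positive, negative) pairs that the feature
    ranks correctly, counting ties as half a win.
    """
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    if not pos or not neg:
        return 0.5  # AUC is undefined with a single class; treat as uninformative
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_weights(X, y):
    """Assumed weighting: 0 when a feature's AUC is 0.5, 1 when it is 0 or 1."""
    n_features = len(X[0])
    return [
        2.0 * abs(feature_auc([row[j] for row in X], y) - 0.5)
        for j in range(n_features)
    ]


def weighted_distance(a, b, weights):
    """Euclidean distance with each squared difference scaled by its feature weight."""
    return sqrt(sum(w * (ai - bi) ** 2 for w, ai, bi in zip(weights, a, b)))


def knn_predict(X, y, query, k, weights):
    """Majority vote among the k training instances closest to the query."""
    order = sorted(range(len(X)), key=lambda i: weighted_distance(X[i], query, weights))
    votes = Counter(y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Toy data (made up): feature 0 separates the classes, feature 1 is mostly noise.
X = [[0.1, 5.0], [0.2, 1.0], [0.3, 9.0], [0.8, 2.0], [0.9, 8.0], [1.0, 4.0]]
y = [0, 0, 0, 1, 1, 1]

weights = auc_weights(X, y)
prediction = knn_predict(X, y, [0.25, 8.5], k=3, weights=weights)
print(weights, prediction)
```

In this sketch the noisy feature receives a small weight, so it contributes little to the distance. A scheme closer to the one described in the abstract would presumably recompute such weights locally for each pair of instances instead of once for the whole training set.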