Learning k-Nearest Neighbor Naive Bayes for Ranking

  • Authors:
  • Liangxiao Jiang; Harry Zhang; Jiang Su

  • Affiliations:
  • Faculty of Computer Science, China University of Geosciences, Wuhan, P.R. China; Faculty of Computer Science, University of New Brunswick, Fredericton, NB, Canada; Faculty of Computer Science, University of New Brunswick, Fredericton, NB, Canada

  • Venue:
  • ADMA '05: Proceedings of the First International Conference on Advanced Data Mining and Applications
  • Year:
  • 2005

Abstract

Accurate probability-based ranking of instances is crucial in many real-world data mining applications. KNN (k-nearest neighbor) [1] has been intensively studied as an effective classification model for decades, but its ranking performance is largely unknown. In this paper, we conduct a systematic study of the ranking performance of KNN. First, we compare KNN and KNNDW (distance-weighted KNN) to decision trees and naive Bayes in ranking, measured by AUC (the area under the Receiver Operating Characteristic curve). Then, we propose to improve the ranking performance of KNN by combining it with naive Bayes: a naive Bayes model is learned using the k nearest neighbors of the test instance as training data and is then used to classify that instance. A critical problem with this combination is the lack of training data when k is small. We address it by using sampling to expand the training data: each of the k nearest neighbors is "cloned" and the clones are added to the training data. We call our new model instance cloning local naive Bayes (ICLNB for short). We conduct extensive empirical comparisons of the related algorithms in two groups in terms of AUC, using the 36 UCI datasets recommended by Weka [2]. In the first group, we compare ICLNB with other types of algorithms: C4.4 [3], naive Bayes, and NBTree [4]. In the second group, we compare ICLNB with KNN, KNNDW, and LWNB [5]. Our experimental results show that ICLNB significantly outperforms all of these algorithms. From our study, we draw two conclusions. First, KNN-related algorithms perform well in ranking. Second, our new algorithm ICLNB performs best among the algorithms compared in this paper and could be used in applications where accurate ranking is desired.
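
The core procedure the abstract describes can be sketched briefly: find the test instance's k nearest neighbors, enlarge that local set by cloning, and fit a naive Bayes model on it. The Python snippet below is a minimal illustration under stated assumptions, not the authors' implementation: the uniform clone count `n_clones`, the Euclidean neighbor search, and the Gaussian naive Bayes learner are all placeholders chosen for brevity, since the abstract does not specify the exact cloning scheme or attribute types.

```python
# Minimal ICLNB-style sketch (illustrative only; see hedges above).
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import NearestNeighbors

def iclnb_predict_proba(X_train, y_train, x_test, k=10, n_clones=3):
    """Return class-membership probabilities (for ranking) of one test instance."""
    # 1. Find the k nearest neighbors of the test instance.
    nn = NearestNeighbors(n_neighbors=k).fit(X_train)
    _, idx = nn.kneighbors(x_test.reshape(1, -1))
    X_local, y_local = X_train[idx[0]], y_train[idx[0]]

    # 2. "Clone" each neighbor to expand the local training set,
    #    mitigating the scarcity of training data when k is small.
    X_expanded = np.repeat(X_local, n_clones, axis=0)
    y_expanded = np.repeat(y_local, n_clones, axis=0)

    # 3. Train a local naive Bayes on the expanded neighborhood and
    #    return probabilities, which induce the instance ranking.
    nb = GaussianNB().fit(X_expanded, y_expanded)
    return nb.predict_proba(x_test.reshape(1, -1))[0]
```

Cloning here duplicates neighbors uniformly; a plausible refinement, in the spirit of distance-weighted relatives such as KNNDW and LWNB [5], would be to give closer neighbors more clones so they influence the local model more strongly.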