Classification algorithms
Proceedings of the 2002 ACM symposium on Applied computing
Given a set of training data, k-nearest neighbor (KNN) classification predicts the class of an unknown tuple X by searching the training set for the k neighbors nearest to X and assigning X the most frequent class among them. Each of the k nearest neighbors casts an equal vote for the class of X. In this paper, we propose a new algorithm, the Podium Incremental Neighbor Evaluator (PINE), in which the neighbors' votes are weighted. PINE uses the HOBBit metric as its distance measure and a data structure called the P-tree for efficient implementation on spatial data. Our experiments show that, with a Gaussian podium function, PINE achieves higher classification accuracy than the KNN method on spatial data. In addition, because every training instance is a potential neighbor in PINE, the value of k need not be pre-specified as it must be in KNN methods. By assigning high weights to the nearest neighbors and low (possibly zero) weights to more distant ones, PINE can achieve high classification accuracy.
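To make the contrast between equal-vote KNN and weighted podium voting concrete, here is a minimal sketch in Python. It is not the authors' implementation, and several details are assumptions. The HOBBit distance between two m-bit integers is taken to be m minus the number of matching leading (most significant) bits, which equals the bit length of their XOR; for multi-band points it is taken as the maximum over bands. The Gaussian podium weight is assumed to be `exp(-d^2 / (2 * sigma^2))` with a hypothetical width parameter `sigma`. The sketch scans the training set linearly instead of using P-trees, and all function names and the toy data are illustrative only.

```python
import math
from collections import Counter, defaultdict


def hobbit_distance_1d(a: int, b: int) -> int:
    # Assumed HOBBit distance: the position (1-based) of the most significant
    # differing bit, i.e. m minus the number of matching leading bits.
    return (a ^ b).bit_length()


def hobbit_distance(x, y) -> int:
    # Assumed multi-band form: the largest per-band HOBBit distance.
    return max(hobbit_distance_1d(a, b) for a, b in zip(x, y))


def knn_classify(train, query, k):
    # Plain KNN: the k closest instances each cast one equal vote.
    ranked = sorted(train, key=lambda t: hobbit_distance(t[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def gaussian_podium(d, sigma):
    # Hypothetical Gaussian podium function: weight decays with distance d.
    return math.exp(-(d * d) / (2.0 * sigma * sigma))


def pine_classify(train, query, sigma):
    # Weighted voting: every training instance is a potential neighbor,
    # so no k is needed; distant instances contribute near-zero weight.
    scores = defaultdict(float)
    for features, label in train:
        scores[label] += gaussian_podium(hobbit_distance(features, query), sigma)
    return max(scores, key=scores.get)


# Toy two-band, 8-bit "pixels" (illustrative data only).
train = [
    ((100, 100), "water"),   # HOBBit distance 0 from the query
    ((96, 100), "crop"),     # distance 3
    ((100, 96), "crop"),     # distance 3
    ((0, 0), "forest"),      # distance 7
]
query = (100, 100)

print(knn_classify(train, query, k=3))        # two crop votes outvote one water
print(pine_classify(train, query, sigma=1.0)) # the exact match dominates
```

With k = 3, the two moderately distant "crop" pixels outvote the exact "water" match, whereas a narrow Gaussian podium lets the closest neighbor dominate. A very wide `sigma` flattens the weights toward equal votes over all instances, which is why the choice of podium function matters more than a fixed k.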