SMART-TV: a fast and scalable nearest neighbor based classifier for data mining

  • Authors: Taufik Abidin; William Perrizo

  • Affiliations: North Dakota State University, Fargo, ND; North Dakota State University, Fargo, ND

  • Venue: Proceedings of the 2006 ACM Symposium on Applied Computing
  • Year: 2006

Abstract

K-nearest neighbors (KNN) is among the simplest classification methods. Given a set of objects in a multi-dimensional feature space, the method assigns an unclassified object to the category held by the plurality of its k nearest neighbors. Closeness between objects is determined using a distance measure, e.g., Euclidean distance. Despite its simplicity, KNN has drawbacks: 1) it suffers from expensive computational cost in training when the training set contains millions of objects; 2) its classification time is linear in the size of the training set, so the larger the training set, the longer it takes to search for the k nearest neighbors. In this paper, we propose a new algorithm, called SMART-TV (Small Absolute difference of Total Variation), that approximates a set of candidate nearest neighbors by examining the absolute difference of total variation between each data object in the training set and the unclassified object. The k nearest neighbors are then searched for within that candidate set. We empirically evaluate the performance of our algorithm on both real and synthetic datasets and find that SMART-TV is fast and scalable. Its classification accuracy is high and comparable to that of the traditional KNN algorithm.
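
The two-stage idea described in the abstract (filter candidates by absolute difference of total variation, then run an exact k-nearest-neighbor vote on the candidates only) can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation. The abstract does not define total variation, so the code assumes it is the sum of squared Euclidean distances from every training object to a given point. The names `total_variation`, `smart_tv_classify`, and the parameter `n_candidates` are hypothetical, and the brute-force computation here says nothing about how the paper achieves its speed.

```python
from collections import Counter


def sq_dist(x, y):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))


def total_variation(points, a):
    """ASSUMED definition: sum of squared distances from every point to a."""
    return sum(sq_dist(x, a) for x in points)


def smart_tv_classify(train, labels, query, k=3, n_candidates=10):
    """Hypothetical two-stage classifier in the spirit of the abstract."""
    # Stage 0: total variation about each training object.
    # This depends only on the training set, so it could be computed once and cached.
    tv_train = [total_variation(train, x) for x in train]
    tv_query = total_variation(train, query)

    # Stage 1: keep the objects whose total variation is closest to the query's.
    order = sorted(range(len(train)), key=lambda i: abs(tv_train[i] - tv_query))
    candidates = order[:n_candidates]

    # Stage 2: exact k-nearest-neighbor search restricted to the candidate set.
    nearest = sorted(candidates, key=lambda i: sq_dist(train[i], query))[:k]

    # Plurality vote among the k nearest candidates.
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Tiny illustrative data set (made up for this sketch): two well-separated groups.
train = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 10), (9, 10), (10, 9), (9, 9)]
labels = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(smart_tv_classify(train, labels, (0.5, 0.5), k=3, n_candidates=4))
```

Under the assumed definition, objects far apart in feature space can share a similar total variation, so the candidate set may contain objects that are not close to the query. That is why the second stage still computes actual distances, and why `n_candidates` should be comfortably larger than `k`.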