Overlap pattern synthesis with an efficient nearest neighbor classifier

  • Authors:
  • P. Viswanath;Narasimha Murty;Shalabh Bhatnagar

  • Affiliations:
  • Department of Computer Science and Automation, Indian Institute of Science, Bangalore 560 012, India;Department of Computer Science and Automation, Indian Institute of Science, Bangalore 560 012, India;Department of Computer Science and Automation, Indian Institute of Science, Bangalore 560 012, India

  • Venue:
  • Pattern Recognition
  • Year:
  • 2005

Quantified Score

Hi-index 0.01

Visualization

Abstract

Nearest neighbor (NN) classifier is the most popular non-parametric classifier. It is a simple classifier with no design phase and shows good performance. Important factors affecting the efficiency and performance of NN classifier are (i) memory required to store the training set, (ii) classification time required to search the nearest neighbor of a given test pattern, and (iii) due to the curse of dimensionality the number of training patterns needed by it to achieve a given classification accuracy becomes prohibitively large when the dimensionality of the data is high. In this paper, we propose novel techniques to improve the performance of NN classifier and at the same time to reduce its computational burden. These techniques are broadly based on: (i) overlap based pattern synthesis which can generate a larger number of artificial patterns than the number of input patterns and thus can reduce the curse of dimensionality effect, (ii) a compact representation of the given set of training patterns called overlap pattern graph (OLP-graph) which can be incrementally built by scanning the training set only once and (iii) an efficient NN classifier called OLP-NNC which directly works with OLP-graph and does implicit overlap based pattern synthesis. A comparison based on experimental results is given between some of the relevant classifiers. The proposed schemes are suitable for applications dealing with large and high dimensional datasets like those in data mining.