Improving kNN text categorization by removing outliers from training set

  • Authors:
  • Kwangcheol Shin;Ajith Abraham;Sang Yong Han

  • Affiliations:
  • School of Computer Science and Engineering, Chung-Ang University, Seoul, Korea;School of Computer Science and Engineering, Chung-Ang University, Seoul, Korea;School of Computer Science and Engineering, Chung-Ang University, Seoul, Korea

  • Venue:
  • CICLing'06 Proceedings of the 7th international conference on Computational Linguistics and Intelligent Text Processing
  • Year:
  • 2006

Abstract

We show that excluding outliers from the training data significantly improves the kNN classifier, which in this case performs about 10% better than the best known method, the Centroid-based classifier. Outliers are the elements whose similarity to the centroid of the corresponding category is below a threshold.
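
The filtering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which similarity measure, threshold value, or k was used, so cosine similarity on length-normalized vectors, the threshold of 0.5, k = 3, the toy data, and the function names `remove_outliers` and `knn_predict` are all assumptions made for illustration.

```python
import numpy as np

def normalize_rows(X):
    """Scale each row to unit length (zero rows are left unchanged)."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    return X / norms

def remove_outliers(X, y, threshold):
    """Drop training vectors whose cosine similarity to the centroid
    of their own category is below `threshold` (assumed measure)."""
    Xn = normalize_rows(np.asarray(X, dtype=float))
    y = np.asarray(y)
    keep = np.zeros(len(y), dtype=bool)
    for label in np.unique(y):
        idx = np.where(y == label)[0]
        centroid = Xn[idx].mean(axis=0)
        c_norm = np.linalg.norm(centroid)
        if c_norm > 0:
            centroid = centroid / c_norm
        sims = Xn[idx] @ centroid          # cosine similarity to the centroid
        keep[idx] = sims >= threshold
    return Xn[keep], y[keep]

def knn_predict(X_train, y_train, x, k=3):
    """Majority vote among the k most cosine-similar training vectors.
    Expects X_train to be row-normalized, as returned by remove_outliers."""
    x = np.asarray(x, dtype=float)
    x_norm = np.linalg.norm(x)
    if x_norm > 0:
        x = x / x_norm
    sims = X_train @ x
    top = np.argsort(-sims)[:k]
    labels, counts = np.unique(y_train[top], return_counts=True)
    return labels[np.argmax(counts)]

# Toy term-weight vectors (hypothetical data): the fourth vector is
# labeled 0 but points in the direction of category 1.
X = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.1], [0.0, 1.0],
     [0.0, 1.0], [0.1, 0.9], [0.1, 1.0]]
y = [0, 0, 0, 0, 1, 1, 1]

X_clean, y_clean = remove_outliers(X, y, threshold=0.5)
prediction = knn_predict(X_clean, y_clean, [0.2, 1.0], k=3)
```

In this toy setup the mislabeled-looking vector falls below the threshold for category 0 and is dropped before kNN voting, so it can no longer vote for the wrong category on nearby test documents. In practice the threshold would have to be tuned, for example on held-out data.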