Improving kNN text categorization by removing outliers from training set

Authors:
Kwangcheol Shin;Ajith Abraham;Sang Yong Han
Affiliations:
School of Computer Science and Engineering, Chung-Ang University, Seoul, Korea;School of Computer Science and Engineering, Chung-Ang University, Seoul, Korea;School of Computer Science and Engineering, Chung-Ang University, Seoul, Korea
Venue:
CICLing'06 Proceedings of the 7th international conference on Computational Linguistics and Intelligent Text Processing
Year:
2006

Abstract

We show that excluding outliers from the training data significantly improves the kNN classifier, which in this case performs about 10% better than the best known method, the centroid-based classifier. Outliers are the elements whose similarity to the centroid of the corresponding category is below a threshold.
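
The outlier definition above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes documents are already represented as non-zero numeric vectors (for example term weights), that "similarity" means cosine similarity, and that the centroid is the normalized mean of the category's unit-length vectors. The function names `remove_outliers` and `knn_predict`, the parameter `threshold`, and the majority-vote kNN are hypothetical choices made for illustration; the abstract does not state a threshold value or a voting scheme.

```python
import numpy as np


def remove_outliers(X, y, threshold):
    """Drop training vectors whose cosine similarity to the centroid
    of their own category is below `threshold` (assumed definition)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Normalize rows to unit length so dot products are cosine similarities.
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.where(norms == 0, 1.0, norms)
    keep = np.zeros(len(y), dtype=bool)
    for label in np.unique(y):
        idx = np.where(y == label)[0]
        centroid = Xn[idx].mean(axis=0)
        c_norm = np.linalg.norm(centroid)
        if c_norm > 0:
            centroid = centroid / c_norm
        sims = Xn[idx] @ centroid
        keep[idx] = sims >= threshold
    return X[keep], y[keep]


def knn_predict(X_train, y_train, x, k=3):
    """Plain cosine-similarity kNN with majority vote (illustrative only;
    assumes all vectors are non-zero)."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    x = np.asarray(x, dtype=float)
    sims = (X_train @ x) / (np.linalg.norm(X_train, axis=1) * np.linalg.norm(x))
    nearest = np.argsort(-sims)[:k]
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]
```

In use, the training set would be filtered once with `remove_outliers` and the cleaned vectors passed to `knn_predict`. The threshold is a tunable parameter in this sketch and would need to be chosen on held-out data.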