Centroid-Based Document Classification: Analysis and Experimental Results
PKDD '00 Proceedings of the 4th European Conference on Principles of Data Mining and Knowledge Discovery
Supervised Latent Semantic Indexing for Document Categorization
ICDM '04 Proceedings of the Fourth IEEE International Conference on Data Mining
An improved K-nearest-neighbor algorithm for text categorization
Expert Systems with Applications: An International Journal
Hi-index | 0.00 |
Motivated by the effectiveness of centroid-based text classification techniques, we propose a classification-oriented class-centroid-based dimension reduction (DR) method, called CentroidDR. Basically, CentroidDR projects high-dimensional documents into a low-dimensional space spanned by class centroids. On this class-centroid-based space, the centroid-based classifier essentially becomes CentroidDR plus a simple linear classifier. Other classification techniques, such as K-Nearest Neighbor (KNN) classifiers, can be used to replace the simple linear classifier to form much more effective text classification algorithms. Though CentroidDR is simple, non-parametric and runs in linear time, preliminary experimental results show that it can improve the accuracy of the classifiers and perform better than general DR methods such as Latent Semantic Indexing (LSI).