A refinement approach to handling model misfit in text categorization
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Hi-index | 0.00 |
We present a novel algorithm, DragPushing, for automatic text classification. Using a training data set, the algorithm first calculates the prototype vectors, or centroids, for each of the available document classes. Using misclassified examples, it then iteratively refines these centroids; by dragging the centroid of a correct class towards a misclassified example and in the same time pushing the centroid of an incorrect class away from the misclassified example. The algorithm is simple to implement and is computationally very efficient. Evaluation experiments conducted on two benchmark collections show that its classification accuracy is comparable to that of more complex methods, such as support vector machines (SVM).