Theme word subspace method for text document categorization
DM-IKM '12 Proceedings of the Data Mining and Intelligent Knowledge Management Workshop
The Support Vector Machine (SVM) is an effective classifier, but a major shortcoming is its heavy computational cost on large-scale learning tasks. Sample selection is a feasible strategy for overcoming this problem. To reduce the training set quickly without sacrificing recognition accuracy, this paper presents a novel sample selection strategy based on subspace distance, called subspace sample selection. The method tries to select the boundary samples of each class's convex hull by iteratively absorbing the sample furthest from the subspace spanned by the samples already chosen. The selected samples can efficiently represent the original training set and support SVM classification. Experimental results show that the method maintains the recognition accuracy of SVM while selecting fewer, higher-quality samples.
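The iterative "absorb the furthest sample" step can be illustrated with a minimal sketch. The abstract does not specify the seed sample, the exact distance measure, or the stopping rule, so everything below is an assumption made for illustration: the subspace is taken to be the linear span of the chosen samples, distance is the Euclidean norm of the residual after orthogonal projection onto that span, the first pick is the sample with the largest norm (the furthest from the empty subspace), and the names `subspace_sample_selection`, `k`, and `tol` are hypothetical.

```python
import numpy as np


def subspace_sample_selection(X, k, tol=1e-10):
    """Greedy selection sketch for the samples of ONE class (rows of X).

    Assumption: "subspace distance" is the norm of the residual left after
    projecting a sample onto the linear span of the samples chosen so far.
    Returns the row indices of the selected samples, in selection order.
    """
    X = np.asarray(X, dtype=float)
    residual = X.copy()  # residual of each sample w.r.t. the current span
    chosen = []
    for _ in range(min(k, X.shape[0])):
        dist = np.linalg.norm(residual, axis=1)
        idx = int(np.argmax(dist))  # furthest sample from the current span
        if dist[idx] <= tol:
            break  # every remaining sample already lies in the span
        chosen.append(idx)
        # Absorb the sample: its normalized residual becomes a new
        # orthonormal direction, which is removed from all residuals.
        q = residual[idx] / dist[idx]
        residual -= np.outer(residual @ q, q)
    return chosen


if __name__ == "__main__":
    X = np.array([[3.0, 0.0], [1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
    print(subspace_sample_selection(X, k=2))
```

Under these assumptions the function would be run once per class, and the union of the selected rows would form the reduced training set passed to an SVM. Because a linear span is used, at most as many samples as the feature dimension can be selected per class; the paper's actual formulation may differ on this point.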