Instance-Based Learning Algorithms
Machine Learning
Data preparation for data mining
Data preparation for data mining
A re-examination of text categorization methods
Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
Statistical Pattern Recognition: A Review
IEEE Transactions on Pattern Analysis and Machine Intelligence
Reduction Techniques for Instance-BasedLearning Algorithms
Machine Learning
Outlier detection for high dimensional data
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Machine learning in automated text categorization
ACM Computing Surveys (CSUR)
Instance Selection and Construction for Data Mining
Instance Selection and Construction for Data Mining
A Tutorial on Support Vector Machines for Pattern Recognition
Data Mining and Knowledge Discovery
Advances in Instance Selection for Instance-Based Learning Algorithms
Data Mining and Knowledge Discovery
A Unifying View on Instance Selection
Data Mining and Knowledge Discovery
Machine Learning
Text Categorization with Suport Vector Machines: Learning with Many Relevant Features
ECML '98 Proceedings of the 10th European Conference on Machine Learning
A Comparative Study on Feature Selection in Text Categorization
ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
Distance-based outliers: algorithms and applications
The VLDB Journal — The International Journal on Very Large Data Bases
An extensive empirical study of feature selection metrics for text classification
The Journal of Machine Learning Research
Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
ICML '04 Proceedings of the twenty-first international conference on Machine learning
Feature selection methods for text classification
Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
Using evolutionary algorithms as instance selection for data reduction in KDD: an experimental study
IEEE Transactions on Evolutionary Computation
Evolutionary instance selection for text classification
Journal of Systems and Software
Hi-index | 0.00 |
Automatic text classification is usually based on models constructed through learning from training examples. However, as the size of text document repositories grows rapidly, the storage requirements and computational cost of model learning is becoming ever higher. Instance selection is one solution to overcoming this limitation. The aim is to reduce the amount of data by filtering out noisy data from a given training dataset. A number of instance selection algorithms have been proposed in the literature, such as ENN, IB3, ICF, and DROP3. However, all of these methods have been developed for the k-nearest neighbor (k-NN) classifier. In addition, their performance has not been examined over the text classification domain where the dimensionality of the dataset is usually very high. The support vector machines (SVM) are core text classification techniques. In this study, a novel instance selection method, called Support Vector Oriented Instance Selection (SVOIS), is proposed. First of all, a regression plane in the original feature space is identified by utilizing a threshold distance between the given training instances and their class centers. Then, another threshold distance, between the identified data (forming the regression plane) and the regression plane, is used to decide on the support vectors for the selected instances. The experimental results based on the TechTC-100 dataset show the superior performance of SVOIS over other state-of-the-art algorithms. In particular, using SVOIS to select text documents allows the k-NN and SVM classifiers perform better than without instance selection.