Instance selection and feature selection are two orthogonal methods for reducing the amount and complexity of data. Feature selection aims to remove redundant features from a dataset, whereas instance selection aims to reduce the number of instances. So far, these two methods have mostly been considered in isolation. In this paper, we present a new algorithm, which we call FIS (Feature and Instance Selection), that targets both problems simultaneously in the context of text classification. Our experiments on the Reuters and 20-Newsgroups datasets show that FIS considerably reduces both the number of features and the number of instances. The accuracy of a range of classifiers, including Naïve Bayes, TAN and LB, improves considerably when they are trained on FIS-preprocessed datasets, matching or exceeding that of Support Vector Machines, which are currently considered among the best text classification methods. In all cases, the results are much better than those obtained with Mutual Information-based feature selection. The training and classification speed of all classifiers is also greatly improved.
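For readers unfamiliar with the baseline, the following is a minimal sketch of Mutual Information-based feature selection. It is not FIS itself, whose details are not described here. The sketch assumes that each document is a set of tokens and that each term is scored by the mutual information (in bits) between the term's presence and the class label. Only the top-k terms are kept. The function names `mutual_information` and `select_features` and the toy corpus are hypothetical and serve only as illustrations.

```python
import math
from collections import Counter


def mutual_information(docs, labels, term):
    """Mutual information (in bits) between presence of `term` and the class label.

    `docs` is a list of token sets; `labels` holds one class label per document.
    """
    n = len(docs)
    # Joint counts of (term present?, class label).
    joint = Counter((term in doc, label) for doc, label in zip(docs, labels))
    # Marginal counts derived from the joint table.
    presence = Counter()
    label_counts = Counter()
    for (present, label), count in joint.items():
        presence[present] += count
        label_counts[label] += count
    mi = 0.0
    # Cells with zero count are never visited; they contribute 0 to the sum.
    for (present, label), count in joint.items():
        p_tc = count / n
        p_t = presence[present] / n
        p_c = label_counts[label] / n
        mi += p_tc * math.log2(p_tc / (p_t * p_c))
    return mi


def select_features(docs, labels, k):
    """Return the k terms with the highest MI score (ties broken alphabetically)."""
    vocab = set().union(*docs)
    ranked = sorted(vocab, key=lambda w: (-mutual_information(docs, labels, w), w))
    return ranked[:k]


# Hypothetical toy corpus: two "sport" and two "politics" documents.
docs = [{"goal", "match"}, {"goal", "team"}, {"vote", "party"}, {"vote", "election"}]
labels = ["sport", "sport", "politics", "politics"]

print(select_features(docs, labels, 2))  # prints ['goal', 'vote']
```

In this toy corpus, "goal" and "vote" each separate the two classes perfectly and score exactly 1 bit. Every other term appears in only one document and scores about 0.31 bits. Note that this baseline only ever removes features. Every training instance is kept, which is the limitation that an approach targeting both features and instances addresses.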