Training algorithms for linear text classifiers
SIGIR '96 Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval
A re-examination of text categorization methods
Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
A Comparative Study on Feature Selection in Text Categorization
ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
Feature Engineering for Text Classification
ICML '99 Proceedings of the Sixteenth International Conference on Machine Learning
A study of local and global thresholding techniques in text categorization
AusDM '06 Proceedings of the fifth Australasian conference on Data mining and analystics - Volume 61
Text classification: a recent overview
ICCOMP'05 Proceedings of the 9th WSEAS International Conference on Computers
Using Intuitionistic Fuzzy Sets in Text Categorization
ICAISC '08 Proceedings of the 9th international conference on Artificial Intelligence and Soft Computing
A class-feature-centroid classifier for text categorization
Proceedings of the 18th international conference on World wide web
Efficient Text Classification Using Best Feature Selection and Combination of Methods
Proceedings of the Symposium on Human Interface 2009 on ConferenceUniversal Access in Human-Computer Interaction. Part I: Held as Part of HCI International 2009
Fast dimension reduction for document classification based on imprecise spectrum analysis
Information Sciences: an International Journal
Hi-index | 0.00 |
Feature selection is an important research issue in text categorization. The reason for this is that thousands of features are often involved, even when the simplest document representation model, the so-called bag-of-words, is used. Among the many approaches to feature selection, the use of some scoring function to rank features to filter them out is an important one. Many of these functions are widely used in text categorization. In past feature selection studies, most researchers have focused on comparing these measures in terms of accuracy achieved. For any measure, however, there are many selection strategies that can be applied to produce the resulting feature set. In this paper, we compare some such strategies and propose a new one. Tests have been conducted to compare five selection strategies on four datasets, using three distinct classifiers and four common feature scoring functions. As a result, it is possible to better understand which strategies are suited to particular classification settings.