In text categorization, feature selection can be essential not only for reducing the index size but also for improving classifier performance. In this article, we propose a feature selection criterion called Entropy based Category Coverage Difference (ECCD). The criterion is based on how the documents containing a term are distributed across the categories, and it also takes the term's entropy into account. On a large collection of XML documents from the Wikipedia encyclopedia, ECCD compares favorably with the usual feature selection methods based on document frequency (DF), information gain (IG), mutual information (MI), χ², odds ratio, and GSS. Moreover, this comparative study confirms the effectiveness of feature selection techniques derived from the χ² statistic.
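To make the two ingredients named in the abstract concrete, here is a minimal Python sketch of the standard χ² score for a term and a category, together with the Shannon entropy of a term's document distribution over categories. This is not the ECCD formula, which the abstract does not state; the function names `chi_square` and `term_entropy` and the contingency-table argument names are assumptions made for illustration only.

```python
import math


def chi_square(a, b, c, d):
    """Chi-square score of a term for one category (illustrative helper).

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that do not contain the term
    d: documents outside the category that do not contain the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # Degenerate table (an empty row or column): no evidence either way.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def term_entropy(doc_counts):
    """Shannon entropy, in bits, of a term's documents across categories.

    doc_counts[k] is the number of documents of category k containing the term.
    Low entropy means the term is concentrated in few categories.
    """
    total = sum(doc_counts)
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in doc_counts:
        if count > 0:
            p = count / total
            entropy -= p * math.log2(p)
    return entropy
```

A term that occurs in only one category has entropy 0 and is a strong candidate feature, while a term spread evenly over all categories reaches the maximum entropy and carries little discriminative information. An entropy-aware criterion can use this to favor the former.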