Text categorization plays an important role in applications where information is filtered, monitored, personalized, categorized, organized, or searched. Feature selection remains an effective and efficient technique in text categorization. Feature selection metrics are commonly based on either the term frequency or the document frequency of a word. We focus on the relative importance of these two frequencies for feature selection metrics. For this purpose, two document-frequency-based metrics, the discriminative power measure and the Gini index, were also examined with term frequency. The metrics were compared and analyzed on the Reuters-21578 dataset. Experimental results revealed that term-frequency-based metrics may be useful, especially for smaller feature sets. Two characteristics of term-frequency-based metrics were observed by analyzing the scatter of features among classes and the rate at which the information in the data was covered. These characteristics may contribute to their superior performance for smaller feature sets.
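To make the distinction between document frequency and term frequency concrete, the following minimal sketch scores terms with a Gini-index-style metric computed from each kind of count. It is an illustration under stated assumptions, not the method evaluated here: the formula (the sum over classes of P(t|c)^2 * P(c|t)^2, with P(c|t) estimated by normalizing P(t|c) across classes) is one formulation found in the text categorization literature, and the toy corpus and the names `frequency_tables` and `gini_scores` are hypothetical.

```python
from collections import Counter, defaultdict


def frequency_tables(docs):
    """Build per-class counts from (label, tokens) pairs.

    Returns (df, n_docs, tf, n_tokens):
      df[c][t]    = number of documents of class c that contain term t
      n_docs[c]   = number of documents in class c
      tf[c][t]    = total occurrences of term t in class c
      n_tokens[c] = total number of tokens in class c
    """
    df, tf = defaultdict(Counter), defaultdict(Counter)
    n_docs, n_tokens = Counter(), Counter()
    for label, tokens in docs:
        df[label].update(set(tokens))  # each document counts a term once
        tf[label].update(tokens)       # every occurrence counts
        n_docs[label] += 1
        n_tokens[label] += len(tokens)
    return df, n_docs, tf, n_tokens


def gini_scores(counts, totals):
    """Gini-style score per term: sum_c P(t|c)^2 * P(c|t)^2.

    Assumption: P(c|t) is estimated as P(t|c) / sum_c' P(t|c'),
    which removes the influence of unequal class sizes.
    """
    terms = set().union(*counts.values())
    scores = {}
    for t in terms:
        p_t_given_c = [counts[c][t] / totals[c] for c in counts]
        norm = sum(p_t_given_c)  # > 0 because t occurs in some class
        scores[t] = sum(p ** 2 * (p / norm) ** 2 for p in p_t_given_c)
    return scores


# Hypothetical toy corpus: (class label, tokenized document).
docs = [
    ("grain", ["wheat", "wheat", "wheat", "export"]),
    ("grain", ["corn", "export"]),
    ("trade", ["export", "tariff"]),
    ("trade", ["tariff", "tariff", "export"]),
]

df, n_docs, tf, n_tokens = frequency_tables(docs)
df_scores = gini_scores(df, n_docs)    # document-frequency-based scores
tf_scores = gini_scores(tf, n_tokens)  # term-frequency-based scores

for term in sorted(df_scores):
    print(f"{term:8s} DF-based={df_scores[term]:.4f} TF-based={tf_scores[term]:.4f}")
```

In this toy corpus, "wheat" and "corn" each appear in exactly one "grain" document, so the document-frequency-based score cannot tell them apart, while the term-frequency-based score ranks "wheat" higher because it occurs three times. This only illustrates how the two kinds of counts can produce different rankings; it says nothing about classification performance.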