Feature filtering is a widely used approach to dimensionality reduction in text categorization. In this approach, a feature scoring method assigns each feature a score, and thresholding then selects the highest-scoring features, either locally (per category) or globally (from a single pool that combines the category-specific scores). In this paper, we investigate several local and global feature selection methods and propose Standard Deviation (STD) and Maximum Deviation (MD) as globalization schemes. We compare fourteen thresholding techniques across different scoring methods and benchmark datasets of diverse nature, including the effect of normalizing feature scores before combining them in the global pool. The results suggest that normalized MD outperforms the other methods when thresholding Document Frequency (DF) scores on even and moderately diverse datasets. The results also indicate that normalizing feature scores improves performance on rare categories and offsets the bias of some techniques toward frequent categories.
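The pipeline described above (score features per category, optionally normalize, globalize into one pool, then threshold) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: DF is the only scoring method; normalization is assumed to be division by each category's maximum score; STD is taken as the population standard deviation of a term's per-category scores; and MD is assumed to be the largest absolute deviation of those scores from their mean. The abstract does not define these details, and all function names are hypothetical.

```python
import math
from collections import defaultdict


def local_df(docs, labels):
    """Local DF scores: df[c][t] = number of documents in category c containing term t."""
    df = defaultdict(lambda: defaultdict(int))
    for tokens, c in zip(docs, labels):
        for t in set(tokens):  # count each document at most once per term
            df[c][t] += 1
    return {c: dict(scores) for c, scores in df.items()}


def normalize(local):
    """ASSUMPTION: divide each category's scores by that category's maximum score."""
    out = {}
    for c, scores in local.items():
        top = max(scores.values())
        out[c] = {t: s / top for t, s in scores.items()}
    return out


def globalize(local, scheme):
    """Combine each term's per-category scores into one global score."""
    cats = sorted(local)
    vocab = set().union(*(local[c] for c in cats))
    g = {}
    for t in vocab:
        v = [local[c].get(t, 0.0) for c in cats]  # absent term scores 0 in that category
        mean = sum(v) / len(v)
        if scheme == "max":
            g[t] = max(v)
        elif scheme == "sum":
            g[t] = sum(v)
        elif scheme == "std":  # population standard deviation across categories
            g[t] = math.sqrt(sum((x - mean) ** 2 for x in v) / len(v))
        elif scheme == "md":  # ASSUMPTION: maximum absolute deviation from the mean
            g[t] = max(abs(x - mean) for x in v)
        else:
            raise ValueError(f"unknown scheme: {scheme}")
    return g


def global_threshold(g, k):
    """Keep the k highest-scoring terms from the global pool (ties broken alphabetically)."""
    return sorted(g, key=lambda t: (-g[t], t))[:k]


def local_threshold(local, k):
    """Keep the top k terms of every category and return their union."""
    selected = set()
    for scores in local.values():
        selected.update(sorted(scores, key=lambda t: (-scores[t], t))[:k])
    return selected


if __name__ == "__main__":
    docs = [["ball", "goal", "the"], ["goal", "team", "the"],
            ["vote", "party", "the"], ["vote", "the"]]
    labels = ["sport", "sport", "politics", "politics"]
    scores = normalize(local_df(docs, labels))
    for scheme in ("max", "sum", "std", "md"):
        print(scheme, global_threshold(globalize(scores, scheme), 2))
    print("local", sorted(local_threshold(scores, 1)))
```

On this toy corpus, `max` globalization keeps `the` because it has a high DF in every category, whereas `std` gives it a score of 0 and prefers `goal` and `vote`, whose DF is concentrated in one category. Deviation-based schemes reward terms whose scores differ across categories, which is one plausible motivation for using them as globalization schemes.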