Feature selection is widely used in text categorization because it reduces the dimensionality of the vector space without sacrificing classifier performance. In this paper, we propose a new feature selection algorithm, named CMFS, which comprehensively measures the significance of a term both across categories (inter-category) and within a category (intra-category). We evaluated CMFS on three benchmark document collections (20-Newsgroups, Reuters-21578 and WebKB) using two classification algorithms, Naive Bayes (NB) and Support Vector Machines (SVMs). The experiments compare CMFS with six well-known feature selection algorithms. With the Naive Bayes classifier, CMFS is significantly superior to Information Gain (IG), the Chi-square statistic (CHI), Document Frequency (DF), Orthogonal Centroid Feature Selection (OCFS) and the DIA association factor (DIA). With Support Vector Machines, it significantly outperforms IG, DF, OCFS and DIA.
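The abstract does not state the CMFS formula, so the following is only a minimal sketch of how a term score could combine an intra-category and an inter-category measure. It assumes a per-category score equal to the product of P(t|c), which reflects how concentrated a term is inside category c, and P(c|t), which reflects how exclusively the term points to c. Both probabilities use add-one smoothing, and the global score is the maximum over categories. Treat this as an illustrative assumption, not the paper's exact definition. The function names `cmfs_scores`, `df_scores` and `select_top_k` are hypothetical. A plain Document Frequency (DF) baseline is included for contrast.

```python
from collections import Counter


def cmfs_scores(docs, labels):
    """CMFS-style term scores (ASSUMED rule: max_c P(t|c) * P(c|t), add-one smoothed).

    docs: list of token lists; labels: one category name per document.
    """
    categories = sorted(set(labels))
    vocab = sorted({t for tokens in docs for t in tokens})
    n_vocab, n_cats = len(vocab), len(categories)

    # Term frequencies of every term inside each category.
    tf = {c: Counter() for c in categories}
    for tokens, c in zip(docs, labels):
        tf[c].update(tokens)
    total_in_cat = {c: sum(tf[c].values()) for c in categories}

    # Total frequency of each term across all categories.
    total_term = Counter()
    for c in categories:
        total_term.update(tf[c])

    scores = {}
    for t in vocab:
        best = 0.0
        for c in categories:
            # Intra-category factor: how much of category c's text is term t.
            p_t_given_c = (tf[c][t] + 1) / (total_in_cat[c] + n_vocab)
            # Inter-category factor: how strongly term t points to category c.
            p_c_given_t = (tf[c][t] + 1) / (total_term[t] + n_cats)
            best = max(best, p_t_given_c * p_c_given_t)
        scores[t] = best
    return scores


def df_scores(docs):
    """Document Frequency baseline: number of documents containing each term."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return dict(df)


def select_top_k(scores, k):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Toy usage example (hypothetical data).
docs = [
    ["ball", "goal", "team", "the"],
    ["goal", "team", "the"],
    ["vote", "law", "the"],
    ["vote", "party", "the"],
]
labels = ["sport", "sport", "politics", "politics"]

print(select_top_k(cmfs_scores(docs, labels), 3))
print(select_top_k(df_scores(docs), 1))
```

On this toy corpus, DF ranks the uninformative term "the" first because it occurs in every document. The combined score instead prefers terms that are both frequent within one category and concentrated in it, such as "vote", "goal" and "team". This is the motivation the abstract gives for measuring a term both inter- and intra-category.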