A Comparative Study on Feature Selection in Text Categorization
ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
Feature Selection Method of Text Tendency Classification
FSKD '08 Proceedings of the 2008 Fifth International Conference on Fuzzy Systems and Knowledge Discovery - Volume 02
Improvement of Text Feature Selection Method Based on TFIDF
FITME '08 Proceedings of the 2008 International Seminar on Future Information Technology and Management Engineering
A text classification method with an effective feature extraction based on category analysis
FSKD'09 Proceedings of the 6th international conference on Fuzzy systems and knowledge discovery - Volume 1
Categorical Document Frequency Based Feature Selection for Text Categorization
ICM '11 Proceedings of the 2011 International Conference of Information Technology, Computer Engineering and Management Sciences - Volume 02
A novel probabilistic feature selection method for text classification
Knowledge-Based Systems
Feature selection in document classification chooses the subset of features most relevant to the task. First, we design a feature selection criterion, CDFP_VM, based on the categorical document frequency probability (CDFP). Second, we propose a maximum conditional distribution factor to further improve the CDFP_VM criterion. The method is most advantageous when only a small number of features is selected, especially for classes with few training documents. It retains the best features without favoring either high or low document frequency (DF) terms, and thereby improves the final performance of the document categorization system. We run experiments on the standard Fudan Chinese corpus and a selected Sogou corpus, used as the balanced and unbalanced corpora respectively. The experimental results demonstrate the effectiveness of the proposed feature selection method for Chinese document categorization.
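To make the idea of a categorical document frequency probability concrete, the following is a minimal sketch, not the paper's method. It assumes CDFP can be read as the fraction of a category's training documents that contain a term. The abstract does not define the CDFP_VM criterion or the maximum conditional distribution factor, so the ranking score used here (variance of the per-category probabilities divided by their mean) is a stand-in chosen purely for illustration, and the function names `categorical_df_probability`, `spread_score`, and `select_top_k` are hypothetical.

```python
from collections import defaultdict


def categorical_df_probability(docs, labels):
    """Return {term: {category: fraction of that category's docs containing term}}.

    Assumption: this is one plausible reading of "categorical document
    frequency probability"; the abstract gives no formula.
    """
    docs_per_category = defaultdict(int)
    df = defaultdict(lambda: defaultdict(int))
    for tokens, label in zip(docs, labels):
        docs_per_category[label] += 1
        for term in set(tokens):  # count each term once per document
            df[term][label] += 1
    return {
        term: {
            cat: df[term].get(cat, 0) / n
            for cat, n in docs_per_category.items()
        }
        for term in df
    }


def spread_score(probs):
    """Illustrative score only (NOT the CDFP_VM criterion): variance / mean.

    A term spread evenly over all categories scores 0; a term concentrated
    in one category scores higher.
    """
    values = list(probs.values())
    mean = sum(values) / len(values)
    if mean == 0:
        return 0.0
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return variance / mean


def select_top_k(docs, labels, k):
    """Rank terms by the illustrative score and return the k best."""
    cdfp = categorical_df_probability(docs, labels)
    ranked = sorted(cdfp, key=lambda t: (-spread_score(cdfp[t]), t))
    return ranked[:k]


if __name__ == "__main__":
    # Toy, pre-tokenized documents; real Chinese text would need segmentation.
    docs = [
        ["ball", "team", "the"],
        ["team", "goal", "the"],
        ["stock", "market", "the"],
        ["market", "bank", "the"],
    ]
    labels = ["sports", "sports", "finance", "finance"]
    print(select_top_k(docs, labels, 2))
```

In this toy data, a term that appears in every document of every category gets a score of zero, while terms concentrated in one category rank first. The per-category normalization is what keeps a class with few training documents from being drowned out by a larger one.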