Feature selection is a key step in web page categorization: it directly affects both the accuracy and the efficiency of categorization. This paper proposes a new feature selection approach based on the Mutual Information algorithm. The approach retains features whose Mutual Information is negative and emphasizes the occurrence probabilities of features in different categories. It also improves web page preprocessing so that some useful features are preserved. Experiments show that the new feature selection method effectively improves categorization accuracy.
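To illustrate the idea of keeping negatively scored features, here is a minimal Python sketch. It is not the paper's method: the abstract gives no formula, so the sketch assumes the common document-frequency estimate MI(t, c) = log(P(t, c) / (P(t) P(c))) and assumes that negative scores are brought in by ranking on absolute value. The function names `mi_scores` and `select_features` and the toy corpus are hypothetical.

```python
import math
from collections import Counter, defaultdict


def mi_scores(docs, labels):
    """Return {category: {term: MI(term, category)}}.

    Assumption: MI(t, c) = log(P(t, c) / (P(t) * P(c))), with all
    probabilities estimated from document frequencies.
    """
    n = len(docs)
    cat_count = Counter(labels)      # documents per category
    term_count = Counter()           # documents containing each term
    joint = defaultdict(Counter)     # documents in category containing term
    for tokens, label in zip(docs, labels):
        for term in set(tokens):     # count each term once per document
            term_count[term] += 1
            joint[label][term] += 1

    scores = defaultdict(dict)
    for cat, n_cat in cat_count.items():
        for term, n_term in term_count.items():
            n_joint = joint[cat][term]
            if n_joint == 0:
                continue             # log(0) is undefined; skip the pair
            scores[cat][term] = math.log((n_joint * n) / (n_term * n_cat))
    return scores


def select_features(scores, k):
    """Pick the k terms with the largest |MI| over all categories.

    Assumption: using the absolute value is one plausible way to let
    negatively scored terms compete with positive ones.
    """
    best = {}
    for cat_scores in scores.values():
        for term, score in cat_scores.items():
            best[term] = max(best.get(term, 0.0), abs(score))
    # Sort by descending |MI|, breaking ties alphabetically.
    return sorted(best, key=lambda t: (-best[t], t))[:k]


# Hypothetical toy corpus: each document is a list of tokens.
docs = [
    ["goal", "match", "team"],
    ["goal", "team", "news"],
    ["stock", "market", "news"],
    ["stock", "news", "goal"],
]
labels = ["sports", "sports", "finance", "finance"]

scores = mi_scores(docs, labels)
selected = select_features(scores, k=4)
```

In this toy corpus, "news" appears in the sports category less often than its overall frequency would predict, so its score there is negative. A selector that keeps only positive scores would ignore that signal, whereas ranking on absolute value lets such a term compete with positively associated ones.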