Improved categorical distribution difference feature selection for Chinese document categorization
Proceedings of the 8th International Conference on Ubiquitous Information Management and Communication
Automatic text categorization has progressed rapidly in recent years and has become one of the most active topics in information processing. Text tendency classification is a type of text categorization with important applications in information retrieval, identification and filtering of harmful information, content security management, and analysis of public opinion tendency. Because feature selection strongly influences text classification accuracy, this paper studies feature selection methods for tendency classification. First, it analyzes and summarizes the variety of existing methods and identifies three common ideas behind feature selection. Then, based on an analysis of the complexity of tendency classification, it shows that feature selection based on the distribution of features across text categories is better suited to tendency classification than selection based on the correlation between features and categories. Finally, it reports test results on balanced and unbalanced training sets.
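To make the contrast between the two families of criteria concrete, the following is a minimal sketch and not the method of the paper, whose exact scoring functions are not given in this abstract. It assumes tokenized documents with one label each. The distribution-based score used here (the gap between per-category document rates of a term) is an illustrative stand-in, and the correlation-based score is the standard chi-square statistic. All function names and the toy data are hypothetical.

```python
from collections import Counter


def doc_freq_by_class(docs, labels):
    """Count, for each class, how many of its documents contain each term."""
    sizes = Counter(labels)  # number of documents per class
    df = {}
    for tokens, label in zip(docs, labels):
        # set() so that a term is counted once per document
        df.setdefault(label, Counter()).update(set(tokens))
    return df, sizes


def distribution_score(term, df, sizes):
    """Illustrative distribution-based score (an assumption, not the paper's):
    gap between the highest and lowest per-class document rate of the term.
    Rates are normalized by class size, so unbalanced sets stay comparable."""
    rates = [df[label][term] / sizes[label] for label in sizes]
    return max(rates) - min(rates)


def chi_square_score(term, df, sizes, target):
    """Chi-square statistic between the term and the class `target`,
    a common correlation-based criterion."""
    others = [label for label in sizes if label != target]
    a = df[target][term]                          # target docs with term
    b = sum(df[label][term] for label in others)  # other docs with term
    c = sizes[target] - a                         # target docs without term
    d = sum(sizes[label] for label in others) - b # other docs without term
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


# Toy, made-up data: two tendency classes with two documents each.
docs = [["good", "movie"], ["good", "plot"], ["bad", "movie"], ["bad", "plot"]]
labels = ["pos", "neg", "pos", "neg"]
labels = ["pos", "pos", "neg", "neg"]
df, sizes = doc_freq_by_class(docs, labels)

for term in ("good", "movie"):
    print(term, distribution_score(term, df, sizes),
          chi_square_score(term, df, sizes, "pos"))
```

On this toy set both criteria rank "good" above "movie", since "good" occurs only in one class while "movie" is spread evenly. The two criteria can diverge on larger or unbalanced collections, which is the situation the abstract's final experiments address.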