A text classification method with an effective feature extraction based on category analysis

Authors:
Yun Li; Yan Sheng; Luan Luan; Ling Chen
Affiliations:
School of Information Engineering, Yangzhou University, Jiangsu, China (all four authors)
Venue:
FSKD '09: Proceedings of the 6th International Conference on Fuzzy Systems and Knowledge Discovery, Volume 1
Year:
2009

Abstract

Text classification is the task of determining the class of an unknown text from its content, within a given classification system. To extract as few features as possible while preserving as much of the text's information as possible, the paper analyzes the statistical properties of the various features and extracts global features according to Zipf's law. Then, based on a statistical analysis of the category information carried by each feature, effective features are selected by computing each feature's contribution. Next, the traditional TF-IDF formula is extended with category frequency, giving a weighting scheme named TF-IDF-CF. Finally, a text classification method is proposed. Experimental results show that the proposed feature extraction methods are effective and that weighting features with TF-IDF-CF yields higher classification accuracy.
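The abstract names the pipeline steps but gives no formulas, so the following is only a sketch under stated assumptions, not the authors' method. It shows (1) a Zipf-style global filter that drops the most frequent terms (the head of the rank-frequency curve) and the rarest ones (the tail), with the cut-offs `n_top` and `min_count` chosen arbitrarily; and (2) a category-frequency-weighted TF-IDF assumed to take the form (1 + log tf) · log(N / df) · (n_cj / N_c), where n_cj is the number of training documents in category c containing term j and N_c is the number of documents in c. The per-feature "contribution" step is not sketched because the abstract does not define it. The function names `zipf_band_filter` and `tf_idf_cf` and the toy corpus are hypothetical.

```python
import math
from collections import Counter

# Hypothetical labeled corpus: (tokens, category). Toy data, not from the paper.
TOY_DOCS = [
    ("ball goal team the".split(), "sport"),
    ("goal team match the".split(), "sport"),
    ("vote party team the".split(), "politics"),
    ("vote party law the".split(), "politics"),
]


def zipf_band_filter(docs, n_top=1, min_count=2):
    """Zipf-style global filter (assumed form, illustrative thresholds).

    Ranks terms by corpus frequency, drops the `n_top` highest-ranked terms
    (the very frequent head) and any term seen fewer than `min_count` times
    (the rare tail). Returns the kept vocabulary as a set.
    """
    counts = Counter(t for tokens, _ in docs for t in tokens)
    ranked = [t for t, _ in counts.most_common()]
    return {t for t in ranked[n_top:] if counts[t] >= min_count}


def tf_idf_cf(docs, vocab):
    """Assumed TF-IDF-CF: (1 + log tf) * log(N / df) * (n_cj / N_c).

    n_cj = documents in category c that contain term j,
    N_c  = documents in category c. Returns one weight dict per document.
    """
    n_docs = len(docs)
    df = Counter()          # term -> number of documents containing it
    class_size = Counter()  # category -> number of documents
    class_df = Counter()    # (category, term) -> category documents containing term
    for tokens, cat in docs:
        class_size[cat] += 1
        for t in set(tokens) & vocab:
            df[t] += 1
            class_df[(cat, t)] += 1

    weights = []
    for tokens, cat in docs:
        tf = Counter(t for t in tokens if t in vocab)
        doc_w = {}
        for t, f in tf.items():
            idf = math.log(n_docs / df[t])
            cf = class_df[(cat, t)] / class_size[cat]  # category frequency factor
            doc_w[t] = (1 + math.log(f)) * idf * cf
        weights.append(doc_w)
    return weights


if __name__ == "__main__":
    vocab = zipf_band_filter(TOY_DOCS)
    print(sorted(vocab))
    for (_, cat), w in zip(TOY_DOCS, tf_idf_cf(TOY_DOCS, vocab)):
        print(cat, {t: round(v, 3) for t, v in sorted(w.items())})
```

On the toy corpus, "team" receives half the weight in a politics document that it receives in a sport document, because only one of the two politics documents contains it. Since the CF factor needs a document's category, it applies directly only to labeled training texts; the abstract does not say how unlabeled test texts are weighted.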