Three new feature weighting methods for text categorization

Authors:
Wei Xue; Xinshun Xu
Affiliations:
School of Computer Science and Technology, Shandong University, Jinan, China; School of Computer Science and Technology, Shandong University, Jinan, China
Venue:
WISM'10 Proceedings of the 2010 international conference on Web information systems and mining
Year:
2010

Abstract

Feature weighting, an important phase of text categorization, computes a weight for each feature of a document. This paper proposes three new feature weighting methods for text categorization. In the first two methods, the traditional tf×idf weighting is combined with a "one-side" feature selection metric (odds ratio or correlation coefficient) in a moderate manner, and positive and negative features are weighted separately; the resulting schemes are denoted tf×idf+CC and tf×idf+OR. The third method combines tf with feature entropy, which measures the diversity of a feature's document frequency across categories, and is both effective and concise. Experimental results on the Reuters-21578 corpus show that the proposed methods outperform several state-of-the-art feature weighting methods, such as tf×idf, tf×CHI, and tf×OR.
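
The abstract names the ingredients of the three methods but does not give their formulas. As a rough illustration only, the Python sketch below computes textbook versions of those ingredients: tf×idf, the two one-sided metrics (odds ratio and correlation coefficient) from a term/category contingency table, and an entropy over a feature's per-category document frequencies. All function names, the smoothing constant, and the choice of base-2 logarithm are assumptions made for this sketch. How the paper combines these quantities into tf×idf+CC, tf×idf+OR, and the tf-with-entropy weight is not stated in the abstract and is deliberately not reproduced here.

```python
import math


def tf_idf(tf, df, n_docs):
    """Textbook tf x idf: term frequency times log(N / df)."""
    return tf * math.log(n_docs / df)


def log_odds_ratio(a, b, c, d, smooth=0.5):
    """Log odds ratio of a term for one category.

    a: in-category docs containing the term
    b: out-of-category docs containing the term
    c: in-category docs without the term
    d: out-of-category docs without the term
    `smooth` is an assumed additive constant that avoids division by zero.
    """
    return math.log(((a + smooth) * (d + smooth)) / ((c + smooth) * (b + smooth)))


def correlation_coefficient(a, b, c, d):
    """Signed square root of the chi-square statistic (same a, b, c, d as above)."""
    n = a + b + c + d
    denom = math.sqrt((a + c) * (b + d) * (a + b) * (c + d))
    if denom == 0:
        return 0.0
    return math.sqrt(n) * (a * d - c * b) / denom


def feature_entropy(df_per_category):
    """Shannon entropy (base 2, assumed) of a feature's document-frequency
    distribution over categories."""
    total = sum(df_per_category)
    if total == 0:
        return 0.0
    probs = [df / total for df in df_per_category if df > 0]
    return sum(-p * math.log2(p) for p in probs)


if __name__ == "__main__":
    # Hypothetical counts: a term seen in 30 of 40 in-category documents
    # and in 5 of 60 out-of-category documents.
    a, b, c, d = 30, 5, 10, 55
    print("tf x idf:", tf_idf(tf=3, df=a + b, n_docs=a + b + c + d))
    print("log OR  :", log_odds_ratio(a, b, c, d))
    print("CC      :", correlation_coefficient(a, b, c, d))
    print("entropy :", feature_entropy([30, 3, 2]))
```

Both one-sided metrics are signed: a positive value marks a term that indicates membership in the category (a positive feature), and a negative value marks one that indicates non-membership (a negative feature), which is what allows the two groups to be weighted separately. The entropy is zero when a feature occurs in a single category and grows as its document frequency spreads more evenly across categories.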