CWC: A Clustering-Based Feature Weighting Approach for Text Classification

Authors:
Lin Zhu; Jihong Guan; Shuigeng Zhou
Affiliations:
Department of Computer Science and Engineering, Fudan University, 200433, China; Department of Computer Science and Technology, Tongji University, 201804, China; Department of Computer Science and Engineering, Fudan University, 200433, China
Venue:
MDAI '07 Proceedings of the 4th international conference on Modeling Decisions for Artificial Intelligence
Year:
2007

Abstract

Most existing text classification methods represent documents with the vector space model and weight the document vectors by the TF-IDF method. However, TF-IDF weighting ignores the fact that the weight of a feature in a document depends not only on the document, but also on the class to which the document belongs. In this paper, we present a Clustering-based feature Weighting approach for text Classification, or CWC for short. CWC treats each class in the training collection as a known cluster and iteratively searches for the feature weights that optimize the clustering objective function, so that the best clustering result is achieved and documents in different classes are best distinguished by the resulting feature weights. The performance of CWC is validated by classification experiments on two real text collections, and the experimental results show that CWC outperforms the traditional KNN.
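
The contrast the abstract draws can be illustrated with a small Python sketch. It first computes plain TF-IDF weights, which depend only on the document and the collection, and then derives class-aware feature weights by treating each labeled class as a fixed cluster. The abstract does not state the CWC objective function or its iterative update rule, so the second step is a stand-in, not the authors' method: a closed-form ratio of between-class to within-class scatter. The function names, the `smooth` parameter, and the toy documents are all hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document (raw tf * log(N / df))."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {term: count * math.log(n / df[term]) for term, count in Counter(doc).items()}
        for doc in docs
    ]


def class_separation_weights(vectors, labels, smooth=1.0):
    """Stand-in class-aware feature weights (NOT the CWC update rule).

    Each class is treated as a known cluster. A term scores high when its
    TF-IDF values differ between classes and vary little inside each class.
    `smooth` is an assumed constant that keeps the ratio finite.
    """
    vocab = sorted({term for vec in vectors for term in vec})
    weights = {}
    for term in vocab:
        values = [vec.get(term, 0.0) for vec in vectors]
        overall_mean = sum(values) / len(values)
        between = within = 0.0
        for cls in set(labels):
            members = [x for x, y in zip(values, labels) if y == cls]
            class_mean = sum(members) / len(members)
            between += len(members) * (class_mean - overall_mean) ** 2
            within += sum((x - class_mean) ** 2 for x in members)
        weights[term] = between / (within + smooth)
    total = sum(weights.values()) or 1.0  # avoid dividing by zero
    return {term: w / total for term, w in weights.items()}


# Toy training collection with two classes (made up for illustration).
docs = [
    ["ball", "goal", "team", "news"],
    ["team", "goal", "match", "news"],
    ["stock", "market", "news", "bank"],
    ["bank", "market", "trade", "news"],
]
labels = ["sport", "sport", "finance", "finance"]

vectors = tf_idf(docs)
weights = class_separation_weights(vectors, labels)
for term, weight in sorted(weights.items(), key=lambda item: -item[1]):
    print(f"{term:8s}{weight:.3f}")
```

In this toy collection, a term such as "goal" that occurs in every document of one class and in none of the other receives a larger weight than "ball", which occurs in only one document, while "news" occurs everywhere and receives zero weight. The resulting weights could then rescale document vectors before a nearest-neighbor comparison, although how CWC itself uses them is not specified in the abstract.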