Automatic word clustering for text categorization using global information

Authors:
Chen Wenliang;Chang Xingzhi;Wang Huizhen;Zhu Jingbo;Yao Tianshun
Affiliations:
Natural Language Processing Lab, Northeastern University, Shenyang, China;Natural Language Processing Lab, Northeastern University, Shenyang, China;Natural Language Processing Lab, Northeastern University, Shenyang, China;Natural Language Processing Lab, Northeastern University, Shenyang, China;Natural Language Processing Lab, Northeastern University, Shenyang, China
Venue:
AIRS'04 Proceedings of the 2004 international conference on Asian Information Retrieval Technology
Year:
2004

Citing 10
Cited 4

Similarity-based approaches to natural language processing

Similarity-based approaches to natural language processing
Distributional clustering of words for text classification

Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
A re-examination of text categorization methods

Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
On feature distributional clustering for text categorization

Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Machine learning in automated text categorization

ACM Computing Surveys (CSUR)
Text Categorization with Suport Vector Machines: Learning with Many Relevant Features

ECML '98 Proceedings of the 10th European Conference on Machine Learning
A Comparative Study on Feature Selection in Text Categorization

ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
Distributional word clusters vs. words for text categorization

The Journal of Machine Learning Research
Distributional clustering of English words

ACL '93 Proceedings of the 31st annual meeting on Association for Computational Linguistics
Automatic text categorization by unsupervised learning

COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 1

A word clustering approach for language model-based sentence retrieval in question answering systems

Proceedings of the 18th ACM conference on Information and knowledge management
Efficient Text Classification Using Term Projection

AIRS '09 Proceedings of the 5th Asia Information Retrieval Symposium on Information Retrieval Technology
Long distance bigram models applied to word clustering

Pattern Recognition
Improving text categorization using domain knowledge

NLDB'05 Proceedings of the 10th international conference on Natural Language Processing and Information Systems

Quantified Score

Hi-index	0.00

Visualization

Abstract

High dimensionality of feature space and short of training documents are the crucial obstacles for text categorization. In order to overcome these obstacles, this paper presents a cluster-based text categorization system which uses class distributional clustering of words. We propose a new clustering model which considers the global information over all the clusters. The model can be understood as the balance of all the clusters according to the number of words in them. It can group words into clusters based on the distribution of class labels associated with each word. Using these learned clusters as features, we develop a cluster-based classifier. We present several experimental results to show that our proposed method performs better than the other three text classifiers. The proposed model has better results than the model which only considers the information of the two related clusters. Specially, it can maintain good performance when the number of features is small and the size of training corpus is small.