Similarity-based approaches to natural language processing
Similarity-based approaches to natural language processing
Distributional clustering of words for text classification
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
A re-examination of text categorization methods
Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
On feature distributional clustering for text categorization
Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Machine learning in automated text categorization
ACM Computing Surveys (CSUR)
Text Categorization with Suport Vector Machines: Learning with Many Relevant Features
ECML '98 Proceedings of the 10th European Conference on Machine Learning
A Comparative Study on Feature Selection in Text Categorization
ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
Distributional word clusters vs. words for text categorization
The Journal of Machine Learning Research
Distributional clustering of English words
ACL '93 Proceedings of the 31st annual meeting on Association for Computational Linguistics
Automatic text categorization by unsupervised learning
COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 1
A word clustering approach for language model-based sentence retrieval in question answering systems
Proceedings of the 18th ACM conference on Information and knowledge management
Efficient Text Classification Using Term Projection
AIRS '09 Proceedings of the 5th Asia Information Retrieval Symposium on Information Retrieval Technology
Long distance bigram models applied to word clustering
Pattern Recognition
Improving text categorization using domain knowledge
NLDB'05 Proceedings of the 10th international conference on Natural Language Processing and Information Systems
Hi-index | 0.00 |
High dimensionality of feature space and short of training documents are the crucial obstacles for text categorization. In order to overcome these obstacles, this paper presents a cluster-based text categorization system which uses class distributional clustering of words. We propose a new clustering model which considers the global information over all the clusters. The model can be understood as the balance of all the clusters according to the number of words in them. It can group words into clusters based on the distribution of class labels associated with each word. Using these learned clusters as features, we develop a cluster-based classifier. We present several experimental results to show that our proposed method performs better than the other three text classifiers. The proposed model has better results than the model which only considers the information of the two related clusters. Specially, it can maintain good performance when the number of features is small and the size of training corpus is small.