Distributional clustering of words for text classification
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Machine learning in automated text categorization
ACM Computing Surveys (CSUR)
A divisive information theoretic feature clustering algorithm for text classification
The Journal of Machine Learning Research
Data Mining: Practical Machine Learning Tools and Techniques, Second Edition (Morgan Kaufmann Series in Data Management Systems)
Hi-index | 0.00 |
Distributional Clustering has showed to be an effective and powerful approach to supervised term extraction aimed at reducing the original indexing space dimensionality for Automatic Text Categorization [2]. In a recent paper [1] we introduced a new Signal Processing approach to Distributional Clustering which reached categorization results on 20 Newsgroups dataset similar to those obtained by other information-theoretic approaches [3][4][5]. Here we re-validate our method by showing that the 90-categories Reuters-21578 benchmark collection can be indexed with a minimum loss of categorization accuracy (around 2% with Naïve Bayes categorizer) with only 50 clusters.