On feature distributional clustering for text categorization

  • Authors:
  • Ron Bekkerman;Ran El-Yaniv;Naftali Tishby;Yoad Winter

  • Affiliations:
  • Technion Univ., Haifa, Israel;Technion Univ., Haifa, Israel;The Hebrew Univ., Jerusalem, Israel;Technion Univ., Haifa, Israel

  • Venue:
  • Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
  • Year:
  • 2001

Abstract

We describe a text categorization approach that is based on a combination of feature distributional clusters with a support vector machine (SVM) classifier. Our feature selection approach employs distributional clustering of words via the recently introduced information bottleneck method, which generates a more efficient word-cluster representation of documents. Combined with the classification power of an SVM, this method yields high-performance text categorization that can outperform other recent methods in terms of categorization accuracy and representation efficiency. Comparing the accuracy of our method with other techniques, we observe a significant dependency of the results on the data set. We discuss the potential reasons for this dependency.
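
The word-cluster representation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a simple greedy agglomerative procedure that repeatedly merges the two word clusters whose class distributions P(class | word) are closest under a prior-weighted Jensen-Shannon divergence, which is one common way to approximate information bottleneck clustering. All function names (`class_distributions`, `merge_cost`, `cluster_words`, `to_cluster_vector`) and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def class_distributions(docs, labels):
    """Estimate P(class | word) and P(word) from labeled, tokenized documents."""
    counts = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        for word in tokens:
            counts[word][label] += 1
    classes = sorted(set(labels))
    total = sum(sum(c.values()) for c in counts.values())
    dists, priors = {}, {}
    for word, c in counts.items():
        n = sum(c.values())
        dists[word] = [c[k] / n for k in classes]
        priors[word] = n / total
    return dists, priors


def kl(p, q):
    """Kullback-Leibler divergence, skipping zero-probability terms of p."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def merge_cost(p1, w1, p2, w2):
    """Prior-weighted Jensen-Shannon divergence between two clusters."""
    a, b = w1 / (w1 + w2), w2 / (w1 + w2)
    mix = [a * x + b * y for x, y in zip(p1, p2)]
    return (w1 + w2) * (a * kl(p1, mix) + b * kl(p2, mix))


def cluster_words(dists, priors, k):
    """Greedily merge the cheapest pair of clusters until k clusters remain."""
    clusters = [([w], dists[w], priors[w]) for w in sorted(dists)]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = merge_cost(clusters[i][1], clusters[i][2],
                                  clusters[j][1], clusters[j][2])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        words_i, p_i, w_i = clusters[i]
        words_j, p_j, w_j = clusters[j]
        merged = [(w_i * x + w_j * y) / (w_i + w_j) for x, y in zip(p_i, p_j)]
        del clusters[j]  # j > i, so index i is unaffected
        clusters[i] = (words_i + words_j, merged, w_i + w_j)
    return {w: idx for idx, (ws, _, _) in enumerate(clusters) for w in ws}


def to_cluster_vector(tokens, assignment, k):
    """Represent a document as counts over word clusters; unseen words are ignored."""
    vec = [0] * k
    for word in tokens:
        if word in assignment:
            vec[assignment[word]] += 1
    return vec


# Toy data (invented for illustration only).
docs = [["goal", "team", "goal"], ["team", "match"],
        ["stock", "market"], ["market", "trade", "stock"]]
labels = ["sport", "sport", "finance", "finance"]

dists, priors = class_distributions(docs, labels)
assignment = cluster_words(dists, priors, k=2)
vector = to_cluster_vector(["goal", "stock", "stock", "unknown"], assignment, 2)
```

In a full pipeline, the resulting low-dimensional cluster-count vectors would be passed to an SVM classifier in place of the original bag-of-words vectors; that step is omitted here to keep the sketch dependency-free.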