Hierarchical Bayesian clustering for automatic text classification

  • Authors:
  • Makoto Iwayama; Takenobu Tokunaga

  • Affiliations:
  • Advanced Research Laboratory, Hitachi Ltd., Saitama, Japan; Department of Computer Science, Tokyo Institute of Technology, Tokyo, Japan

  • Venue:
  • IJCAI'95: Proceedings of the 14th International Joint Conference on Artificial Intelligence, Volume 2
  • Year:
  • 1995

Abstract

Text classification, the grouping of texts into several clusters, has been used as a means of improving both the efficiency and the effectiveness of text retrieval/categorization. In this paper we propose a hierarchical clustering algorithm that constructs a set of clusters having the maximum Bayesian posterior probability, that is, the probability that the given texts are classified into those clusters. We call the algorithm Hierarchical Bayesian Clustering (HBC). The advantages of HBC are experimentally verified from several viewpoints: (1) HBC reconstructs the original clusters more accurately than other, non-probabilistic algorithms do; (2) when a probabilistic text categorization method is extended to a cluster-based one, using HBC yields better performance than using non-probabilistic algorithms.
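The greedy idea of building a cluster hierarchy by always choosing the merge that best preserves the posterior can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Each text is treated as a bag of words. The within-cluster likelihood P(d|c) is assumed to be a Laplace-smoothed unigram model of the pooled cluster text; the paper's own estimate of P(d|c) may differ. The prior over clusterings is assumed uniform. Under these assumptions, writing SC(c) for the product of P(d|c) over the members d of c, merging clusters c_x and c_y multiplies the posterior by SC(c_x ∪ c_y) / (SC(c_x) · SC(c_y)), so the sketch merges the pair with the largest log of that ratio. The names `log_score` and `hbc_sketch` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def log_score(cluster, docs, vocab_size, alpha=1.0):
    """Hypothetical log SC(c): sum of log P(d|c) over the member documents.

    ASSUMPTION: P(t|c) is a Laplace-smoothed unigram model of the pooled
    cluster text. The per-document multinomial coefficient is omitted
    because it does not depend on the clustering and cancels in merge gains.
    """
    counts = Counter()
    for i in cluster:
        counts.update(docs[i])
    total = sum(counts.values())
    score = 0.0
    for i in cluster:
        for term, n in docs[i].items():
            p = (counts[term] + alpha) / (total + alpha * vocab_size)
            score += n * math.log(p)
    return score


def hbc_sketch(texts):
    """Greedy agglomerative clustering; returns the sequence of merges."""
    docs = [Counter(t.split()) for t in texts]
    vocab_size = len({w for d in docs for w in d})
    # Start with one singleton cluster per text.
    clusters = [frozenset([i]) for i in range(len(docs))]
    scores = {c: log_score(c, docs, vocab_size) for c in clusters}
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            merged = a | b
            s = log_score(merged, docs, vocab_size)
            # log of SC(a ∪ b) / (SC(a) * SC(b)): change in log posterior
            gain = s - scores[a] - scores[b]
            if best is None or gain > best[0]:
                best = (gain, a, b, merged, s)
        _, a, b, merged, s = best
        clusters = [c for c in clusters if c not in (a, b)] + [merged]
        scores[merged] = s
        merges.append((sorted(a), sorted(b)))
    return merges


texts = [
    "apple banana apple",
    "apple banana banana",
    "car engine wheel",
    "car wheel engine engine",
]
for left, right in hbc_sketch(texts):
    print("merge", left, "+", right)
```

On this toy input the two fruit texts and the two car texts are merged first, and the final step joins the two groups. Rescoring every candidate pair at each step is deliberately naive, which keeps the correspondence with the posterior-ratio criterion easy to read but is too slow for large collections.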