Hierarchical document classification using automatically generated hierarchy

  • Authors:
  • Tao Li; Shenghuo Zhu; Mitsunori Ogihara

  • Affiliations:
  • School of Computer Science, Florida International University, Miami, USA 33199; NEC Labs America, Inc., Cupertino, USA 95014; Department of Computer Science, University of Rochester, Rochester, USA 14627-0226

  • Venue:
  • Journal of Intelligent Information Systems
  • Year:
  • 2007

Abstract

Automated text categorization has attracted growing interest with the exponential growth of information and the ever-increasing need to organize it. The underlying hierarchical structure identifies the dependence relationships between categories and provides a valuable source of information for categorization. Although considerable research has been conducted on hierarchical document categorization, little has been done on the automatic generation of topic hierarchies. In this paper, we propose using linear discriminant projection to generate more meaningful intermediate levels of hierarchy in large flat sets of classes. The linear discriminant projection approach first projects all documents onto a low-dimensional space and then clusters the categories into hierarchies accordingly. The paper also investigates the effect of using the generated hierarchical structure for text classification. Our experiments show that the generated hierarchies improve classification performance in most cases.
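
The two steps named in the abstract (project the documents, then cluster the categories) can be illustrated with a minimal sketch. The abstract does not give the paper's exact procedure, so everything concrete here is an assumption: a Fisher-style discriminant projection computed from within-class and between-class scatter, average-linkage agglomerative clustering of the projected class centroids, and synthetic 5-dimensional data standing in for document vectors. The function names `lda_projection`, `build_hierarchy`, and `leaves` are hypothetical.

```python
import numpy as np

def lda_projection(X, y, n_components, reg=1e-3):
    """Fisher-style discriminant projection (an assumed stand-in for the paper's method).

    Returns a (n_features, n_components) matrix W solving Sb w = lambda Sw w.
    """
    classes = np.unique(y)
    d = X.shape[1]
    overall_mean = X.mean(axis=0)
    Sw = reg * np.eye(d)          # within-class scatter, regularized to stay invertible
    Sb = np.zeros((d, d))         # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - overall_mean)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    # Whiten with the Cholesky factor of Sw so the problem becomes symmetric.
    L_inv = np.linalg.inv(np.linalg.cholesky(Sw))
    _, evecs = np.linalg.eigh(L_inv @ Sb @ L_inv.T)   # eigenvalues in ascending order
    top = evecs[:, ::-1][:, :n_components]            # keep the largest ones
    return L_inv.T @ top

def build_hierarchy(centroids, labels):
    """Average-linkage agglomerative clustering of class centroids.

    Returns a nested tuple such as (("a", "b"), ("c", "d")).
    """
    clusters = [([i], labels[i]) for i in range(len(labels))]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                dist = np.mean([np.linalg.norm(centroids[i] - centroids[j])
                                for i in clusters[a][0] for j in clusters[b][0]])
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        _, a, b = best
        merged = (clusters[a][0] + clusters[b][0], (clusters[a][1], clusters[b][1]))
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return clusters[0][1]

def leaves(tree):
    """Flatten a nested-tuple hierarchy into its list of category labels."""
    return [tree] if isinstance(tree, str) else leaves(tree[0]) + leaves(tree[1])

# Synthetic stand-in for document vectors: categories a/b and c/d form two close pairs.
rng = np.random.default_rng(0)
means = {"a": [0, 0, 0, 0, 0], "b": [1, 0, 0, 0, 0],
         "c": [10, 10, 0, 0, 0], "d": [11, 10, 0, 0, 0]}
X = np.vstack([rng.normal(m, 0.3, size=(30, 5)) for m in means.values()])
y = np.repeat(list(means), 30)

W = lda_projection(X, y, n_components=2)
Z = X @ W                                             # documents in the low-dimensional space
classes = np.unique(y)
centroids = np.array([Z[y == c].mean(axis=0) for c in classes])
tree = build_hierarchy(centroids, [str(c) for c in classes])
print(tree)
```

On this toy data the top-level split should separate the two close pairs of categories, which is the kind of intermediate level a flat class set lacks. A top-down classifier could then be trained at each internal node of `tree`, although the abstract does not say which classifier the authors used.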