Heterogeneous information integration in hierarchical text classification

  • Authors: Huai-Yuan Yang (1), Tie-Yan Liu (1), Li Gao (2), Wei-Ying Ma (1)
  • Affiliations: (1) 5F Sigma Center, Microsoft Research Asia, Beijing, P.R. China; (2) Department of Scientific & Engineering Computing, School of Mathematical Sciences, Peking University, Beijing, P.R. China
  • Venue: PAKDD'06: Proceedings of the 10th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining
  • Year: 2006

Abstract

Previous work has shown that taking the category distance in the taxonomy tree into account can improve the performance of text classifiers. In this paper, we propose a new approach that integrates additional categorical information from the text corpus using the principle of multi-objective programming (MOP). That is, we consider not only the distance between categories defined by the branching of the taxonomy tree, but also the similarity between categories defined by the document/term distributions in the feature space. Using MOP to combine these two kinds of information, we obtain a refined category distance. Experiments on both synthetic and real-world datasets demonstrate the effectiveness of the proposed algorithm in hierarchical text classification.
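
To make the idea of a refined category distance concrete, the following is a minimal sketch and not the formulation used in the paper, whose MOP details the abstract does not give. It assumes that the two objectives are combined by a simple weighted sum (one common way to scalarize a multi-objective problem), that the tree distance is the number of edges between two categories, and that the feature-space similarity is the cosine similarity between category centroid vectors. The names `tree_distance`, `cosine_distance`, `refined_distance`, `alpha`, and `tree_scale`, as well as the toy taxonomy, are hypothetical.

```python
import math


def tree_distance(parent, a, b):
    """Number of edges on the path between categories a and b in the taxonomy."""
    def path_to_root(node):
        path = [node]
        while parent.get(node) is not None:
            node = parent[node]
            path.append(node)
        return path

    steps_from_b = {node: i for i, node in enumerate(path_to_root(b))}
    for steps_from_a, node in enumerate(path_to_root(a)):
        if node in steps_from_b:
            # node is the lowest common ancestor of a and b
            return steps_from_a + steps_from_b[node]
    raise ValueError("categories do not share a common ancestor")


def cosine_distance(u, v):
    """1 - cosine similarity between two centroid vectors (1.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def refined_distance(parent, centroids, a, b, alpha=0.5, tree_scale=1.0):
    """Weighted-sum combination of tree distance and feature-space distance.

    ASSUMPTION: weighted-sum scalarization stands in for the paper's MOP step.
    alpha weights the taxonomy term; tree_scale rescales edge counts so both
    terms are on a comparable range.
    """
    d_tree = tree_distance(parent, a, b) / tree_scale
    d_feat = cosine_distance(centroids[a], centroids[b])
    return alpha * d_tree + (1.0 - alpha) * d_feat


# Hypothetical toy taxonomy (child -> parent) and category centroids.
parent = {
    "root": None,
    "sports": "root",
    "science": "root",
    "soccer": "sports",
    "tennis": "sports",
    "physics": "science",
}
centroids = {
    "soccer": [1.0, 0.0],
    "tennis": [0.0, 1.0],
    "physics": [1.0, 0.0],
}

if __name__ == "__main__":
    for pair in [("soccer", "tennis"), ("soccer", "physics")]:
        print(pair, refined_distance(parent, centroids, *pair, tree_scale=4.0))
```

In this toy setup, two sibling categories that are close in the tree can still end up far apart if their term distributions differ, and vice versa; `alpha` controls how much each source of information contributes.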