Utilizing global and path information with language modelling for hierarchical text classification

  • Authors:
  • Heung-Seon Oh; Sung-Hyon Myaeng

  • Venue:
  • Journal of Information Science
  • Year:
  • 2014

Abstract

Hierarchical text classification over a Web taxonomy is challenging because it is a very large-scale problem involving hundreds of thousands of categories and their associated documents. Furthermore, categories vary widely in both their conceptual level and the amount of training data available for them. The state of the art is the narrow-down approach, which uses a search engine to generate candidate categories from the taxonomy and then builds a classifier to select the final category. In this paper, we take the same approach but address the issue of using global information within a language modelling framework to improve effectiveness. We propose three methods of using non-local information for the task: a passive method that uses global information for smoothing; an aggressive method in which a top-level classifier is built and integrated with a local model; and a method that uses the label terms on the path from a category to the root, motivated by our systematic observation that these terms are underrepresented in the documents. For evaluation, we constructed a document collection from Web pages in the Open Directory Project. A series of experiments shows that our methods outperform the baselines and reveals the role of global information in hierarchical text classification.
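
As a rough illustration of the first, passive method only, the sketch below scores a document against candidate categories using a unigram language model per category, smoothed with a global model built from all candidates. The abstract does not give the paper's formulas, so everything here is an assumption made for illustration: the linear (Jelinek-Mercer style) interpolation, the weight `lam`, the probability `floor`, the function names, and the toy data.

```python
import math
from collections import Counter


def unigram(tokens):
    """Maximum-likelihood unigram distribution over a list of tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}


def log_likelihood(doc_tokens, local_lm, global_lm, lam=0.5, floor=1e-9):
    """Log-probability of a document under a local category model
    linearly interpolated with a global model (assumed smoothing scheme)."""
    score = 0.0
    for term in doc_tokens:
        p = (1 - lam) * local_lm.get(term, 0.0) + lam * global_lm.get(term, 0.0)
        # Floor guards against log(0) for terms unseen in both models.
        score += math.log(max(p, floor))
    return score


# Hypothetical candidates, standing in for the output of the search step.
candidates = {
    "Sports/Soccer": "goal match league striker goal".split(),
    "Arts/Music": "album guitar concert band album".split(),
}

# Global model: pooled text of all candidates (an assumption of this sketch).
global_lm = unigram([t for tokens in candidates.values() for t in tokens])
local_lms = {cat: unigram(tokens) for cat, tokens in candidates.items()}

doc = "league match concert goal".split()
scores = {cat: log_likelihood(doc, lm, global_lm) for cat, lm in local_lms.items()}
best = max(scores, key=scores.get)
print(best)
```

In this toy setting, the global model keeps the term "concert" from zeroing out the score of the soccer category, which is the general role smoothing plays; how the paper actually estimates and weights the global model may differ.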