Training a hierarchical classifier using inter document relationships

  • Authors:
  • Susan Gauch; Aravind Chandramouli; Shankar Ranganathan

  • Affiliations:
  • Department of Computer Science & Computer Engineering, University of Arkansas, Fayetteville, AR 72701; Department of Electrical Engineering and Computer Science, University of Kansas, Lawrence, KS 66046; Department of Electrical Engineering and Computer Science, University of Kansas, Lawrence, KS 66046

  • Venue:
  • Journal of the American Society for Information Science and Technology
  • Year:
  • 2009

Abstract

Text classifiers automatically classify documents into appropriate concepts for different applications. Most classification approaches use flat classifiers that treat each concept as independent, even when the concept space is hierarchically structured. In contrast, hierarchical text classification exploits the structural relationships between the concepts. In this article, we explore the effectiveness of hierarchical classification for a large concept hierarchy. Since the quality of the classification is dependent on the quality and quantity of the training data, we evaluate the use of documents selected from subconcepts to address the sparseness of training data for the top-level classifiers and the use of document relationships to identify the most representative training documents. By selecting training documents using structural and similarity relationships, we achieve a statistically significant improvement of 39.8% (from 54.5% to 76.2%) in the accuracy of the hierarchical classifier over that of the flat classifier for a large, three-level concept hierarchy. © 2009 Wiley Periodicals, Inc.
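
The contrast between flat and hierarchical classification, and the idea of training upper-level classifiers on documents drawn from subconcepts, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy hierarchy, the training snippets, the function names, and the nearest-centroid cosine classifier, which stands in for whatever classifier the article actually uses. It does not model the selection of representative documents through document relationships.

```python
import math
from collections import Counter

# Hypothetical toy hierarchy: parent concept -> child concepts.
CHILDREN = {
    "root": ["science", "sports"],
    "science": ["physics", "biology"],
    "sports": ["soccer", "tennis"],
}

# Hypothetical training documents, attached only to the leaf concepts.
TRAIN = {
    "physics": ["quantum energy particle", "particle energy field"],
    "biology": ["cell gene protein", "gene cell organism"],
    "soccer": ["goal team league", "team goal striker"],
    "tennis": ["serve racket court", "court serve match"],
}


def docs_for(concept):
    """Return a concept's own documents plus those of all its subconcepts."""
    docs = list(TRAIN.get(concept, []))
    for child in CHILDREN.get(concept, []):
        docs.extend(docs_for(child))
    return docs


def centroid(docs):
    """Sum the term counts of the given documents into one vector."""
    total = Counter()
    for doc in docs:
        total.update(doc.split())
    return total


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify_flat(text):
    """Pick the best leaf concept directly, ignoring the hierarchy."""
    vec = Counter(text.split())
    return max(TRAIN, key=lambda c: cosine(vec, centroid(TRAIN[c])))


def classify_hierarchical(text, node="root"):
    """Descend from the root, choosing the most similar child at each level."""
    vec = Counter(text.split())
    path = []
    while node in CHILDREN:
        # Each child is represented by documents rolled up from its subtree.
        node = max(CHILDREN[node],
                   key=lambda c: cosine(vec, centroid(docs_for(c))))
        path.append(node)
    return path


print(classify_flat("gene protein cell"))
print(classify_hierarchical("gene protein cell"))
```

In this sketch the top-level concepts have no documents of their own, so `docs_for` is what gives the top-level decision any training data at all, which mirrors the sparseness problem the abstract describes.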