A survey of hierarchical classification across different application domains
Data Mining and Knowledge Discovery
Text classifiers automatically classify documents into appropriate concepts for different applications. Most classification approaches use flat classifiers that treat each concept as independent, even when the concept space is hierarchically structured. In contrast, hierarchical text classification exploits the structural relationships between the concepts. In this article, we explore the effectiveness of hierarchical classification for a large concept hierarchy. Since the quality of the classification is dependent on the quality and quantity of the training data, we evaluate the use of documents selected from subconcepts to address the sparseness of training data for the top-level classifiers and the use of document relationships to identify the most representative training documents. By selecting training documents using structural and similarity relationships, we achieve a statistically significant improvement of 39.8% (from 54.5% to 76.2%) in the accuracy of the hierarchical classifier over that of the flat classifier for a large, three-level concept hierarchy. © 2009 Wiley Periodicals, Inc.
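
The contrast between flat and hierarchical classification, and the idea of training top-level classifiers on documents drawn from their subconcepts, can be illustrated with a minimal sketch. It is not the authors' method: the two-level hierarchy, the training texts, the centroid-plus-cosine classifier, and every name below (`HIERARCHY`, `TRAIN`, `classify_flat`, `classify_hierarchical`) are hypothetical choices made for illustration only, and the sketch does not cover the similarity-based selection of representative training documents.

```python
import math
from collections import Counter

# Hypothetical toy hierarchy: top-level concept -> subconcepts.
HIERARCHY = {
    "science": ["physics", "biology"],
    "sports": ["soccer", "tennis"],
}

# Hypothetical training documents, attached only to the subconcepts.
TRAIN = {
    "physics": ["quantum particle energy field", "energy mass particle collider"],
    "biology": ["cell gene protein organism", "gene dna cell evolution"],
    "soccer": ["goal striker league match", "match penalty goal keeper"],
    "tennis": ["serve racket match court", "court set serve volley"],
}


def vectorize(text):
    """Bag-of-words term counts for a whitespace-tokenized text."""
    return Counter(text.split())


def centroid(docs):
    """Sum of the term-count vectors of a list of documents."""
    total = Counter()
    for doc in docs:
        total.update(vectorize(doc))
    return total


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# One centroid per subconcept, built from its own documents.
LEAF = {concept: centroid(docs) for concept, docs in TRAIN.items()}

# Top-level centroids have no documents of their own here, so they pool
# the documents of their subconcepts to offset training-data sparseness.
TOP = {
    top: centroid([doc for child in children for doc in TRAIN[child]])
    for top, children in HIERARCHY.items()
}


def classify_flat(text):
    """Flat: compare against every subconcept, ignoring the hierarchy."""
    vec = vectorize(text)
    return max(LEAF, key=lambda concept: cosine(vec, LEAF[concept]))


def classify_hierarchical(text):
    """Top-down: pick a top-level concept, then one of its subconcepts."""
    vec = vectorize(text)
    top = max(TOP, key=lambda t: cosine(vec, TOP[t]))
    leaf = max(HIERARCHY[top], key=lambda c: cosine(vec, LEAF[c]))
    return top, leaf
```

Calling `classify_hierarchical("gene protein cell")` first narrows the search to one top-level concept and only then compares the document against that concept's subconcepts, whereas `classify_flat` compares it against all subconcepts at once.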