Training a hierarchical classifier using inter document relationships

Authors:
Susan Gauch;Aravind Chandramouli;Shankar Ranganathan
Affiliations:
Department of Computer Science & Computer Engineering, University of Arkansas, Fayetteville, AR 72701;Department of Electrical Engineering and Computer Science, University of Kansas, Lawrence, KS 66046;Department of Electrical Engineering and Computer Science, University of Kansas, Lawrence, KS 66046
Venue:
Journal of the American Society for Information Science and Technology
Year:
2009

Abstract

Text classifiers automatically classify documents into appropriate concepts for different applications. Most classification approaches use flat classifiers that treat each concept as independent, even when the concept space is hierarchically structured. In contrast, hierarchical text classification exploits the structural relationships between the concepts. In this article, we explore the effectiveness of hierarchical classification for a large concept hierarchy. Since the quality of the classification is dependent on the quality and quantity of the training data, we evaluate the use of documents selected from subconcepts to address the sparseness of training data for the top-level classifiers and the use of document relationships to identify the most representative training documents. By selecting training documents using structural and similarity relationships, we achieve a statistically significant improvement of 39.8% (from 54.5% to 76.2%) in the accuracy of the hierarchical classifier over that of the flat classifier for a large, three-level concept hierarchy. © 2009 Wiley Periodicals, Inc.
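
The 39.8% figure reads most naturally as a relative improvement over the flat classifier, not a gain in percentage points. The short check below is only a sketch under that assumption, reading the reported accuracies as 54.5% (flat) and 76.2% (hierarchical); the function name `relative_improvement` is hypothetical and does not come from the article.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Return the gain of `improved` over `baseline` as a fraction of `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline


# Assumed reading of the abstract: flat = 54.5%, hierarchical = 76.2%.
flat_accuracy = 54.5
hierarchical_accuracy = 76.2

# Absolute gain in percentage points, rounded to one decimal place.
absolute_gain = round(hierarchical_accuracy - flat_accuracy, 1)

# Relative gain as a percentage of the flat baseline, rounded to one decimal place.
relative_gain = round(relative_improvement(flat_accuracy, hierarchical_accuracy) * 100, 1)

print(f"absolute gain: {absolute_gain} points, relative gain: {relative_gain}%")
```

Under this reading, the absolute gain is 21.7 percentage points, and dividing it by the 54.5% baseline gives the 39.8% quoted in the abstract.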