Hierarchical classification of OAI metadata using the DDC taxonomy
NLP4DL'09/AT4DL'09 Proceedings of the 2009 international conference on Advanced language technologies for digital libraries
Hierarchical document classification refers to assigning one or more suitable categories from a hierarchical category space to a document. This paper proposes a new hierarchical document classification method based on a backtracking algorithm. Using the relationships between categories in the category tree, a suitable threshold is found for every category to determine whether a document can be classified into that category. The backtracking algorithm in our hierarchical classification approach effectively solves the problem that a misclassification at a higher level directly leads to a misclassification at a lower level. Moreover, the feature set is selected by integrating information gain with hierarchy information, which accords with the characteristics of a category tree. Experiments show that the method performs well when enough training documents are given.
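The abstract does not give the algorithm itself, so the following is only a minimal sketch of how threshold-based top-down classification with backtracking might look. The names `Category`, `classify`, and `keyword_scorer`, the toy keyword scores, and the hand-set thresholds are all hypothetical and not taken from the paper, which derives its thresholds from the category relationships. The sketch also returns a single root-to-leaf path, whereas the method described allows one or more categories per document.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical document representation: term -> weight.
Doc = Dict[str, float]


@dataclass
class Category:
    name: str
    threshold: float               # assumed per-category acceptance threshold
    score: Callable[[Doc], float]  # assumed scoring function for this category
    children: List["Category"] = field(default_factory=list)


def classify(node: Category, doc: Doc) -> Optional[List[str]]:
    """Return a path of category names from `node` down to a leaf, or None."""
    if not node.children:
        return [node.name]
    # Keep only children whose score reaches their own threshold,
    # and try them from the highest score to the lowest.
    scored = [(child.score(doc), child) for child in node.children]
    candidates = sorted(
        (pair for pair in scored if pair[0] >= pair[1].threshold),
        key=lambda pair: pair[0],
        reverse=True,
    )
    for _, child in candidates:
        path = classify(child, doc)
        if path is not None:
            return [node.name] + path
        # No acceptable leaf below this child: backtrack to the next sibling.
    return None


def keyword_scorer(term: str) -> Callable[[Doc], float]:
    """Toy scorer (an assumption): the weight of a single term in the document."""
    return lambda doc: doc.get(term, 0.0)


# Toy category tree with hand-set thresholds (assumptions for illustration).
example_root = Category("root", 0.0, lambda doc: 1.0, [
    Category("science", 0.3, keyword_scorer("science"), [
        Category("physics", 0.5, keyword_scorer("physics")),
        Category("biology", 0.5, keyword_scorer("biology")),
    ]),
    Category("arts", 0.3, keyword_scorer("arts"), [
        Category("music", 0.5, keyword_scorer("music")),
        Category("painting", 0.5, keyword_scorer("painting")),
    ]),
])

# "science" scores highest at the top level, but none of its children
# reaches its threshold, so the search has to back up and try "arts".
example_doc = {"science": 0.6, "arts": 0.4, "music": 0.7,
               "physics": 0.1, "biology": 0.2}
```

Here `classify(example_root, example_doc)` returns `["root", "arts", "music"]`. A purely greedy top-down classifier would have committed to "science" and failed at the next level, which is the error propagation that backtracking is meant to avoid.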