A comparative experimental assessment of a threshold selection algorithm in hierarchical text categorization

  • Authors:
  • Andrea Addis;Giuliano Armano;Eloisa Vargiu

  • Affiliations:
  • University of Cagliari, Department of Electrical and Electronic Engineering;University of Cagliari, Department of Electrical and Electronic Engineering;University of Cagliari, Department of Electrical and Electronic Engineering

  • Venue:
  • ECIR'11 Proceedings of the 33rd European conference on Advances in information retrieval
  • Year:
  • 2011

Abstract

Most of the research on text categorization has focused on mapping text documents to a set of categories among which structural relationships hold, i.e., on hierarchical text categorization. In solutions to a hierarchical problem that use an ensemble of classifiers, the behavior of each classifier typically depends on an acceptance threshold, which turns a degree of membership into a dichotomous decision. In principle, finding the best acceptance thresholds for a set of classifiers linked by taxonomic relationships is a hard problem. Effective ways of finding suboptimal solutions to it are therefore important. In this paper, we assess a greedy threshold selection algorithm aimed at finding a suboptimal combination of thresholds in a hierarchical text categorization setting. Comparative experiments, performed on Reuters, measure the performance of the proposed threshold selection algorithm against a relaxed brute-force algorithm and against two state-of-the-art algorithms. The results highlight the effectiveness of the approach.
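To make the idea of an acceptance threshold concrete, the following is a minimal illustrative sketch and not the algorithm evaluated in the paper, whose details the abstract does not give. It assumes a single root-to-leaf path of classifiers, in which a document is assigned to the leaf category only if every classifier on the path accepts it, and it tunes one threshold at a time from the deepest level upward while holding the others fixed. The function names (`accepts`, `f1`, `greedy_thresholds`), the use of F1 as the utility function, the candidate grid, and the toy scores are all assumptions made for illustration.

```python
from typing import List, Sequence


def accepts(scores: Sequence[float], thresholds: Sequence[float]) -> bool:
    """Turn degrees of membership into a yes/no decision for the whole path.

    Assumption: a document is accepted only if each classifier's score
    reaches that classifier's acceptance threshold.
    """
    return all(s >= t for s, t in zip(scores, thresholds))


def f1(predicted: Sequence[bool], actual: Sequence[bool]) -> float:
    """F1 measure, used here as a hypothetical utility function."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def greedy_thresholds(
    docs: Sequence[Sequence[float]],
    labels: Sequence[bool],
    candidates: Sequence[float],
) -> List[float]:
    """Pick one threshold per level, one level at a time (greedy, suboptimal)."""
    depth = len(docs[0])
    thresholds = [min(candidates)] * depth
    # Bottom-up pass: tune the deepest classifier first, then move to the root.
    for level in reversed(range(depth)):
        best_t, best_score = thresholds[level], -1.0
        for t in candidates:
            trial = thresholds[:level] + [t] + thresholds[level + 1:]
            score = f1([accepts(d, trial) for d in docs], labels)
            if score > best_score:  # ties keep the smaller threshold
                best_t, best_score = t, score
        thresholds[level] = best_t
    return thresholds


# Toy data (invented): each tuple holds (root score, leaf score) for a document.
docs = [(0.9, 0.8), (0.9, 0.3), (0.2, 0.9), (0.7, 0.6)]
labels = [True, False, False, True]
candidates = [0.0, 0.25, 0.5, 0.75]

best = greedy_thresholds(docs, labels, candidates)
print(best)  # [0.25, 0.5]
```

In this sketch the greedy pass evaluates one candidate grid per level, so its cost grows linearly with the depth of the path, whereas an exhaustive search over all threshold combinations grows exponentially with it. The price is that the combination found is not guaranteed to be the best one.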