Active Learning Strategies for Multi-Label Text Classification

  • Authors:
  • Andrea Esuli;Fabrizio Sebastiani

  • Affiliations:
  • Istituto di Scienza e Tecnologia dell'Informazione, Consiglio Nazionale delle Ricerche, Pisa, Italy 56124;Istituto di Scienza e Tecnologia dell'Informazione, Consiglio Nazionale delle Ricerche, Pisa, Italy 56124

  • Venue:
  • ECIR '09 Proceedings of the 31st European Conference on IR Research on Advances in Information Retrieval
  • Year:
  • 2009

Abstract

Active learning refers to the task of devising a ranking function that, given a classifier trained from relatively few training examples, ranks a set of additional unlabeled examples in terms of how much further information they would carry, once manually labeled, for retraining a (hopefully) better classifier. Research on active learning in text classification has so far concentrated on single-label classification; active learning for multi-label classification, instead, has either been tackled in a simulated (and, we contend, non-realistic) way, or neglected tout court. In this paper we aim to fill this gap by examining a number of realistic strategies for tackling active learning for multi-label classification. Each such strategy consists of a rule for combining the outputs returned by the individual binary classifiers as a result of classifying a given unlabeled document. We present the results of extensive experiments in which we test these strategies on two standard text classification datasets.
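
To make the idea of "a rule for combining the outputs returned by the individual binary classifiers" concrete, here is a minimal sketch. It is not taken from the paper: the abstract does not name its strategies, so the three rules below (`min`, `avg`, `max`), the function `rank_for_labeling`, and the toy scores are all illustrative assumptions. Each unlabeled document is assumed to come with one signed score per label, where the sign is the binary decision and the magnitude is the classifier's confidence.

```python
from statistics import mean

# Hypothetical combination rules (assumptions, not the paper's list):
# each maps a document's per-label confidences to one value.
COMBINERS = {
    "min": min,   # driven by the least confident label
    "avg": mean,  # average confidence across all labels
    "max": max,   # driven by the most confident label
}

def rank_for_labeling(scores, rule="min"):
    """Return document ids ordered from most to least worth labeling.

    scores: dict mapping doc id -> list of signed binary-classifier
            scores, one per label (sign = decision, magnitude = confidence).
    """
    combine = COMBINERS[rule]
    # The confidence of each binary decision is the absolute score.
    combined = {doc: combine([abs(s) for s in per_label])
                for doc, per_label in scores.items()}
    # Lower combined confidence is treated as more informative,
    # so documents are sorted in ascending order.
    return sorted(combined, key=combined.get)

# Toy scores for three unlabeled documents and three labels.
scores = {
    "d1": [2.1, -1.8, 0.9],
    "d2": [0.1, -2.5, 1.7],
    "d3": [-0.6, 0.5, -0.4],
}
print(rank_for_labeling(scores, "min"))  # ['d2', 'd3', 'd1']
print(rank_for_labeling(scores, "avg"))  # ['d3', 'd2', 'd1']
```

The two rules disagree on the top document: `min` picks d2 because one of its labels is almost undecided, while `avg` picks d3 because all of its labels are weakly decided. Differences of this kind are what a comparison of combination strategies would measure.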